An open source big data platform for distributed storage and processing.
Multitenant ecosystem
- A set of interrelated subsystems: MapReduce, a SQL query engine, a job scheduler, and a key-value store for OLTP workloads.
- Support for large numbers of users, eliminating the need for multiple installations and improving hardware utilization.
Reliability and stability
- No single point of failure.
- Automated replication between servers.
- Updates with no loss of progress.
Scalability
- Up to 1 million CPU cores and thousands of GPUs.
- Exabytes of data on different media: HDD, SSD, NVMe, RAM.
- 10,000+ nodes.
- Automated server scaling up and down.
Rich functionality
- An expansive MapReduce model.
- Distributed ACID transactions.
- A variety of SDKs and APIs.
- Secure isolation for compute resources and storage.
- A user-friendly UI.
CHYT powered by ClickHouse®
- A well-known SQL dialect and familiar functionality.
- Fast analytic queries.
- Integration with popular BI solutions via JDBC and ODBC.
SPYT powered by Apache Spark
- A set of popular tools for writing ETL processes.
- Support for multiple isolated clusters of various sizes.
- Easy migration for existing solutions.
Usage scenarios
Batch processing
MapReduce and SPYT for processing structured and unstructured data, such as logs and financial transactions.
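
To make the batch processing scenario concrete, here is a minimal single-process sketch of the map, group, and reduce steps applied to log lines. It is plain Python and does not use the platform's SDK. The log format (timestamp, level, message) and the names `map_log_line`, `reduce_counts`, `run_map_reduce`, and `SAMPLE_LOGS` are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical sample input: "<timestamp> <level> <message>".
SAMPLE_LOGS: List[str] = [
    "2024-01-01T00:00:00 INFO service started",
    "2024-01-01T00:00:05 ERROR disk full",
    "2024-01-01T00:00:09 INFO request served",
    "malformed",
]


def map_log_line(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (level, 1) for each well-formed log line."""
    parts = line.split()
    if len(parts) >= 2:  # skip lines that do not carry a level field
        yield parts[1], 1


def reduce_counts(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum all values emitted for one key."""
    return key, sum(values)


def run_map_reduce(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, group by key, then reduce, all in one process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_log_line(line):
            groups[key].append(value)
    # Sorting the keys mimics the sorted input a reducer usually receives.
    return dict(reduce_counts(k, v) for k, v in sorted(groups.items()))


if __name__ == "__main__":
    print(run_map_reduce(SAMPLE_LOGS))
```

In a distributed setting the map and reduce functions would run in parallel across many nodes and the grouping step would be a distributed sort or shuffle; the sketch only shows the shape of the computation.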
Ad hoc analytics
CHYT runs fast analytical queries without exporting data to an external OLAP system. External dashboards and BI tools can access the data through JDBC and ODBC drivers.
OLTP
A low-latency transactional key-value store for building interactive pipelines and services.
Machine learning
Managing GPU clusters to train models with billions of parameters.
Metadata storage
Transactional metadata storage and a reliable distributed coordination service.
ETL pipelines
Build data processing pipelines using familiar tools: Apache Spark, SQL, MapReduce.
Address
Lva Tolstogo 16, Moscow 119021
Russia