Kaptain: high performance Kubernetes Cluster
Kaptain: the Number Crunching container solution
KAPTAIN is configured to give containers access to high performance GPUs and networks. The solution integrates a block storage service to ensure the persistence of container data and also includes an S3-compatible object storage service based on MinIO.
Engineered, not assembled
KAPTAIN includes a modern web UI, designed for both the administrator and the end user, with a detailed catalogue of ready-to-use, open source applications. Additionally, it can be set up to ensure high availability (HA) of core Kubernetes services.
KAPTAIN integrates only high-end worker nodes and natively supports high performance GPUs and networks.
KAPTAIN is the ideal infrastructure for housing workloads with a scalable microservices architecture.
KAPTAIN is a “ready-to-use” solution in terms of computing components, storage and networking, which integrates a powerful web-based UI for both the administrator and the end user.
The software components that implement KAPTAIN and its Data Science add-on are regularly updated and enriched with the latest open source releases.
Discover the advantages
At the core of the architecture is its ControlPlane:
- orchestrates and manages containers
- provides the external interface (API)
The standard Kubernetes configuration includes:
- a Master Node, containing the ControlPlane
- a set of Worker Nodes, for the execution of containerised user workloads.
The infrastructure’s servers are interconnected by:
- an internal network, which carries traffic between the ControlPlane and the Worker Nodes
- an external network, through which the services and applications running on the Worker Nodes are accessed
Kaptain’s HA Configurations
The Hyperconverged configuration has a total of 3 servers, each one acting as both Master and Worker Node within a cluster. The 3 servers additionally offer distributed block storage services, thus also acting as Storage Nodes. This configuration provides the minimum services necessary to guarantee that a Kubernetes cluster remains reliable.
The Converged Configuration has 3 to 5 servers which act solely as Master Nodes, whilst the rest act simultaneously as Worker and Storage Nodes. This is the ideal configuration for those planning on the long-term growth and modernisation of their computing capacity and storage availability.
The Distributed Configuration has 3 to 5 servers functioning as dedicated Master Nodes, whilst the rest act as either Worker Nodes or Storage Nodes. This is the highest performance configuration we offer: on top of providing dedicated servers for the ControlPlane, it lets the user deploy distributed cloud native block storage on dedicated Storage Nodes, optimising net disk space. It also makes it possible to choose the number of Worker Nodes that matches the user’s workload requirements and their capacity for horizontal scaling.
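The choice of 3 or 5 Master Nodes can be illustrated with a little majority arithmetic. The sketch below assumes that the ControlPlane state is kept in a majority-quorum store such as etcd, as in a standard Kubernetes deployment; the text above does not state which store Kaptain uses, and the function names are illustrative only.

```python
def quorum(members: int) -> int:
    """Smallest majority of a group with `members` voting nodes."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many voting nodes can be lost while a majority survives."""
    return members - quorum(members)


# Compare a single Master Node with the 3- and 5-node HA layouts.
for n in (1, 3, 5):
    print(f"{n} master(s): quorum={quorum(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

Under that assumption, 3 Master Nodes survive the loss of one server and 5 survive the loss of two, which is why an odd number of Master Nodes is the usual choice.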
Kaptain is a “ready-to-use” Kubernetes cluster, designed to guarantee ease of use and high performance: its standard configuration integrates GPU and RDMA network support, distributed block and object storage services for the persistence of container data, and a modern web interface for creating, managing and monitoring the various types of workload that a Kubernetes cluster can accommodate. Kaptain is an ideal infrastructure for adopting a DevOps methodology.
Rancher server: Kaptain’s web UI
The main functions offered by the Rancher Server Cluster Explorer are:
Cluster Viewer: for accessing information on the state of a cluster and its nodes.
Workload Manager: for creating, managing and monitoring all the types of workload that a Kubernetes cluster can accommodate: Deployment, CronJob, DaemonSet, Job and StatefulSet; all within the limits set by the user interface.
Service Discovery: for accessing the services associated with your current workload, other user workloads or the system itself, as exposed by the relevant policy.
Storage Manager: for managing the life cycle of all Persistent Volumes owned by the user, even if they are still tied to active workloads; users of the Cluster Explorer can also manage their own Secrets and ConfigMaps.
Apps & Marketplace: for access to a vast catalogue of applications and services (configurable by the system administrator) designed to be set up and launched through self-provisioning.
NVIDIA Operators enable high performance:
NVIDIA GPU OPERATOR automates the management of all NVIDIA software components necessary for GPU provisioning to containers.
NVIDIA NETWORK OPERATOR enables high performance Kubernetes networks, permitting the use of RDMA and GPUDirect for the workloads run by a cluster.
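As an illustration of what GPU provisioning looks like from the workload side, the following minimal sketch builds a Pod manifest that requests one GPU through the `nvidia.com/gpu` extended resource, the name advertised by the NVIDIA device plugin. The Pod name and container image are hypothetical placeholders, and the manifest is built as a plain Python dictionary so that no extra library is needed.

```python
import json


def gpu_pod(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Pod manifest whose single container requests `gpus` GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # GPUs are requested as an extended resource under limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


# Hypothetical name and image, for illustration only.
manifest = gpu_pod("gpu-smoke-test", "registry.example.com/cuda-sample:latest")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and submitted with `kubectl apply -f`, or pasted into the web UI, depending on how the cluster is administered.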
Longhorn: Kaptain’s distributed block storage
- Longhorn is a lightweight, reliable and powerful distributed block storage system for Kubernetes
- Longhorn creates a dedicated controller for every block storage volume and synchronously replicates the volume across multiple Storage Nodes
- Longhorn stores backup data in external storage (NFS or S3)
- Longhorn includes upgrade procedures that guarantee constant access to persistent volumes
- Longhorn has a web-accessible management UI
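A workload typically obtains one of these volumes through a PersistentVolumeClaim. The sketch below builds such a claim as a Python dictionary; it assumes the cluster exposes Longhorn under a StorageClass named `longhorn` (the name a default Longhorn installation creates), and the claim name and size are hypothetical.

```python
import json


def longhorn_pvc(name: str, size_gi: int, storage_class: str = "longhorn") -> dict:
    """Build a PersistentVolumeClaim for a single-writer block volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteOnce: the volume is mounted read-write by one node.
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,  # assumed StorageClass name
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }


# Hypothetical claim: 20 GiB for a notebook workspace.
claim = longhorn_pvc("notebook-data", 20)
print(json.dumps(claim, indent=2))
```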
MinIO: High performance object storage
- MinIO is a high performance distributed object storage system.
- MinIO was designed to offer only object storage services whilst being scalable, lightweight and performant.
- MinIO excels in the typical use cases of object storage, such as secondary storage, disaster recovery and backup storage.
- MinIO is a one-stop solution for data storage in Big Data Analytics and Machine Learning.
READY-TO-GO HIGH PERFORMANCE KUBERNETES
KAPTAIN is a ready-to-go high performance solution, designed for the development, testing and deployment of scalable Data Analytics, Machine Learning and Deep Learning applications.
KAPTAIN gives you web access to various systems for distributed data processing and enables end users to create interactive and personalised workspaces according to their needs.
KAPTAIN exclusively integrates open source technology developed within the most relevant communities active in the Data Science field.
Kaptain’s architecture ensures that the infrastructure remains scalable and responsive to changes in client requirements over time.
Why choose this E4 solution?
A ready-to-use Kubernetes cluster equipped with a web based UI for both the administrator and the end-user
– equipped with a multi-user Interactive Computing Environment made for data science
– the various workspaces are integrated to maximise the data scientist’s productivity
Validated at every level to verify the effective performance of each system.
Performance tests are carried out on all servers that make up the solution before they are released to the client. In addition to the usual firmware check, homogeneity check, sanity check and setup check, we use additional tools that verify whether performance levels correspond to those requested by the client. A few tests of note include HPL (High Performance Linpack) to test a machine’s computing power, measured in FLOPS; STREAM to test the memory’s bandwidth, measured in MB/s; and IOzone to test a disk’s access speed, measured in MB/s and IOPS.
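To show how an HPL result is usually read, the sketch below compares a measured figure with a machine’s theoretical peak. All numbers are hypothetical and do not describe any E4 system; the helper names are illustrative, and the peak formula is the common first-order estimate (nodes × cores × clock × floating-point operations per cycle).

```python
def theoretical_peak_gflops(nodes: int, cores_per_node: int,
                            clock_ghz: float, flops_per_cycle: int) -> float:
    """First-order peak estimate (Rpeak) in GFLOPS."""
    return nodes * cores_per_node * clock_ghz * flops_per_cycle


def hpl_efficiency(rmax_gflops: float, rpeak_gflops: float) -> float:
    """Fraction of the theoretical peak reached by the measured HPL result."""
    return rmax_gflops / rpeak_gflops


# Hypothetical cluster: 4 nodes, 32 cores each, 2.0 GHz, 16 FLOP per cycle.
rpeak = theoretical_peak_gflops(nodes=4, cores_per_node=32,
                                clock_ghz=2.0, flops_per_cycle=16)
# Hypothetical measured HPL result (Rmax) in GFLOPS.
efficiency = hpl_efficiency(3072.0, rpeak)
print(f"Rpeak = {rpeak} GFLOPS, HPL efficiency = {efficiency:.0%}")
```

A result that falls well below the efficiency expected for a given hardware generation is the kind of signal that prompts further checks before a system is released.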
Every single component is tested to reduce early failure and DoA (Dead on Arrival) rates.
Each component is burn-in tested for up to 120 hours, according to a protocol developed by E4, to ensure that our systems are correctly engineered and fully functioning, thus reducing both the Dead on Arrival and “early failure” rates following release. This significantly improves the overall reliability of E4 solutions.
Serviced by systems engineers who work with the most complex infrastructures in Italy and Europe, and by a team of highly qualified data scientists.
E4 is among the few companies that currently provide high-level services to large academic and private organisations, as well as to complex research centres of national and international importance. We support these institutions in the design, configuration and commissioning of extremely sophisticated solutions for the processing of Big Data with our high performance solutions.
Data scientist services
– per activity
– pay-per-day packages
Functional training for the E4DS-PLATFORM workspace
Senior Data Scientist consultation
– online session (pay-per-hour)
– onsite session (pay-per-day)
E4DS-PLATFORM customisation
Extra | New functionalities coming soon
THE DISTRIBUTED DATA PROCESSING SYSTEMS INCLUDED IN THE E4DS-PLATFORM
E4DS-PLATFORM guarantees the coexistence of different high performance workspaces within the same infrastructure for distributed data processing:
– Apache Spark is a multi-language engine for data engineering, data science and machine learning workloads that runs on both single servers and multi-node clusters.
– Distributed Dask makes the most popular Python libraries such as NumPy, Pandas and Scikit-Learn easily scalable.
– Ray simplifies the parallelisation of code written for a single machine: it takes the original code from a single CPU to multi-core, multi-GPU or multi-node environments with minimal modification.
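The “minimal modification” idea shared by these systems can be sketched with the Python standard library alone. The example below is not Spark, Dask or Ray code: it uses `concurrent.futures` as a stand-in to show how a serial loop becomes a parallel map while the per-record function stays unchanged. The function and data are hypothetical, and a thread pool in CPython does not speed up CPU-bound work, so the sketch illustrates only the change in code shape.

```python
from concurrent.futures import ThreadPoolExecutor


def clean_record(value: int) -> int:
    """Hypothetical per-record transformation."""
    return value * value


records = list(range(8))

# Serial version: one record at a time.
serial = [clean_record(r) for r in records]

# Parallel version: the same function, with the loop replaced by a map
# over a pool of workers. Executor.map returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(clean_record, records))

print(parallel == serial)
```

In a distributed framework the pool is replaced by a cluster of workers, but the calling code keeps much the same structure.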
Kaptain guarantees the coexistence of different data preparation workspaces within the same infrastructure, creating a clear path from raw data collection to refined dataset analysis.
INTERACTIVE COMPUTING SERVICES INCLUDED IN THE E4DS-PLATFORM
ICE4DS interactive workspace for distributed data science
– ICE4DS is an Interactive Computing workspace capable of communicating with the distributed data processing systems included in the E4DS-PLATFORM
– ICE4DS is based on Jupyter Notebook technology
– ICE4DS is configured for use in the broad fields of Data Analysis, Machine Learning and Deep Learning
IMPLEMENTABLE DATA SERVICES IN THE E4DS-PLATFORM
Kaptain is able to accommodate the suite of services which make up a typical Data Science workflow:
– Workflow manager
– Data acquisition systems
– SQL and NoSQL database services
– Large-scale inference services
E4 CONTAINER PLATFORM