
Medooza: the Parallel Computing infrastructure
The ideal HPC Cluster
Medooza is an HPC cluster designed specifically to meet a client’s particular requirements. It is characterised by a carefully engineered optimisation of hardware and software components, with custom or “standardised” configurations for x86 CPUs, GPUs, storage and high-speed interconnects.
To manage cluster flexibility and redundancy, the solution uses the OpenStack set of software components, which provides common services for infrastructure management.
The Medooza solution includes integrated tools and standardized installation procedures to ensure the release of stable and optimized operating environments that allow fully autonomous operational management by the customer.
Engineered, not assembled
Medooza is based on a suite of hardware and software components suitably harmonised by our expert engineers, lending itself to the continuous integration of new modules and modern features.
To guarantee maximum reliability, Medooza is put through a long-lasting stress test during production.
Prior to delivery, we produce an output report with test and real-time performance data for our clients.
The MEDOOZA approach
Medooza offers the intuitive, easy-to-manage approach to HPC clusters that is typical of traditional infrastructures, combined with the flexibility and robustness of a modern system.
New modules and features are released on a regular basis to ensure state-of-the-art performance and functionality that seamlessly integrates with your existing infrastructure.
AGNOSTIC
For both hardware and software, Medooza supports the widest possible range of high-performance CPUs (ARM, x86_64), accelerators (GPU, FPGA), networks (IB, RoCE) and disks (NVMe, SSD), as well as the most popular operating systems (Red Hat Enterprise Linux, Rocky Linux).
MODULAR
The various development workspaces accessible to users are flexible and reconfigurable. They are managed through environment modules created by E4 according to the client’s needs.
MULTI CLUSTER
Medooza is natively multicluster. The same infrastructure can be used to serve multiple clusters within the same local network.
FUTURE PROOF
It is possible to expand the HPC system by adding new nodes (scale-out), check the state of the hardware and recondition old clusters where appropriate.
Designed to be the best, always
Advanced Workload Computing Engineering processes require in-depth data analysis, complex visualizations, and sophisticated modeling and simulation. To support them, we use high-performance components such as GPU and FPGA server accelerators, together with high-performance networking and storage.
Based on twenty years of experience working with the academic and industrial sectors, E4 engineers configure and implement HPC clusters using a suite of components suitably harmonised by Medooza.
Solution Layout
The MEDOOZA 2.0 solution consists of two main modules:
- the computing infrastructure, consisting of high-performance components such as server accelerators (GPUs and FPGAs), networking and high-performance storage
- the operating platform (E4HPC-PLATFORM), which enables the services and functionalities of the HPC cluster, as well as its full and effective management.
INFRASTRUCTURE + PLATFORM
Medooza 2.0 – Single control node
Medooza includes all the services needed to run a cluster.
- Service resilience managed by OpenStack
- OpenStack handles bare-metal deployment of compute nodes
- Storage network internal to P2P Control Nodes (no Switch)
- Add-on components of E4HPC-PLATFORM, such as Talos, CubeView, ICE4HPC and others, are sold separately and can be activated on request
- Ideal for HPC batch that does not require bare-metal reconfigurations over time
STANDARD APPLICATIONS | BATCH WORKFLOWS
Medooza 2.0 – Control nodes in HA
- Configuration with at least 3 control nodes
- CEPH storage network on dedicated switches
- OpenStack handles as-a-service delivery of infrastructure via pre-configured images
- Add-on components of E4HPC-PLATFORM, such as Talos, CubeView, ICE4HPC and others, are sold separately and can be activated on request
- Ideal for HPC customers who need the flexibility to adapt the infrastructure to changes
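One common reason HA designs start at three nodes is majority quorum, the scheme used for example by Ceph monitors. The source does not state which mechanism Medooza relies on, so the following is only a sketch of the arithmetic under that assumption; the function names are illustrative.

```python
def quorum_size(control_nodes: int) -> int:
    """Smallest strict majority of a group of control nodes."""
    if control_nodes < 1:
        raise ValueError("need at least one control node")
    return control_nodes // 2 + 1


def tolerated_failures(control_nodes: int) -> int:
    """Nodes that can fail while a strict majority is still reachable."""
    return control_nodes - quorum_size(control_nodes)


if __name__ == "__main__":
    # Assumption: services stay available only while a majority is up.
    for n in (1, 2, 3, 5):
        print(
            f"{n} control node(s): quorum={quorum_size(n)}, "
            f"tolerated failures={tolerated_failures(n)}"
        )
```

Under this assumption, two nodes tolerate no more failures than one, while three nodes are the smallest group that survives the loss of a single node.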
MISSION CRITICAL APPLICATIONS | INTERACTIVE WORKFLOWS
Technical features
INFRASTRUCTURAL FEATURES
MANAGED
HPC resource manager & scheduler configured to obtain maximum efficiency from the cluster and have full control over workload execution.
MONITORED
Centralized monitoring and alerting thanks to a single and intuitive control point that allows you to check the status of the cluster and receive alerts in case of anomalies.
SOFTWARE STACK
The software stack includes the main open-source tools for developing and running high-performance applications: compilers, scientific and numerical libraries, MPI, OpenMP.
ADDITIONAL COMPONENTS
Medooza can be enriched with a series of additional features according to the specific needs of each project:
- Optimized compilers and libraries
- Critical services delivered in HA
- Parallel and/or ephemeral file system
GUIDELINES
Since the architecture is extremely flexible and adaptable to any environment, here are some guidelines to keep in mind when designing a Medooza solution.
E4HPC-PLATFORM | CONTROL NODE VERSIONS
BASE
- SLURM: HPC resource manager & scheduler, for maximum efficiency from the cluster and full control over the execution of workloads
- IPA: identity manager, for user management and access to resources
- NFS: shared NFS file system, for sharing data within the cluster
- ZABBIX: centralized web UI that enables the collection, manipulation, analysis, and visualization of data related to the operation of the entire computing infrastructure
ADVANCED [includes BASE]
- Software stack, which includes major open-source tools: compilers, scientific and numerical libraries, MPI and OpenMP
- Ephemeral file system (BeeOND), for managing I/O-intensive scratch workloads
EXTENDED [includes ADVANCED]
- HPC resource manager & scheduler configured with resource usage accounting, for operational reporting of cluster resource utilization
- Extended software stack that includes optimized libraries from hardware vendors, Python computing environments, and popular computing applications in addition to the Advanced version components
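Since the BASE version lists SLURM as its resource manager and scheduler, a batch job on such a cluster is typically described by a short script of `#SBATCH` directives. The sketch below generates one in Python. The job name, node counts and the `./hello_mpi` binary are hypothetical placeholders, and the partition names available on a given Medooza installation are not specified in the source.

```python
from typing import Optional


def render_sbatch(
    job_name: str,
    nodes: int,
    tasks_per_node: int,
    walltime: str,
    command: str,
    partition: Optional[str] = None,
) -> str:
    """Return the text of a minimal Slurm batch script.

    Only standard sbatch options are used: --job-name, --nodes,
    --ntasks-per-node, --time and, optionally, --partition.
    """
    if nodes < 1 or tasks_per_node < 1:
        raise ValueError("nodes and tasks_per_node must be >= 1")

    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={walltime}",
    ]
    if partition is not None:
        lines.append(f"#SBATCH --partition={partition}")

    # srun launches the command on the resources allocated to the job.
    lines += ["", f"srun {command}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: 2 nodes x 4 tasks, 10 minutes of wall time.
    print(render_sbatch("hello-mpi", 2, 4, "00:10:00", "./hello_mpi"))
```

The generated text can be saved to a file and submitted with `sbatch`; site-specific options such as accounts or QoS would be added in the same way.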
Architectural advantages
OPEN SOURCE
The E4 HPC cluster is built using open-source technology, validated both by our R&D laboratories and by our years of experience with clients from the academic and corporate spheres.
BUILDING BLOCKS
Medooza’s design is based on hardware building blocks validated by our engineers, and configured and managed by internal tools developed as open source.
SCALABLE
Medooza’s scale-out architecture allows it to respond transparently and natively to growing needs.
EFFECTIVE FLEXIBILITY
With a simple but appropriate reconfiguration of the environment, it is possible to add a variety of new nodes whilst leaving the system architecture unchanged.
Why choose the E4 solution
UNIQUE
E4 engineers always design a solution by first understanding the client’s specific requirements and expectations.
Here is a summary of the principles we uphold throughout the design process:
• LISTENING: understand the client’s needs
• EXPLORATION: collaborate with domain-specific ISVs / partners to optimize every single aspect of the design
• UNRESTRICTED: research and select the best infrastructure and implementation processes for the solution
• VERSATILE: provide on-site and cloud-based solutions for production, testing and evaluation
VALIDATED
All the systems included in a solution must first pass firmware, homogeneity, sanity and setup checks before configuration. Next, the following general performance tests are performed:
• HPL (High Performance Linpack) to test the computing power of a machine, measured in FLOPS
• STREAM to test the memory access bandwidth, measured in MB/s
• FIO to test disk access speed, measured in MB/s and IOPS
Once the solution is ready, further tests are carried out in agreement with the client to verify that the system functions correctly and to compare its results with the expected performance.
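The comparison of measured results with expected performance can be pictured as a simple ratio check per benchmark. The sketch below is an illustration only, not E4's validation tooling: the `BenchmarkResult` type, the 90% acceptance threshold and all numbers are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BenchmarkResult:
    name: str        # e.g. "HPL", "STREAM", "FIO"
    unit: str        # e.g. "GFLOPS", "MB/s", "IOPS"
    measured: float  # value obtained on the system under test
    expected: float  # value agreed as the reference

    def __post_init__(self) -> None:
        if self.expected <= 0:
            raise ValueError("expected value must be positive")

    def ratio(self) -> float:
        """Measured performance as a fraction of the expected value."""
        return self.measured / self.expected


def passes(result: BenchmarkResult, min_ratio: float = 0.9) -> bool:
    """True if the result reaches at least min_ratio of the expectation."""
    return result.ratio() >= min_ratio


def report(results: Iterable[BenchmarkResult], min_ratio: float = 0.9) -> str:
    """One line per benchmark: name, measured value, ratio and verdict."""
    lines = []
    for r in results:
        status = "PASS" if passes(r, min_ratio) else "FAIL"
        lines.append(
            f"{r.name:<8}{r.measured:>12.1f} {r.unit:<8}{r.ratio():>7.1%}  {status}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder numbers, not measurements from a real system.
    print(report([
        BenchmarkResult("HPL", "GFLOPS", 850.0, 1000.0),
        BenchmarkResult("STREAM", "MB/s", 190000.0, 200000.0),
        BenchmarkResult("FIO", "IOPS", 480000.0, 500000.0),
    ]))
```

In practice the reference values and the acceptable margin would be the ones agreed with the client for each test.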
TESTED
Individual components undergo burn-in tests (developed by E4) lasting up to 120 hours to guarantee a unique, perfectly engineered and functioning system, reducing both the DoA (Dead on Arrival) rate and the “early failure” rate after release. This significantly improves the overall reliability of E4’s HPC solution.
SERVICED
E4 is currently one of only a few companies that deliver high-end services not only to the academic and industrial sectors, but also to research centres of national and international importance, with which it has collaborated on the design, configuration and execution of solutions containing thousands of nodes for high-performance and complex processing.
E4 is therefore able to provide clients with whatever level of support and advice they need, regardless of the size or complexity of their system, guaranteeing them perfect efficiency indefinitely:
• Level 1 & Level 2 Support, onsite service, 24/7 support
• Onsite specialists, infrastructure evolution, on-demand performance tuning.
Add-on Modules | New features coming soon
In addition to the features currently offered by its basic configuration, Medooza can be enriched with a series of add-ons. Some are integrable and others are available on demand. Contact us for their release date.
ADDITIONAL OPTIONS – ON DEMAND
- Advanced Slurm for accounting, billing, QoS, FairShare, etc.
- Backup: implementation of solutions based on FOG/Bacula
- Disaster Recovery: implementation of solutions based on StorWare
- ICE4HPC: provides web-based access to compute resources that can be used interactively through workload manager services (SLURM) configured to optimize the user experience for both interactive users and those submitting jobs to batch queues from the command line.
E4HPC-PLATFORM | ADDITIONAL MODULES
CUBEVIEW
Centralized, adaptive and responsive web interface, usable on mobile devices, for managing a data center and its software applications across one or more clusters.
TARDIS
Storage-on-demand system: a tool for the definition and automated creation of storage environments (BeeGFS parallel storage, Object Storage, CEPH, NFS to S3 gateway).
TALOS
System for centralized management of the cluster: a command-line interface for the configuration, monitoring and diagnostics of cluster nodes.