The ANDREAS project is part of TETRAMAX, funded by Horizon 2020 within the Smart Anything Everywhere (SAE) initiative, which deals with computing and low energy consumption for Cyber-Physical Systems and the Internet of Things. E4 leads the project, together with the Italian Politecnico di Milano and the Polish company 7bulls.
ANDREAS (Artificial intelligence traiNing scheDuler foR accElerAted resource clusterS) aims to meet two key market needs: efficient use of high-value computational resources (GPUs, FPGAs) and reduction of the energy needed to complete the required processing (Energy to Solution). Today, Artificial Intelligence (AI) and Deep Learning (DL) are used for a wide range of applications and are supported by different hardware and software platforms based on GPUs and FPGAs. The broad and increasingly sophisticated use of AI models creates the need to better manage the energy footprint of training and retraining operations across different types of deployments: from on-premises systems to medium-sized infrastructures (such as European cloud operators and large HPC centers) to large suppliers of edge/fog systems.
ANDREAS reduces the operational costs of DL/AI workloads running on hybrid (GPU-based) infrastructure through:
- The “Advance Planning” of resources through effective job scheduling algorithms
- The real-time interaction between the advance planning and the available resources
- Online and real-time evaluation of the optimal resource allocation strategy
Today, Artificial Intelligence and Deep Learning methods are used in a wide range of products. DL models are trained on heterogeneous infrastructures (GPU-based clusters), achieving speeds 5 to 40 times faster than CPU-only servers.
The ability to optimize infrastructure use and run workloads efficiently under power constraints is critical. Local operators and data centers are subject to energy consumption quotas, while their economic results strongly depend on how efficiently the infrastructure is used. Advanced solutions exist for managing virtual servers and containers, and the growth of Machine Learning on GPU-equipped servers represents a business opportunity for cloud/data center operators; but optimizing revenue requires keeping energy consumption within tight quotas while maximizing the effectiveness of high-value assets such as GPU-based systems.
The requirement to achieve lower power consumption and better cost management is critical for SMEs and innovative startups that build their competitive advantage on Artificial Intelligence/Machine Learning and need to adapt to power constraints. The same requirements apply to entities that rely on private clouds to meet their needs.
The ability to manage the energy footprint of new infrastructure projects, such as GPU-based distributed computing embedded in 5G networks, is critical for telecommunications companies that own their infrastructure, for companies that manage other infrastructures, and for SMEs that provide innovative solutions.
In all cases, energy footprint management and performance optimization are of fundamental importance to limit pollutant emissions and to exploit renewable energy sources, which impose stricter constraints.
Despite the clear benefits of GPU-based clusters in terms of performance, these systems are characterized by high costs and a significant energy impact: for example, high-end GPU servers such as the NVIDIA DGX A100 cost about $200,000, with energy consumption that is not always proportional to the workload.
To increase the ability to share expensive, specialized resources, data centers are moving away from traditional deployments (GPUs installed locally in individual servers) toward a software-defined architecture based on an intelligent, self-learning paradigm for resource allocation. In this way, resource usage can be maximized by assigning the right amount of resources to each remote training job while minimizing energy consumption.
ANDREAS: product features
ANDREAS provides an advanced planning solution that optimizes the runtime of Deep Learning training workloads and their energy consumption on accelerated clusters.
As a reference value, depending on the Deep Learning models used, at least a 2-3x speedup and 50%-70% energy savings are expected.
ANDREAS meets 3 requirements:
- reduce the power consumption of Artificial Intelligence/Machine Learning training jobs sent to a GPU-accelerated infrastructure;
- minimize the processing time (Time to Solution) of these jobs and of the entire workload;
- optimize the overall efficiency of the GPU-accelerated infrastructure.
What is the ANDREAS architecture?
- a SLURM queue manager,
- a pool of GPU-based servers,
- an intelligent module that monitors the applications' energy consumption and produces performance forecasts for job scheduling.
How is it used?
Jobs are submitted to SLURM and are characterized by a deadline and a priority (i.e., a weight). Jobs are never rejected, but they can be delayed. The ultimate goal is to minimize the weighted delay of the jobs, given the power budget set by the SysAdmin.
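The objective described above can be sketched as a priority-weighted tardiness sum. The job names, fields, and values below are illustrative, not the actual ANDREAS data model:

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    deadline: float     # seconds from submission
    weight: float       # priority assigned at submission
    finish_time: float  # completion time chosen by the scheduler

def weighted_tardiness(jobs):
    """Objective the scheduler minimizes: sum of priority-weighted delays.
    A job that finishes before its deadline contributes zero."""
    return sum(j.weight * max(0.0, j.finish_time - j.deadline) for j in jobs)

jobs = [
    Job("resnet-train", deadline=3600, weight=2.0, finish_time=4000),   # 400 s late
    Job("bert-finetune", deadline=7200, weight=1.0, finish_time=7000),  # on time
]
print(weighted_tardiness(jobs))  # 2.0 * 400 = 800.0
```

Because jobs are never rejected, every late job stays in the sum; raising a job's weight makes its lateness proportionally more expensive to the scheduler.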
To fully exploit resource efficiency, the product includes an intelligent system that orchestrates resources with a global view. This "intelligence" is provided by a Machine Learning model that predicts DL job execution times, together with the advanced scheduler.
The former is trained continuously while applications run and provides estimates used to predict job execution time and energy consumption. The latter jointly defines the capacity allocation (i.e., how many GPUs to assign to a job) and the schedule (i.e., the order of jobs that leads to optimal resource use), and decides which resources (server/GPU) to use.
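The interplay between the two components can be sketched as follows: a performance model predicts runtime as a function of GPU count, and the allocator picks the GPU count that minimizes predicted runtime within the power budget. The scaling model and all parameters are illustrative assumptions, standing in for the trained ML predictor described above:

```python
def predicted_runtime(base_runtime, n_gpus, scaling=0.9):
    """Illustrative performance model: near-linear speedup with
    diminishing returns (scaling < 1). In ANDREAS this role is
    played by a model trained on real job executions."""
    return base_runtime / (n_gpus ** scaling)

def allocate(base_runtime, power_per_gpu, power_budget, max_gpus):
    """Pick the GPU count that minimizes predicted runtime while the
    job's power draw stays within the budget set by the SysAdmin."""
    best = None
    for n in range(1, max_gpus + 1):
        if n * power_per_gpu > power_budget:
            break  # this allocation would exceed the power budget
        t = predicted_runtime(base_runtime, n)
        if best is None or t < best[1]:
            best = (n, t)
    return best  # (gpus, predicted runtime), or None if the budget is too tight

# With a 1000 W budget and 300 W per GPU, at most 3 GPUs fit.
print(allocate(base_runtime=1000, power_per_gpu=300, power_budget=1000, max_gpus=8))
```

A real scheduler would run this trade-off jointly across all queued jobs rather than one job at a time, which is what makes the global view mentioned above necessary.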
What is E4’s role in the project?
ANDREAS is a tool that, when a user submits a job to the system, discovers the available infrastructure and its configuration in terms of RAM, CPU, GPU, and other components. Once it has analyzed the resources at its disposal, and according to the queue of activities, it decides which resources to dedicate to that specific job so that the job completes in the shortest possible time and with the lowest energy consumption. ANDREAS has a containerized structure in which each container performs specific functions. A Docker container with all the required libraries and drivers is used to run the job and manage the CPUs. On top of these libraries and drivers, E4 has carried out further development, creating a container that optimizes the management of the system's resources, primarily the GPUs.
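The discovery step can be illustrated with a minimal sketch. This is not the ANDREAS implementation: it only reads CPU and RAM figures from the standard library (assuming a Linux host) and queries GPUs through the `nvidia-smi` tool when it is present:

```python
import os
import shutil
import subprocess

def discover_resources():
    """Illustrative infrastructure discovery: logical CPU count,
    installed RAM, and the names of visible NVIDIA GPUs (if any)."""
    info = {
        "cpus": os.cpu_count(),
        "ram_bytes": os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES"),
        "gpus": [],
    }
    if shutil.which("nvidia-smi"):  # only query where the NVIDIA driver exists
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True,
        )
        info["gpus"] = [line.strip() for line in out.stdout.splitlines() if line.strip()]
    return info

print(discover_resources())
```

A snapshot like this, taken at submission time, is the kind of input a scheduler needs before matching queued jobs to servers and GPUs.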
Who is ANDREAS ideal for?
ANDREAS is aimed at all users of GPU-based clusters who train Deep Learning models and who can benefit from the optimization of Time-to-Solution and Energy-to-Solution under variable workloads.
Politecnico di Milano
Università di Milano Bicocca
Università di Bologna