The goal of the 2-day hands-on workshop was for current and prospective users to familiarize themselves with the 64-bit Arm®v8 programming environment, either by porting their applications or by further optimizing applications they had already ported.
Free access was provided to two 64-bit Arm®v8 based clusters:
• ARMIDA (ARM Infrastructure for the Development of Applications) cluster, 8 dual-socket nodes based on Marvell® ThunderX2® processors, located at E4 Computer Engineering’s premises
• JUAWEI cluster, 28 dual-socket nodes based on Huawei Hi1616 processors, located at Jülich Supercomputing Centre
Ollie Perks, Arm
I was fortunate to be a technical tutor at the CODES@OEHI hackathon. I believe the hackathon was a great opportunity for me to share my knowledge and educate users on the Arm HPC offering, and also to learn from users what they are looking for in terms of their programming and optimization needs. As a tutor, I was surprised by how quickly the participants learned the optimization techniques and applied them to their applications. Codes were ported in a matter of hours, in some cases minutes, and profiling was done quickly. Optimizing the code was somewhat more time-consuming, but the skills and focus of the participants allowed them to achieve a good level of performance in the very first hands-on session. This event also helped identify a few performance issues with the Arm HPC tools, which we were able to feed back to the development team to improve the tools for future users.
Phil Ridley, Arm
I was really impressed by the talent of the participants, especially by their in-depth knowledge of the programming tools. All of them knew how to use the development tools and the right framework. The primary objective of CODES@OEHI was not to achieve top performance with the applications but to familiarize the participants with the programming environments and tools. Even so, the majority of participants obtained performance with their applications that was as good as, or even better than, what they get on the systems available to them in their normal working environment.
Carlo Cavazzoni, CINECA
CINECA has a long history of using 64-bit Arm®v8 based clusters. In 2015 CINECA and E4 co-developed the first ARM+GPU cluster within the PRACE-3IP PCP: Whole-System Design for Energy Efficient HPC program. In 2018/2019 CINECA performed extensive tests on the HPC cluster CARMEN (Cineca ARM ENablement), based on 8 dual-socket Marvell® ThunderX2® nodes connected via an EDR InfiniBand high-speed switch. Access to the cluster was provided to application developers and users of scientific and engineering workloads (OpenFOAM, VASP, Quantum ESPRESSO, GROMACS, Lattice Boltzmann codes and many others). CINECA's particular interest in 64-bit Arm®v8 based clusters lies in preparing and enabling the transition to exascale of the flagship codes and workflows used by the materials science community, in line with CINECA's membership in MaX, one of the nine ‘European Centres of Excellence for HPC applications’.
Cosimo Damiano Gianfreda, E4 Computer Engineering SpA
E4 Computer Engineering is always at the leading edge of the technology curve, and is honored to have supported the OEHI and CINECA in the hackathon. E4 designed its first Arm-based cluster in 2010 and is currently refreshing its line of products based on 64-bit Arm®v8 processors to add the next generation of the ThunderX family. The invaluable data gathered during the hackathon enables E4 to better define the specifications of its products through a co-design approach aimed at achieving the optimal configuration for demanding scientific and industrial requirements.
Enrico Calore, INFN
The CODES@OEHI Hackathon was very useful to me for two main reasons: it gave me access to two 64-bit Arm®v8 based machines, and it gave me the opportunity to talk to experts with in-depth knowledge of this architecture and to get their support in optimizing and running my codes on these systems.
Marvell® ThunderX2® is the second generation of the company's Arm®v8 based server processors. It supports dual-socket configurations and is optimized to deliver high computational performance along with balanced I/O connectivity, memory bandwidth and memory capacity. The Marvell® ThunderX2® processor family is fully compliant with the Arm®v8-A architecture specifications. Together, these features make it well suited to running computationally intensive HPC workloads.