An Engineer Recalls How AI Broke the Exascale Barrier


Thorsten Kurth still remembers the night he discovered his team had broken the exascale barrier.

On his couch at home at 9 p.m., he was poring over the latest results from one of the first big jobs run on Summit, then the world's top supercomputer, based at Oak Ridge National Laboratory.

The 12-person team had spent nights and weekends seeking a way AI could track hundreds of hurricanes and atmospheric rivers buried in terabytes of historical climate data.

Only a few months earlier, their program couldn't run on more than 64 of the system's nodes.

But this time, just two days before a paper on the work was due, it ran on 4,560 of Summit's 4,608 nodes to deliver the results. In the process, it achieved 1.13 exaflops of mixed-precision AI performance.

"That was a great feeling, a lot of hard work paid off," recalled Kurth of the job he led while at Lawrence Berkeley National Laboratory in 2018.

Entering the Exascale Era

Today, we celebrate the work of everyone who's cracked a quintillion operations per second.

That's a billion billion, or 10 to the 18th power, which is why we mark Exascale Day on Oct. 18.
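As a back-of-the-envelope sketch (illustrative only; the 1.13-exaflop and 4,560-node figures come from the Summit run described above), a few lines of Python show the scale these numbers imply:

```python
# One exaflop is 10**18 floating-point operations per second:
# a billion billion, hence Exascale Day on Oct. 18 (10/18).
EXAFLOP = 10**18
assert EXAFLOP == 1_000_000_000 * 1_000_000_000

# The Summit run hit 1.13 exaflops of mixed-precision AI
# performance across 4,560 nodes, so each node sustained
# roughly 248 teraflops.
ops_per_second = 1.13 * EXAFLOP
per_node = ops_per_second / 4_560
print(f"{per_node / 10**12:.0f} teraflops per node")  # → 248 teraflops per node
```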

Around the same time Kurth's team was completing its work, researchers at Oak Ridge also entered the exascale era, hitting 1.8, then 2.36 exaflops on Summit while analyzing genomics to better understand the nature of opioid addiction.

COVID-19 Ignites Exascale Work

Since then, many others have pushed the limits of science with GPUs.

In March 2020, the Folding@home project put out a call for donations of free cycles on home computers to run research analyzing the COVID-19 virus.

Ten days later, their virtual, distributed system surpassed 1.5 exaflops, creating a crowd-sourced exascale supercomputer fueled in part by more than 356,000 NVIDIA GPUs.

AI Supercomputing Goes Global

Today, academic and commercial labs worldwide are deploying a new generation of accelerated supercomputers capable of exascale-class AI.

The latest is Polaris, a system Hewlett Packard Enterprise (HPE) is building at Argonne National Laboratory capable of up to 1.4 AI exaflops. Researchers will use it to advance cancer treatments, explore clean energy and push the limits of physics, work that will be accelerated by 2,240 NVIDIA A100 Tensor Core GPUs.

Another powerful system stands at Lawrence Berkeley National Laboratory in California. Perlmutter uses 6,159 A100 GPUs to deliver nearly 4 exaflops of AI performance for more than 7,000 researchers working on projects that include drawing the largest 3D map of the visible universe to date.

Polaris and Perlmutter also use NVIDIA's software tools to help researchers prototype exascale applications.

Europe Erects Exascale AI Infrastructure

Atos will build an even larger AI supercomputer for Italy's CINECA research center. Leonardo will pack 14,000 A100 GPUs on an NVIDIA Quantum 200Gb/s InfiniBand network to hit up to 10 exaflops of AI performance.

It's one of eight systems in a regional network that backers call "an engine to power Europe's data economy."

One of Europe's largest AI-capable supercomputers is slated to come online in Switzerland in 2023. Alps will be built by HPE at the Swiss National Computing Center using NVIDIA GPUs and Grace, our first data center CPU. It's expected to scale to heights of up to 20 AI exaflops.

An Industrial HPC Revolution Begins

The move to high-performance AI extends beyond academic labs.

Advances in deep learning combined with the simulation technology of accelerated computing have put us at the beginning of an industrial HPC revolution, said NVIDIA founder and CEO Jensen Huang in a keynote earlier this year.

Selene uses a modular architecture based on the NVIDIA DGX SuperPOD.

NVIDIA was an early player in this trend.

In the first days of the pandemic, we commissioned Selene, currently ranked as the world's fastest industrial supercomputer. It helps train autonomous vehicles, refine conversational AI systems and more.

In June, Tesla Inc. unveiled its own industrial HPC system to train deep neural networks for its electric vehicles. It packs 5,760 NVIDIA GPUs to deliver up to 1.8 exaflops.

Beyond the Numbers

Three years after winning a Gordon Bell award for breaking the exascale barrier, Kurth, now a senior software engineer at NVIDIA, sees the real fruit of his team's labors.

Improved versions of the AI model they pioneered are now available online for any climate scientist to use. They handle in an hour what used to take months. Governments can use them to plan budgets for disaster response.

In the end, Exascale Day is all about the people, because to succeed at this level, "you need an excellent team with experts who understand every part of what you're trying to do," Kurth said.
