Ampere only launched six months ago, but Nvidia is already upgrading the high-end model of its GPU lineup to offer even more VRAM and noticeably more bandwidth. The A100 (80GB) keeps most of the A100 (40GB)'s specifications: the 1.41GHz boost clock, 5120-bit memory bus, 19.5 TFLOPS of single-precision throughput, NVLink 3 support, and 400W TDP are all unchanged from the previous iteration of the GPU. Both chips also feature 6,912 GPU cores.
What is different is the maximum amount of VRAM (80GB, up from 40GB) and the total memory bandwidth (3.2Gbps HBM2E, rather than 2.4Gbps HBM2E). Bandwidth across the entire HBM2E array is 2TB/s, up from 1.6TB/s. This is a strong upgrade: it wouldn't have been unusual for Nvidia to lower the memory bandwidth of the array in order to double the capacity. Instead, the company boosted the total bandwidth by 1.25x.
The A100 features six stacks of HBM2, as you can see in the image above, but Nvidia disables one of the stacks to improve yield. The remaining five stacks each have a 1024-bit memory bus, which is where the 5120-bit figure comes from. Nvidia replaced the HBM2 on the 40GB A100 with HBM2E, which allowed it to substantially upgrade the base specs.
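The bus width and per-pin data rate multiply out to the quoted totals. A quick sanity check in Python, using only figures from the article:

```python
# Peak HBM bandwidth = bus width (bits) x per-pin data rate (Gbps) / 8.
# Five active 1024-bit stacks give the 5120-bit bus cited in the article.
bus_width_bits = 5 * 1024  # 5120-bit memory bus

def peak_bandwidth_gbs(pin_rate_gbps):
    """Peak bandwidth in GB/s for a given per-pin rate in Gbps."""
    return bus_width_bits * pin_rate_gbps / 8  # divide by 8: bits -> bytes

print(peak_bandwidth_gbs(2.4))  # 40GB A100 at 2.4Gbps -> 1536.0 GB/s (~1.6 TB/s)
print(peak_bandwidth_gbs(3.2))  # 80GB A100 at 3.2Gbps -> 2048.0 GB/s (~2 TB/s)
```

Both results line up with the official 1.6TB/s and 2TB/s figures.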
The 80GB flavor should benefit workloads that are both capacity-limited and memory bandwidth bound. Like the 40GB variant, the A100 80GB can support up to seven hardware instances, with up to 10GB of VRAM dedicated to each.
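The partitioning arithmetic is simple to check. The seven-instance and 10GB figures are from the article; the remainder calculation below is just illustration:

```python
# Hardware-instance partitioning on the A100 80GB (figures from the article):
# up to seven instances, each with up to 10 GB of VRAM dedicated to it.
instances = 7
vram_per_instance_gb = 10
total_vram_gb = 80

committed_gb = instances * vram_per_instance_gb  # 70 GB committed to instances
assert committed_gb <= total_vram_gb             # fits within the 80 GB pool
print(f"{instances} x {vram_per_instance_gb} GB = {committed_gb} GB of {total_vram_gb} GB")
```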
Nvidia is offering these GPUs on mezzanine cards expected to be deployed in either an HGX or a DGX configuration. Customers who want an individual A100 GPU in a PCIe card are still limited to the 40GB variant, though this could change in the future.
The price tag on a server full of 80GB A100 cards is going to be firmly in "if you have to ask, you can't afford it" territory. But there's a reason companies on the cutting edge of AI development might pay so much. GPU model complexity is limited by onboard memory. If you have to touch main system memory, overall performance will crater; CPUs may have the kind of DRAM capacities that AI researchers would love for their models, but they can't provide the necessary bandwidth (and CPUs aren't great at modeling neural networks in any case). Increasing the total pool of onboard VRAM could allow developers to increase the total complexity of the model they're training, or to tackle problems that couldn't previously fit into a 40GB VRAM pool.
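To put the capacity jump in rough terms, here's a back-of-the-envelope sizing sketch. The 16-bytes-per-parameter figure (FP32 weights, gradients, and two Adam optimizer moments) is a common rule of thumb, not something from the article, and it ignores activation memory:

```python
# Rough upper bound on trainable parameters that fit in a given VRAM pool.
# Assumption (not from the article): ~16 bytes per parameter for FP32
# training with Adam (4 weights + 4 gradients + 8 optimizer state),
# ignoring activation memory entirely.
BYTES_PER_PARAM = 16

def approx_max_params_billions(vram_gb):
    """Approximate parameter budget, in billions, for a VRAM pool in GB."""
    return vram_gb * 1024**3 / BYTES_PER_PARAM / 1e9

for vram_gb in (40, 80):
    print(f"{vram_gb} GB VRAM -> ~{approx_max_params_billions(vram_gb):.1f}B parameters")
```

Under that assumption, doubling VRAM roughly doubles the parameter budget, which is the whole appeal for capacity-bound training workloads.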
- Nvidia Unveils Its First Ampere-Based GPU, Raises Bar for Data Center AI
- Microsoft Deploys AI 'Supercomputing' via Nvidia's New Ampere A100 GPU
- Nvidia Crushes New MLPerf Tests, but Google's Future Looks Promising