AMD Demos 3D Stacked Ryzen 9 5900X: 192MB of L3 Cache at 2TB/s


This site may earn affiliate commissions from the links on this page. Terms of use.

The Computex trade show has kicked off in Taiwan and AMD opened the show with a bang. Last week, we reviewed rumors that AMD was preparing a Milan-X SKU for launch later this year. The Zen 3-based CPU would supposedly offer onboard HBM and a 3D-stacked architecture.

We don’t know if AMD will bring Milan-X to market in 2021, but the company has now shown off 3D die stacking in another way. During her Computex keynote, Lisa Su showed a 5900X with 64MB of SRAM integrated on top of the chiplet die. This is in addition to the L3 cache already built into the chiplet itself, granting a total of 96MB of L3 per chiplet, or 192MB per 5900X with two chiplets. The dies are connected with through-silicon vias (TSVs). AMD claims bandwidth in excess of 2TB/s. That is higher than Zen 3’s L1 bandwidth, though access latencies are much higher. L3 latency is typically between 45-50 clock cycles, compared with a four-cycle latency for L1.
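To put those cycle counts in perspective, here is a quick sketch converting them to wall-clock time at the 4GHz lock AMD used for its demo. The cycle figures are the typical values quoted above, not measured numbers for the V-Cache part.

```python
CLOCK_HZ = 4.0e9  # 4GHz, matching AMD's locked-clock comparison


def cycles_to_ns(cycles: float, clock_hz: float = CLOCK_HZ) -> float:
    """Convert a latency in clock cycles to nanoseconds at a given clock."""
    return cycles / clock_hz * 1e9


l1_ns = cycles_to_ns(4)    # four-cycle L1 hit -> ~1 ns
l3_ns = cycles_to_ns(46)   # midpoint of the 45-50 cycle L3 range -> ~11.5 ns
print(f"L1: {l1_ns:.2f} ns, L3: {l3_ns:.2f} ns")
```

The roughly order-of-magnitude gap is why a bigger L3 helps even though it is far slower than L1: it still beats a trip to DRAM.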

The new “V-Cache” die isn’t quite the same size as the chiplet below it, so there’s some additional silicon used to ensure there’s equal pressure across the compute die and cache die. The 64MB cache is said to be a bit less than half the size of a standard Zen 3 chiplet (80.7 mm²).

This much L3 on a CPU is rather nutty. We can’t compare against desktop chips, because Intel and AMD have never shipped a CPU with this much cache dedicated to such a small number of cores. The closest analog among shipping CPUs would be something like IBM’s POWER9, which offers up to 120MB of L3 per chip, but again, not nearly this much per core. 192MB of L3 for just 12 cores is 16MB of L3 per core and 8MB per thread. There are also enough differences between POWER9 and Zen 3 that we can’t really look to the IBM CPU for much insight into how the extra cache would improve performance, though if you’re curious about the x86-versus-non-x86 question in general, Phoronix did a comparison with some benchmarks back in 2019.
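The per-core arithmetic above is simple enough to check directly. A 5900X has 12 cores and 24 threads (with SMT), and two chiplets at 96MB of L3 each:

```python
total_l3_mb = 96 * 2   # 96MB of L3 per chiplet, two chiplets
cores = 12
threads = 24           # 12 cores with SMT enabled

l3_per_core_mb = total_l3_mb / cores      # 16MB per core
l3_per_thread_mb = total_l3_mb / threads  # 8MB per thread
print(l3_per_core_mb, l3_per_thread_mb)
```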

Absent a comparable CPU to refer to, we’ll have to take AMD’s word on some of these figures. The company compared a standard 5900X (32MB of L3 cache per chiplet, 64MB total) to a modified 5900X (96MB of L3 cache per chiplet, 192MB total) in Gears of War 5 (+12 percent, DX12), DOTA2 (+18 percent, Vulkan), Monster Hunter World (+25 percent, DX11), League of Legends (+4 percent, DX11), and Fortnite (+17 percent, DX12). If we set LoL aside as an outlier, that’s an 18 percent average increase. If we include it, it’s a 15.2 percent average uplift. Both CPUs were locked at 4GHz for this comparison. The GPU was not disclosed.
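The two averages cited above follow directly from AMD’s per-game figures:

```python
# AMD's claimed uplifts at a locked 4GHz, per game.
uplifts = {
    "Gears of War 5": 12,        # DX12
    "DOTA2": 18,                 # Vulkan
    "Monster Hunter World": 25,  # DX11
    "League of Legends": 4,      # DX11, the outlier
    "Fortnite": 17,              # DX12
}

mean_all = sum(uplifts.values()) / len(uplifts)  # 15.2% including LoL

without_lol = [v for k, v in uplifts.items() if k != "League of Legends"]
mean_no_outlier = sum(without_lol) / len(without_lol)  # 18.0% excluding LoL

print(f"{mean_all:.1f}% with LoL, {mean_no_outlier:.1f}% without")
```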



That uplift is nearly as large as the median generational improvement AMD has been turning in over the past few years. The more interesting question, however, is what kind of impact this approach has on power consumption.

AMD Has Big Caches on the Brain

It’s obvious that AMD has been doing some work around the idea of slapping enormous caches on chips. The large “Infinity Cache” on RDNA2 GPUs is a central part of the design. We’ve heard about a Milan-X that could theoretically deploy this kind of approach alongside on-package HBM.

One way to look at news of a 15 percent performance improvement is that it would allow AMD to pull CPU clocks from a top clock of, say, 4.5GHz down to about 4GHz at equal performance. CPU power consumption increases more quickly than frequency does, especially as clocks approach 5GHz. Improvements that allow AMD (or Intel) to hit the same performance at a lower frequency can be useful for improving x86’s power efficiency at a given performance level.

About six weeks ago, we covered the roadmap leak/rumor above. At the time, I speculated that AMD’s rumored Ryzen 7000 family might integrate an RDNA2 compute unit into each chiplet, and that this chiplet-level integration might be the reason why RDNA2 is listed in green for Raphael but orange for the hypothetical Phoenix.

What I am about to say is speculation stacked on top of speculation and should be treated as such:

For years, we have waited and hoped that AMD would bring an HBM-equipped APU to desktop or mobile. So far, we have been disappointed. A chiplet with a 3D-mounted L3 stack tied to both the CPU and GPU could offer a nifty alternative to this concept. While we still have no idea how large the GPU core would be, boosting the performance of an integrated GPU with onboard cache is a tried-and-true way of doing things. It has helped Intel improve performance on various SKUs since Haswell.

The bit above, as I said, is pure speculation, but AMD has now acknowledged working extensively with large L3 caches on both CPUs (via 3D stacking) and GPUs (via Infinity Cache). It’s not crazy to think the company’s future APUs will continue the trend in one form or another.

