SRAM Chips: A New Cornerstone of Computing Power in the AI 2.0 Era
In the first wave of artificial intelligence, the GPU became the undisputed protagonist thanks to its powerful parallel computing ability, and NVIDIA not only topped the ranking of global chip companies but also became the most valuable company in the world. However, as the AI industry enters a second stage centered on inference and large-scale deployment, the memory bottleneck is replacing the shortage of computing power as the new challenge, and a new class of chip architectures built around static random access memory (SRAM) is moving from behind the scenes to center stage.
Although GPUs excel at large-scale data throughput, in AI inference the key determinant of system performance is how efficiently the results the model has already computed can be retained. The "GPU memory wall" often mentioned in the industry refers to this hard constraint: there is an upper limit on the capacity available for caching the historical key-value (KV) data produced during inference, which directly limits the length of the context window, lengthens response latency, and reduces the number of users that can be served concurrently. The smaller the KV cache, the worse the user experience, and this is the most pressing pain point in today's large-scale deployment of AI services.
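To make the capacity constraint concrete, the following is a minimal sketch of the usual KV cache sizing arithmetic: each token stores one key vector and one value vector per layer and per KV head. The model shape (32 layers, 8 KV heads, head dimension 128, 16-bit values) and the 40 GiB cache budget are hypothetical numbers chosen for illustration, not figures from any product discussed here.

```python
def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache that one token occupies.

    The factor of 2 accounts for storing both a key and a value vector
    in every layer and for every KV head.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def max_concurrent_sequences(kv_budget_bytes, context_len, per_token_bytes):
    """How many full-length sequences fit into a fixed KV cache budget."""
    return kv_budget_bytes // (context_len * per_token_bytes)


# Hypothetical model shape and memory budget (assumptions for illustration).
per_token = kv_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128)
kv_budget = 40 * 1024**3  # 40 GiB reserved for the KV cache

for context_len in (8192, 32768, 131072):
    users = max_concurrent_sequences(kv_budget, context_len, per_token)
    print(f"context {context_len:>6} tokens -> {users} concurrent sequences")
```

Under these assumptions, lengthening the context window by a factor of 16 cuts the number of sequences that fit in the cache from 40 to 2, which is the trade-off between context length and concurrent users described above.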
The most direct way to break through the KV cache bottleneck is to pack in as much high-speed memory as physical space and cost allow. Although the high-bandwidth memory (HBM) on NVIDIA and AMD graphics cards provides a considerable data cache for GPU computing, HBM is off-chip memory, and its bandwidth is inherently limited by interfaces and traces. In contrast, SRAM, currently the fastest storage medium, is integrated directly into the chip, and its memory bandwidth can reach 100 to 150 terabytes per second (TB/s). For reference, the bandwidth of a single HBM3E stack is only about 1.2 TB/s, and even the new-generation HBM4 reaches only about 2 TB/s, a gap of roughly two orders of magnitude.
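The bandwidth gap can be checked with simple arithmetic. The sketch below uses the bandwidth figures quoted in the text and a hypothetical 1 GB working set that must be read once per pass; it deliberately ignores capacity limits and the fact that a real accelerator combines several HBM stacks, so it is a lower-bound comparison rather than a performance prediction.

```python
TB = 1e12  # bytes per terabyte (decimal)

# Bandwidth figures as quoted in the text, in bytes per second.
bandwidth = {
    "single HBM stack (1.2 TB/s)": 1.2 * TB,
    "single HBM4 stack (2 TB/s)": 2.0 * TB,
    "on-chip SRAM (100 TB/s)": 100 * TB,
    "on-chip SRAM (150 TB/s)": 150 * TB,
}


def stream_time_ms(num_bytes, bytes_per_second):
    """Lower bound, in milliseconds, on reading num_bytes once."""
    return num_bytes / bytes_per_second * 1e3


working_set = 1e9  # hypothetical: 1 GB read per pass (assumption)

for name, bw in bandwidth.items():
    print(f"{name}: {stream_time_ms(working_set, bw):.4f} ms per pass")

# Smallest and largest SRAM-to-HBM ratios implied by the quoted figures.
ratio_low = (100 * TB) / (2.0 * TB)
ratio_high = (150 * TB) / (1.2 * TB)
print(f"SRAM is {ratio_low:.0f}x to {ratio_high:.0f}x faster per device")
```

With these figures the ratio ranges from about 50x to about 125x, which is where the "two orders of magnitude" comparison comes from.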
SRAM offers excellent performance, but at a high cost. Traditionally it has been used only for registers and the L1/L2/L3 caches inside a chip. Dynamic random access memory (DRAM) is slower and cheaper, and has long been the mainstream choice for main memory. However, as the demand for AI computing power keeps pushing past the boundaries of traditional computing, the design logic of memory architecture is being inverted: many chip manufacturers have begun to develop solutions that use SRAM as main memory, and Groq is one of the pioneers. The language processing unit (LPU) developed by Groq tightly integrates vector and matrix computing units with large-capacity SRAM on the same chip. In December last year, NVIDIA struck a deal with Groq reportedly worth about $20 billion. NVIDIA has since moved quickly on technology integration, and officially launched the LPX rack system equipped with the third-generation Groq LPU at the GTC conference in March this year.
