The AI Wave Is Driving a Re-evolution of Memory Chips

2024-03-02 11:53

Advanced packaging of chiplets, HBM, and other components is building a cross-domain semiconductor ecosystem that supports diverse computing needs.

  The digitization of global industry, the explosive growth of digital data, and the emergence of AI technology have driven rapid growth in worldwide demand for data processing, big-data analytics, and AI applications, which in turn has increased demand for the hardware and chips that support high-performance computing (HPC) and AI workloads. In a cloud data center server, the chips that must be upgraded to meet HPC and AI computing demands include the central processing units (CPUs) and graphics processing units (GPUs) that perform the computation, as well as baseboard management controllers (BMCs), power management ICs (PMICs), high-speed transmission chips, and memory and storage.

  Among them, memory and storage include NAND flash solid-state drives (SSDs), non-volatile storage used for long-term data retention, as well as static random access memory (SRAM) and dynamic random access memory (DRAM), volatile memory used to temporarily hold data during high-speed computation.

  The main role of memory in the computing process is to temporarily hold intermediate values and parameters. This working memory is traditionally divided into on-chip cache (built from SRAM) and externally connected DRAM, and as computing performance continues to improve, chips require greater capacity and higher data-access rates from both internal and external memory, especially the on-chip cache. Because package space is limited, integrating chiplets into a higher-density stack within a single package through advanced packaging has become an important option for increasing a chip's internal memory capacity.

  Advanced packaging technology has developed to serve the continuous improvement of chip computing performance and functionality: interposers, through-silicon vias (TSVs), and micro-bump technology enable 2.5D/3D chiplet stacking, allowing the industry to integrate more computing units and chip functions into a smaller space. AMD's Ryzen 7 5800X3D is an example of stacking a memory chip on a CPU: a 64MB SRAM die stacked on top of the CPU expands the CPU's original 32MB of cache to 96MB, raising the CPU's computing performance by roughly 15%.
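The benefit of a larger cache can be illustrated with a toy simulation. The sketch below (illustrative Python, not a model of the actual 5800X3D) runs a fully associative LRU cache at two sizes against the same random access trace; the working set fits in the larger cache but not the smaller one, so the hit rate jumps.

```python
from collections import OrderedDict
import random

def hit_rate(cache_lines: int, accesses: list) -> float:
    """Simulate a fully associative LRU cache and return its hit rate."""
    cache = OrderedDict()
    hits = 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # refresh LRU position on a hit
        else:
            cache[addr] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(accesses)

# A working set larger than the small cache but not the large one.
random.seed(0)
trace = [random.randrange(3000) for _ in range(100_000)]

small = hit_rate(1000, trace)   # stands in for the original 32MB of cache
large = hit_rate(3000, trace)   # stands in for 96MB after die stacking
print(f"small cache hit rate: {small:.2f}, large cache hit rate: {large:.2f}")
```

The line counts and address range are arbitrary stand-ins; the point is only that once capacity crosses the working-set size, misses drop sharply, which is the mechanism behind the uplift from stacking extra cache.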

  For high-end GPU chips used in HPC or AI computing, however, such as NVIDIA's H100 and AMD's MI300, the main computing architecture pairs the GPU with high-bandwidth memory (HBM), which allows fast, massive access to the data being transferred; the two are integrated and connected on an interposer through advanced packaging, namely TSMC's CoWoS 2.5D packaging technology.

  HBM was developed by AMD in cooperation with memory maker SK Hynix, together with UMC and ASE, and SK Hynix mass-produced the first generation (HBM1) in 2015, which was adopted in AMD's Radeon Rx 300-generation GPUs. Subsequently, South Korean memory major Samsung Electronics and U.S.-based Micron Technology also invested in HBM development. HBM's main structure is a high-capacity vertical stack of multiple DRAM dies, with a control (logic) die at the bottom. Within the stack, signals between adjacent DRAM layers are connected through micro-bumps, and signals from an upper DRAM die pass through the TSVs of the dies beneath it to reach lower DRAM layers or the control die, and from there down to the substrate. The short vertical distances keep inter-layer signal transmission fast and low-power, which indirectly improves computing performance.

  Under the CoWoS architecture, a GPU can be combined with multiple HBM stacks. The industry has now advanced to the HBM3 specification, which increases the number of stacks per package, the number of vertically stacked layers, and the number of inter-layer signal channels; for example, moving from HBM2 to HBM3, the number of DRAM layers per stack can grow from 8 to as many as 16, effectively increasing both memory capacity and access/transfer rates.
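Per-stack bandwidth follows directly from the interface width and the per-pin data rate. The sketch below uses the 1024-bit-per-stack interface shared by HBM generations and commonly published per-pin rates (2.4 Gb/s for HBM2, 6.4 Gb/s for HBM3); exact rates vary by vendor and speed grade, so treat these numbers as representative rather than definitive.

```python
def stack_bandwidth_gbs(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Peak bandwidth of one HBM stack in GB/s: pins x per-pin rate / 8 bits."""
    return bus_width_bits * pin_rate_gbps / 8

# 1024-bit interface per stack; per-pin rates are typical published figures.
hbm2 = stack_bandwidth_gbs(1024, 2.4)   # roughly 307 GB/s per stack
hbm3 = stack_bandwidth_gbs(1024, 6.4)   # roughly 819 GB/s per stack

# Aggregate for a GPU carrying six HBM3 stacks, as some flagship accelerators do.
print(f"6-stack HBM3 aggregate: {6 * hbm3 / 1000:.2f} TB/s")
```

Six HBM3 stacks give on the order of 5 TB/s of peak memory bandwidth, which is why widening the interface and raising per-pin rates matter as much as raw capacity.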

  HBM is mainly paired with GPUs, which are high-performance computing chips. HBM itself is built as an advanced 3D stacked package, and is then integrated with the GPU through CoWoS advanced packaging to form a complete GPU chip. GPUs produced with advanced processes at 7nm and below are already high-unit-cost products; for chips that are not, the added production cost of integrating HBM through advanced packaging is difficult to bear. Likewise, in AMD's Ryzen 7 5800X3D, where a small SRAM chip is stacked on top of the CPU, the SRAM must also be produced with an advanced process to reach that capacity, which is costly.

  To meet the medium computing-power requirements of AIoT applications, some semiconductor companies have proposed pairing compute chips built on non-advanced processes with customized DRAM, vertically stacking the memory and compute chips in a 3D package. "Customized" here means the DRAM chip's circuitry and the locations of its data-access channels are designed around the distribution of contact electrodes and internal wiring of the compute chip, so that the compute chip and the vertically stacked DRAM can exchange data with high efficiency and thereby raise computing performance. The compute chips are mainly the system-on-chip (SoC) or application-specific integrated circuit (ASIC) devices that AIoT applications require, while DRAM's higher storage density than SRAM lets a DRAM chip match an SRAM chip's capacity without adopting an advanced process, which is also a cost advantage.
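The density argument can be made concrete with rough cell-area arithmetic. The sketch below uses illustrative cell sizes (assumptions chosen for the comparison, not vendor figures): a 6-transistor SRAM cell is several times larger than a 1-transistor-1-capacitor DRAM cell, so the same capacity costs several times more silicon in SRAM.

```python
def array_area_mm2(capacity_mbit: float, cell_area_um2: float) -> float:
    """Cell-array area needed for a given capacity (periphery ignored)."""
    bits = capacity_mbit * 1024 * 1024
    return bits * cell_area_um2 / 1e6   # convert um^2 to mm^2

# Illustrative cell sizes (assumed, not vendor data): a 6T SRAM cell on a
# mature logic node vs a 1T1C DRAM cell on a mature DRAM node.
sram_area = array_area_mm2(512, 0.12)   # 512 Mbit in SRAM
dram_area = array_area_mm2(512, 0.02)   # same capacity in DRAM

print(f"SRAM array: {sram_area:.0f} mm^2, DRAM array: {dram_area:.0f} mm^2")
```

With these assumed cell sizes the SRAM array needs six times the area, which is why matching a given capacity in SRAM pushes designers toward expensive advanced nodes while DRAM can stay on a mature one.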

  Some memory makers are cooperating with foundries, packaging-and-testing houses, and IC designers to build solution platforms that complete the design of the ASIC, the DRAM, their packaging interconnect, and heat dissipation according to application requirements. Both the ASIC and the DRAM are produced with mature processes, which costs less than combinations of HBM or SRAM with advanced-process compute chips and can meet the cost structures of application developers.

  In response to the growing number of AI applications, memory in different forms, whether chiplets, HBM, or other formats, can be combined with compute chips through advanced packaging into a single-chip package that supports different classes of computing needs. This also contributes to a cross-domain, diversified integration ecosystem across the semiconductor industry chain.