Wafer Scale Engine (WSE)
A family of extremely large AI chips developed by Cerebras Systems. Each chip contains over a trillion transistors and is designed to run AI workloads much faster than traditional GPUs.
First Mentioned
1/24/2026, 3:34:12 AM
Last Updated
1/24/2026, 3:35:49 AM
Research Retrieved
1/24/2026, 3:35:49 AM
Summary
The Wafer Scale Engine (WSE) is a semiconductor technology developed by Cerebras Systems that uses wafer-scale integration (WSI) to create the world's largest single integrated circuits. By fabricating an entire silicon wafer as a single 'super-chip,' the WSE avoids the communication and memory bottlenecks inherent in traditional multi-chip GPU clusters. Architecturally, it features hundreds of thousands of independent processing elements interconnected in a 2D mesh, providing up to 20 PB/s of aggregate memory bandwidth and keeping all model parameters on-chip for ultra-low latency. Positioned as a direct competitor to Nvidia's GPUs, the WSE is optimized for deep learning training and low-latency AI inference. The technology has gained significant commercial traction, including a major purchase order from OpenAI, and figures in the US-China race for AI leadership.
Referenced in 1 Document
Research Data
Extracted Attributes
Architecture
Spatially distributed mesh of processing elements (PEs)
Manufacturer
Cerebras Systems
Surface Area
46,225 square millimeters
On-chip Memory
18 Gigabytes
Memory Bandwidth
9.6 Petabytes per second (WSE-1) to ~20 Petabytes per second (WSE-2), aggregate across cores
Transistor Count
1.2 trillion (WSE-1)
Primary Application
Deep learning training and low-latency AI inference
Manufacturing Process
TSMC 7nm (WSE-2)
Timeline
- 2019-08-19: Cerebras Systems unveils the first Wafer Scale Engine, the industry's first trillion-transistor chip. (Source: Cerebras Systems Unveils the Industry's First Trillion Transistor Chip)
- 2025-09-10: Specifications for WSE-3 are updated, highlighting a 46,000 mm² wafer size and research into scaling cost-effectiveness. (Source: Cerebras Wafer-Scale Engine Overview - Emergent Mind)
- 2026-01-20: CEO Andrew Feldman discusses the WSE at the World Economic Forum in Davos, revealing a major purchase order from OpenAI and positioning the chip as a key asset in the US-China AI race. (Source: Document 67dd679b-d764-4b4b-b23b-46e6c18ea056)
Wikipedia
Wafer-scale integration
Wafer-scale integration (WSI) is a system of building very-large integrated circuit (commonly called a "chip") networks from an entire silicon wafer to produce a single "super-chip". Combining large size and reduced packaging, WSI was expected to lead to dramatically reduced costs for some systems, notably massively parallel supercomputers, but is now being employed for deep learning. The name is taken from the term very-large-scale integration, the state of the art when WSI was being developed.
Web Search Results
- Cerebras Wafer-Scale Engine Overview - Emergent Mind
The Cerebras Wafer-Scale Engine (WSE) is a specialized, massively parallel computational platform in which hundreds of thousands to nearly a million processing elements (PEs) are integrated onto a single silicon wafer, forming a spatially distributed mesh of compute, memory, and communication resources. Engineered to address the bandwidth and latency bottlenecks endemic to traditional clustered CPU and GPU systems, especially for workloads limited by sparse communication, memory traffic, and arithmetic intensity, the WSE exemplifies the confluence of ultra-low latency interconnects, on-chip SRAM, and hardware-accelerated dataflow. Architecturally, each core is paired with its own local memory and a router module supporting bidirectional neighbor connections, collectively enabling [...]
(Updated 10 September 2025.) The Cerebras Wafer-Scale Engine is a massively parallel, monolithically integrated platform featuring hundreds of thousands of processing elements with ultra-low latency interconnects. It employs an event-driven, dataflow programming paradigm that overlaps communication and computation using local SRAM and dedicated router modules. It achieves high-throughput performance for scientific simulations and AI tasks, enabling efficient stencil PDEs, FFTs, molecular dynamics, and large language model training. [...]
Manufacturing, thermal management, and packaging challenges are nontrivial given the wafer size (46,000 mm² for WSE-3), defect tolerance, and power-removal requirements. Yield optimization via small core size mitigates defect-area loss. Future research focuses on scaling cost-effectiveness, reliability, and practical deployment in domain-specific and general AI workloads. Applications in scientific and artificial intelligence domains: the WSE's capabilities are leveraged across multiple domains [...]
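The excerpt above notes that the WSE's 2D mesh suits stencil PDE workloads, where each point updates from its immediate neighbors. The sketch below is a toy in plain Python, not Cerebras SDK code: each grid cell stands in for a PE whose update reads only the four mesh neighbors, which is exactly the communication pattern a 2D wafer mesh serves without any long-range traffic.

```python
# Toy model (plain Python, not the Cerebras SDK): a 5-point stencil,
# here one Jacobi relaxation step for Laplace's equation. Each cell
# reads only its four mesh neighbors, mirroring neighbor-only
# communication on a 2D PE mesh.

def jacobi_step(grid):
    """One Jacobi sweep; boundary cells are held fixed."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Neighbor-only reads: north, south, west, east.
            new[r][c] = 0.25 * (grid[r-1][c] + grid[r+1][c] +
                                grid[r][c-1] + grid[r][c+1])
    return new

if __name__ == "__main__":
    # 5x5 grid: hot (1.0) top edge, cold (0.0) elsewhere.
    g = [[1.0] * 5] + [[0.0] * 5 for _ in range(4)]
    for _ in range(100):
        g = jacobi_step(g)
    print(round(g[2][2], 3))  # prints 0.25 (center of the converged field)
```

On the wafer each cell would run in parallel on its own PE, with the neighbor reads carried by single-cycle mesh links instead of a shared memory.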
- Cerebras Wafer Scale Engine: Why we need big chips for Deep ...
To meet the growing computational requirements of AI, Cerebras has designed and manufactured the largest chip ever built. The Cerebras Wafer Scale Engine (WSE) is 46,225 square millimeters, contains more than 1.2 trillion transistors, and is entirely optimized for deep learning workloads. By way of comparison, the WSE is more than 56X larger than the largest graphics processing unit (GPU), containing 3,000X more on-chip memory and more than 10,000X the memory bandwidth. But why do we need a big chip? Why not just tie together lots of smaller chips? [...]
The Cerebras Wafer Scale Engine is dedicated to accelerating both deep learning calculation and communication, and by so doing is entirely optimized for reducing training time. The approach is a straightforward function of the size of the WSE. By building a wafer-scale chip and keeping everything on a single piece of silicon, we can avoid all the performance pitfalls of slow off-chip communication, distant memory, low memory bandwidth, and wasting computational resources on useless work. We can deliver more cores optimized for deep learning primitives; more local memory close to cores for efficient operation; and more high-performance, low-latency bandwidth between cores than can be achieved by off-chip interconnects. In other words, the WSE achieves cluster-scale performance on a [...]
Cerebras has solved this problem. The WSE has 18 Gigabytes of on-chip memory and 9.6 Petabytes per second of memory bandwidth, respectively 3,000x and 10,000x more than is available on the leading GPU. As a result, the WSE can keep the entire neural network parameters on the same silicon as the compute cores, where they can be accessed at full speed. This is possible because memory on the WSE is uniformly distributed alongside the computational elements, allowing the system to achieve extremely high memory bandwidth at single-cycle latency, with all model parameters in on-chip memory, all of the time.
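The comparison factors quoted above can be sanity-checked with back-of-the-envelope arithmetic. Note the GPU baseline figures below are implied values, obtained by dividing the WSE numbers by the quoted ratios; they are not stated in the excerpt.

```python
# Back-of-the-envelope check of the ratios quoted for WSE-1 versus the
# "leading GPU" of 2019. The GPU baselines are *implied* by division,
# not figures stated in the source text.
wse_area_mm2 = 46_225
wse_mem_bytes = 18e9      # 18 GB on-chip SRAM
wse_bw_bytes = 9.6e15     # 9.6 PB/s aggregate memory bandwidth

gpu_area_mm2 = wse_area_mm2 / 56      # "more than 56X larger"
gpu_mem_bytes = wse_mem_bytes / 3000  # "3,000X more on-chip memory"
gpu_bw_bytes = wse_bw_bytes / 10000   # "10,000X the memory bandwidth"

print(f"implied GPU die area : {gpu_area_mm2:.0f} mm^2")     # ~825 mm^2
print(f"implied GPU SRAM     : {gpu_mem_bytes / 1e6:.0f} MB")  # 6 MB
print(f"implied GPU bandwidth: {gpu_bw_bytes / 1e12:.2f} TB/s")  # 0.96 TB/s
```

The implied baseline lines up plausibly with a 2019 flagship GPU (roughly an 815 mm² die, ~6 MB of L2 cache, and ~0.9 TB/s of HBM2 bandwidth), which is consistent with the excerpt's framing.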
- A Conceptual View — SDK Documentation (1.4.0)
The Cerebras Wafer-Scale Engine (WSE) is a wafer-parallel compute accelerator, containing hundreds of thousands of independent processing elements (PEs). The PEs are interconnected by communication links into a two-dimensional rectangular mesh on one single silicon wafer. Each PE has its own memory (used by it and no other) and its own program counter. It has its own executable code in its memory. 32-bit messages, called wavelets, can be sent to or received by neighboring PEs in a single clock cycle.
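The conceptual model above (private per-PE memory, a 2D rectangular mesh, 32-bit wavelets passed between neighbors in one cycle) can be sketched as a toy in plain Python. This is not CSL or Cerebras SDK code; the `PE` class and its method names are hypothetical stand-ins for the concepts the documentation describes.

```python
# Toy model of the SDK's conceptual view (plain Python, not the
# Cerebras SDK/CSL): PEs on a 2D rectangular mesh, each with private
# memory, exchanging 32-bit "wavelets" only with direct neighbors.

class PE:
    def __init__(self, row, col):
        self.row, self.col = row, col
        self.memory = []   # private: used by this PE and no other
        self.inbox = []    # wavelets delivered by neighbors

    def send(self, mesh, drow, dcol, wavelet):
        """Deliver a 32-bit wavelet to a neighboring PE, if it exists."""
        assert (abs(drow), abs(dcol)) in {(1, 0), (0, 1)}, "neighbors only"
        assert 0 <= wavelet < 2**32, "wavelets are 32-bit"
        r, c = self.row + drow, self.col + dcol
        if 0 <= r < len(mesh) and 0 <= c < len(mesh[0]):
            mesh[r][c].inbox.append(wavelet)

mesh = [[PE(r, c) for c in range(3)] for r in range(3)]
mesh[1][1].send(mesh, 0, 1, 0xDEADBEEF)     # center PE -> east neighbor
mesh[1][2].memory.extend(mesh[1][2].inbox)  # receiver stores it locally
print(hex(mesh[1][2].memory[0]))            # prints 0xdeadbeef
```

The hard constraints in the toy (neighbor-only delivery, 32-bit payloads, strictly private memory) are the properties that make single-cycle delivery feasible in the real hardware.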
- Wafer-Scale AI Compute: A System Software Perspective - acm sigops
Figure 6: Scaling efficiency comparison for wafer-scale and multi-GPU systems. For LlaMA3-8B decoding, the wafer-scale system (WSE-2) achieves 2,700 tokens/s, significantly outperforming an 8-GPU A100 server (260 tokens/s). The world-record results highlight the wafer-scale system’s superior performance and its improved energy efficiency at scale, avoiding the communication bottlenecks of multi-chip setups. [...] Wafer-scale systems, by contrast, provide orders of magnitude higher memory bandwidth. For example, the Cerebras WSE-2 chip delivers about 20 PB/s of aggregate bandwidth, representing the total sum of local memory bandwidth of its massive number of cores. This total sum far exceeds that of HBM, which sacrifices overall bandwidth for a unified memory design. [...] With WaferLLM, we are now able to fully utilise a wafer-scale chip. The next question is whether wafer-scale computing truly fulfils its promise of providing more efficient scaling than today’s approaches, which coordinate multiple chips interconnected by high-speed networks such as NVLink and InfiniBand. We evaluated WaferLLM on a real Cerebras WSE-2 wafer-scale chip (TSMC 7nm) against state-of-the-art GPU-based inference systems, including SGLang and vLLM, deployed on NVIDIA A100 GPUs (also TSMC 7nm) interconnected via NVLink and InfiniBand.
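The headline numbers in the excerpt can be computed directly. The per-core bandwidth line below is an illustrative derivation assuming WSE-2's advertised count of roughly 850,000 cores, a figure not stated in the excerpt itself.

```python
# Arithmetic behind the reported WSE-2 vs. 8x A100 comparison. The
# 850,000-core figure is WSE-2's advertised core count, an assumption
# not taken from the excerpt.
wse2_tokens_per_s = 2700     # LlaMA3-8B decoding on WSE-2
a100x8_tokens_per_s = 260    # same workload on an 8-GPU A100 server

speedup = wse2_tokens_per_s / a100x8_tokens_per_s
print(f"decode speedup: {speedup:.1f}x")   # prints 10.4x

# 20 PB/s aggregate bandwidth read as the sum of per-core SRAM bandwidth:
cores = 850_000
per_core_bw = 20e15 / cores
print(f"implied per-core bandwidth: {per_core_bw / 1e9:.1f} GB/s")  # ~23.5 GB/s
```

This is the sense in which "aggregate bandwidth" differs from HBM: it is a sum over hundreds of thousands of small local memories rather than the throughput of one unified memory system.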
- Cerebras Systems Unveils the Industry's First Trillion Transistor Chip
The Cerebras Wafer Scale Engine includes more cores, with more local memory, than any chip in history. This enables fast, flexible computation, at lower latency and with less energy. The WSE has 18 Gigabytes of on-chip memory accessible by its cores in one clock cycle. The collection of core-local memory aboard the WSE delivers an aggregate of 9 petabytes per second of memory bandwidth, which is 3,000X more on-chip memory and 10,000X more memory bandwidth than the leading graphics processing unit has. Communication Fabric [...]
Cerebras Systems is a team of pioneering computer architects, computer scientists, deep learning researchers, and engineers of all types. We have come together to build a new class of computer to accelerate artificial intelligence work by three orders of magnitude beyond the current state of the art. The first announced element of the Cerebras solution is the Wafer Scale Engine (WSE). The WSE is the largest chip ever built. It contains 1.2 trillion transistors and covers more than 46,225 square millimeters of silicon. The largest graphics processor on the market has 21.1 billion transistors and covers 815 square millimeters. In artificial intelligence work, large chips process information more quickly, producing answers in less time. As a result, neural networks that in the past took [...]
With an exclusive focus on AI, the Cerebras Wafer Scale Engine accelerates calculation and communication and thereby reduces training time. The approach is straightforward and is a function of the size of the WSE: with 56.7 times more silicon area than the largest graphics processing unit, the WSE provides more cores to do calculations and more memory closer to the cores so the cores can operate efficiently. Because this vast array of cores and memory is on a single chip, all communication is kept on-silicon. This means the WSE's low-latency communication bandwidth is immense, so groups of cores can collaborate with maximum efficiency, and memory bandwidth is no longer a bottleneck.
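The press release's 56.7x silicon-area figure follows directly from the areas it states, and the transistor ratio comes out nearly identical, a quick consistency check:

```python
# Checking the press release's comparisons against its own numbers.
wse_area, gpu_area = 46_225, 815     # die area, mm^2
wse_tr, gpu_tr = 1.2e12, 21.1e9      # transistor counts

print(f"area ratio      : {wse_area / gpu_area:.1f}x")  # prints 56.7x
print(f"transistor ratio: {wse_tr / gpu_tr:.1f}x")      # prints 56.9x
```

That the two ratios nearly coincide is expected: both chips are described in the same era of process technology, so transistor count scales roughly with silicon area.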