Domain Specific Architectures

Technology

Custom-designed silicon optimized for specific workloads rather than general-purpose computing.

First Mentioned

6/7/2026, 2:17:39 AM

Last Updated

6/7/2026, 2:20:19 AM

Research Retrieved

6/7/2026, 2:20:19 AM

Summary

Domain Specific Architectures (DSAs) are programmable hardware designs tailored to run efficiently within a specific application domain, in contrast to general-purpose processors such as traditional CPUs. With Moore's Law plateauing and Dennard Scaling at an end, DSAs have emerged as a primary path to continued gains in performance and energy efficiency. They are widely used to accelerate Artificial Intelligence (AI) and deep learning workloads. Andrew Feldman of Cerebras advocates DSAs over legacy architectures from Intel, AMD, or Arm, and says they allow partners like OpenAI to run AI inference significantly faster. Other examples of DSAs include Google's Tensor Processing Units (TPUs) and Graphics Processing Units (GPUs), which were originally designed for image processing and computer graphics.

Referenced in 1 Document

The IPO Comeback: Why Tech Giants Are Finally Going Public | All-In Liquidity IPO Panel

Research Data

Extracted Attributes

Contrast
General-purpose architectures, such as CPUs, designed to operate on any computer program.
Definition
A programmable computer architecture specifically tailored to operate very efficiently within the confines of a given application domain.
Primary Drivers
The plateauing of Moore's Law and the end of Dennard Scaling.
Primary Applications
Deep learning, artificial intelligence, autonomous systems, image processing, and high-performance computing.
Key Design Principles
Use of dedicated software-controlled memories, investment in more arithmetic units, domain-matched parallelism (SIMD/VLIW), and reduced data sizes (8/16-bit).
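
The "reduced data sizes (8/16-bit)" principle above can be illustrated with a minimal sketch. It assumes symmetric 8-bit quantization with one scale factor per vector, which is a common scheme but is not taken from the sources on this page; the names quantize_int8 and int8_dot are hypothetical. The point is that the multiply-accumulate loop runs entirely on small integers, and only a single rescale at the end uses floating point.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale (assumed scheme)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def int8_dot(a_q: List[int], a_scale: float, b_q: List[int], b_scale: float) -> float:
    """Dot product on quantized values: integer accumulate, then one float rescale."""
    acc = sum(x * y for x, y in zip(a_q, b_q))  # stands in for a wide integer accumulator
    return acc * a_scale * b_scale


# Illustrative values only.
weights = [0.5, -1.0, 0.25, 0.75]
inputs = [1.0, 2.0, -0.5, 0.0]

w_q, w_scale = quantize_int8(weights)
x_q, x_scale = quantize_int8(inputs)

exact = sum(w * x for w, x in zip(weights, inputs))
approx = int8_dot(w_q, w_scale, x_q, x_scale)
print(f"float result: {exact:.4f}, int8 result: {approx:.4f}")
```

The quantized result differs slightly from the floating-point one; a DSA accepts that small error in exchange for narrower arithmetic units and less data movement.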

Timeline

The semiconductor boom begins, prompting computer architects to find new ways to exploit growing transistor counts, initially focusing on general-purpose microprocessors. (Source: Wikipedia)
1965-01-01
The end of Dennard Scaling forces a transition from single fast processors to multi-core processors, eventually driving the shift toward domain-specific specialization. (Source: Wikipedia)
2006-01-01
Eyeriss, an energy-efficient spatial architecture for deep convolutional neural networks, is presented at ISCA, showcasing research in domain-specific AI accelerators. (Source: https://eems.mit.edu/wp-content/uploads/2019/06/2019_isca_dsa.pdf)
2016-06-01
At the All-In Liquidity IPO Panel, Andrew Feldman highlights how Cerebras's Domain Specific Architectures allow partners like OpenAI to run inference significantly faster. (Source: 92b9a533-494f-4b4b-b8f6-dc55024f4366)
2024-01-01

Wikipedia

Domain-specific architecture

A domain-specific architecture (DSA) is a programmable computer architecture specifically tailored to operate very efficiently within the confines of a given application domain. The term is often used in contrast to general-purpose architectures, such as CPUs, that are designed to operate on any computer program.

Web Search Results

Designing Efficient Domain-Specific Architectures for Autonomous Systems
design space exploration. This approach streamlines the process, efficiently navigating and pinpointing optimal solutions, thereby significantly reducing the complexity and time required in the design process. [...] changing AI model landscape. This scenario underscores the necessity to develop methodologies and tools that span from efficient training of AI models to their characterization and the creation of automated design methodologies for domain-specific architectures, for the effective deployment of these models in autonomous systems. [...] Abstract: The rapid development of deep learning models is driving a remarkable expansion in capabilities for a wide array of real-world applications, from smart sensors to autonomous systems like self-driving cars and aerial robots. These innovations bring the promise of unparalleled intelligence and autonomy. Yet, efficiently implementing these AI models in autonomous systems poses a significant challenge, a key to unlocking their full potential in practical applications. As Moore's Law begins to plateau, computer architects are increasingly focusing on domain-specific architectures to meet the evolving performance demands of these complex domains.
Domain-specific architecture - Wikipedia
A domain-specific architecture (DSA) is a programmable computer architecture specifically tailored to operate very efficiently within the confines of a given application domain. The term is often used in contrast to general-purpose architectures, such as CPUs, that are designed to operate on any computer program. History: In conjunction with the semiconductor boom that started in the 1960s, computer architects were tasked with finding new ways to exploit the increasingly large number of transistors available. Moore's Law and Dennard Scaling enabled architects to focus on improving the performance of general-purpose microprocessors on general-purpose programs. [...] These efforts yielded several technological innovations, such as multi-level caches, out-of-order execution, deep instruction pipelines, multithreading, and multiprocessing. The impact of these innovations was measured on generalist benchmarks such as SPEC, and architects were not concerned with the internal structure or specific characteristics of these programs. The end of Dennard Scaling pushed computer architects to switch from a single, very fast processor to several processor cores. Performance improvement could no longer be achieved by simply increasing the operating frequency of a single core. [...] A notable early example of a domain-specific programmable architecture is the GPU. This specialized hardware was developed specifically to operate within the domain of image processing and computer graphics. These programmable processing units found widespread adoption both in gaming consoles and personal computers. With the improvement of the hardware/software stack for both NVIDIA and AMD GPUs, these architectures are being used more and more for the acceleration of massively and embarrassingly parallel tasks, even outside of the domain of image processing.
High Performance Domain-Specific Architectures | BSC-CNS
In the High Performance Domain-Specific Architectures team, we are developing specialized architectures for multiple application domains, ranging from traditional High Performance Computing (HPC) to emerging precision medicine and security applications. Our goal is to design novel architectures with increased performance and energy efficiency, extending the life of Moore's Law as much as possible. Objectives: The research team is mainly focused on three different research lines: [...] Moore's Law is running out of steam. As a result, computer architecture plays a critical role in the design and efficient use of hardware resources. In this context, specialization and domain-specific architectures arise as a promising path to continuing to increase the performance and efficiency of computer architectures. Current heterogeneous architectures already offer a wide range of accelerators: GPUs and vector processors have been very successful exploiting data-level parallelism in many application domains; accelerators for digital signal processing, encoding and decoding media, encryption, and networking are typical in the embedded domain; in the last years, there has been an explosion in the development of deep learning accelerators. Still, there is ample room for domain-specific acceleration in many application domains.
[PDF] Lecture 26: Domain Specific Architectures
1. Use dedicated memories to minimize distances of data movement. Hardware-controlled multi-level cache → domain-specific software-controlled scratch-pad.
2. Invest resources into more arithmetic units or bigger memories. Core optimization (OoO, speculation, threading, etc.) → more domain-specific FU/memory.
3. Use the easiest form of parallelism that matches the domain. MIMD → SIMD or VLIW that matches the domain.
4. Reduce data size and type to the simplest needed for the domain. General-purpose 32/64-bit integer/float → domain-specific 8/16-bit int/float.
5. [...]
[...] horsepower delivered by Fermi, but it does so efficiently, consuming significantly less power and generating much less heat output. A full Kepler GK110 implementation includes 15 SMX units and six 64-bit memory controllers. Different products will use different configurations of GK110. For example, some products may deploy 13 or 14 SMXs. Key features of the architecture that will be discussed below in more depth include: the new SMX processor architecture; an enhanced memory subsystem, offering additional caching capabilities, more bandwidth at each level of the hierarchy, and a fully redesigned and substantially faster DRAM I/O implementation; hardware support throughout the design to enable new programming model capabilities. (Figure: Kepler GK110 full chip block diagram)
[...] a MV multiply, an element-wise MM, an element-wise MV, or a convolution from the Unified Buffer into the accumulators
- Takes a variable-sized B×256 input, multiplies it by a 256x256 constant input, and produces a B×256 output, taking B pipelined cycles to complete
Activate
- Computes the activation function, the nonlinear function of the artificial neuron, with options for ReLU, Sigmoid, tanh, and so on.
- Its inputs are the Accumulators, and its output is the Unified Buffer.
Write_Host_Memory
- Writes data from the Unified Buffer into host memory
TPU Microarchitecture: Systolic Array
TPU Implementation
- TPU chip fabricated using the 28-nm process, 700 MHz clock.
- Less than half the size of an Intel Haswell CPU, which is 662 mm².
Improving the TPU
- First, increasing memory bandwidth
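
The matrix-multiply step and the Activate instruction described in the lecture excerpt above can be sketched as a small functional model. This is only an illustration of the data flow (inputs times a constant weight matrix into accumulators, then a nonlinearity), not a model of the systolic array or its timing. The function names matrix_multiply and activate are hypothetical, and the example uses a 2x2 weight matrix in place of the 256x256 one so that it stays readable.

```python
import math
from typing import Callable, Dict, List

Matrix = List[List[float]]


def _sigmoid(v: float) -> float:
    # Branch on sign so math.exp never receives a large positive argument.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "relu": lambda v: max(0.0, v),
    "sigmoid": _sigmoid,
    "tanh": math.tanh,
}


def matrix_multiply(batch: Matrix, weights: Matrix) -> Matrix:
    """Multiply a B x n input by a constant n x n weight matrix, giving B x n accumulators."""
    n = len(weights)
    if any(len(row) != n for row in weights):
        raise ValueError("weights must be a square n x n matrix")
    if any(len(row) != n for row in batch):
        raise ValueError("every input row must have n elements")
    return [
        [sum(row[k] * weights[k][j] for k in range(n)) for j in range(n)]
        for row in batch
    ]


def activate(accumulators: Matrix, kind: str = "relu") -> Matrix:
    """Apply the chosen nonlinearity to every accumulator value."""
    fn = ACTIVATIONS[kind]
    return [[fn(v) for v in row] for row in accumulators]


# Illustrative values: B = 3 input rows, n = 2 (the excerpt describes n = 256).
weights = [[1.0, -1.0], [2.0, 0.0]]
batch = [[1.0, 1.0], [0.0, -1.0], [3.0, 0.5]]

accumulators = matrix_multiply(batch, weights)
outputs = activate(accumulators, "relu")
print(accumulators)
print(outputs)
```

In the excerpt the input has B rows and the operation takes B pipelined cycles, so one way to read this model is that each row of batch corresponds to one cycle of input streamed through the fixed weights.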
[PDF] Domain-Specific Architectures for AI and Robotics
Need flexible hardware! [...]
- Y.-H. Chen, T. Krishna, J. Emer, V. Sze, “Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks,” IEEE Journal of Solid State Circuits (JSSC), ISSCC Special Issue, Vol. 52, No. 1, pp. 127-138, January 2017.
- Y.-H. Chen, J. Emer, V. Sze, “Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks,” International Symposium on Computer Architecture (ISCA), pp. 367-379, June 2016.
- Y.-H. Chen, T.-J. Yang, J. Emer, V. Sze, “Understanding the Limitations of Existing Energy-Efficient Design Approaches for Deep Neural Networks,” SysML Conference, February 2018.
[...] (Flowchart, "Where to scan?": select candidate scan locations; compute Shannon MI and choose best location; move to location and scan; update occupancy map.)