Scaling Laws (AI)
The principle that AI model performance improves predictably as compute, data, and model size are scaled up. Musk has suggested the relationship follows a natural logarithmic function.
First Mentioned
9/10/2025, 2:20:07 AM
Last Updated
9/10/2025, 2:25:20 AM
Research Retrieved
9/10/2025, 2:25:20 AM
Summary
Scaling laws in Artificial Intelligence (AI) describe the predictable, power-law improvement in AI performance as computational resources (model size, training data, and compute) are increased. Elon Musk has noted that progress in AI generally follows these scaling laws. The concept is relevant to the development of advanced AI systems such as Tesla's Full Self-Driving (FSD) and xAI's Grok, where significant performance gains are anticipated from new hardware like Tesla's AI5 chip and supercomputers like Colossus.
Referenced in 1 Document
Research Data
Extracted Attributes
Field
Artificial Intelligence (AI)
Types
Includes training scaling laws (relating model size, training data, compute to performance) and test-time/inference scaling laws (techniques applied after training to enhance performance during inference).
Mechanism
Performance of a deep learning model scales with the number of parameters and training tokens, both shaped by the amount of compute used. The relationship is a power law: loss decreases substantially, but not linearly, as data and model size increase.
Description
Describes how loss changes with model and dataset size, showing predictable, power-law improvements in AI performance as computational resources (data, compute) are increased.
Practical Use
Allows researchers to train smaller models, fit scaling laws, and use them to extrapolate the performance of much larger models.
Key Observation
The performance of an AI model improves as a function of increasing scale in model size, dataset size, and compute power.
Compute Growth Rate
Compute resources for AI training have been growing at approximately 4x per year, driven by the consistent, predictable returns that scaling laws promise.
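The fit-and-extrapolate workflow described in the attributes above can be sketched in a few lines. The parameter counts and loss values below are invented for illustration only, not measured data; the point is the mechanics of fitting a power law in log-log space and extrapolating it:

```python
import numpy as np

# Hypothetical losses measured on a sweep of small models
# (parameter counts and losses are illustrative, not real results).
params = np.array([1e6, 3e6, 1e7, 3e7, 1e8])
loss = np.array([5.2, 4.6, 4.0, 3.5, 3.1])

# A power law L(N) = b * N**(-a) is linear in log-log space:
# log L = log b - a * log N, so a straight-line fit recovers a and b.
slope, intercept = np.polyfit(np.log(params), np.log(loss), 1)
a, b = -slope, np.exp(intercept)

# Extrapolate the fitted law to a much larger model (10B parameters).
predicted_loss = b * (1e10) ** (-a)
```

This mirrors the practical use named above: train small, fit the law, extrapolate large. The extrapolated loss should land below the losses of the small models if the power-law fit holds.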
Timeline
- 2020-01: OpenAI researchers, led by Jared Kaplan, published 'Scaling Laws for Neural Language Models,' observing precise power-law scalings for performance as a function of training time, context length, dataset size, model size, and compute budget. (Source: web search)
- Ongoing: Compute resources for AI training have been growing at approximately 4x per year, driven by the consistent and predictable improvements from scaling laws. (Source: web search)
- 2024 (latter half): Widespread questioning of model scaling began as the pace of model releases and improvements slowed, raising questions about whether AI research and scaling laws were hitting a wall. (Source: web search)
Web Search Results
- 2.4: Scaling Laws | AI Safety, Ethics, and Society Textbook
In AI, scaling laws describe how loss changes with model and dataset size. We observed that the performance of a deep learning model scales according to the number of parameters and tokens—both shaped by the amount of compute used. Evidence from generative models like LLMs indicates a smooth reduction in loss with increases in model size and training data, adhering to a clear scaling law. Scaling laws are especially important for understanding how changes in variables like the [...] Power laws in the context of deep learning are called scaling laws. Scaling laws predict loss given model size and dataset size in a power-law relationship. Model size is usually measured in parameters, while dataset size is measured in tokens. As both variables increase, the model’s loss tends to decrease. This decrease in loss with scale often follows a power law: the loss drops substantially, but not linearly, with increases in data and model size. For instance, [...] Scaling laws are a type of power law. Power laws are mathematical equations that model how a particular quantity varies as the power of another. In power laws, the variation in one quantity is proportional to a power (exponent) of the variation in another. The power law y = bx^a states that the change in y is directly proportional to the change in x raised to a certain power a. If a is 2, then when x is doubled, y will quadruple. One real-world
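The power law in the excerpt above can be written compactly, alongside the loss-versus-model-size form used in Kaplan et al.'s 'Scaling Laws for Neural Language Models' (N is parameter count; N_c and the exponent are fitted constants):

```latex
y = b x^{a} \qquad \text{(with } a = 2\text{, doubling } x \text{ quadruples } y\text{)}
```

```latex
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}
```

The negative effective exponent in the loss form is why loss falls, rather than rises, as model size N grows.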
- Scaling Laws for LLMs: From GPT-3 to o3 - Deep (Learning) Focus
This is where scaling laws come in. So far, we have seen some empirical analysis that was conducted to prove that scaling laws exist, but these scaling laws also have a very practical use case within AI research. In particular, we can: Train a bunch of smaller models using various training settings. Fit scaling laws based on the performance of smaller models. Use the scaling law to extrapolate the performance of a much larger model. [...] Scaling laws have recently become a popular (and contentious) topic within AI research. As we have seen throughout this overview, scaling has fueled most improvements in AI throughout the age of pretraining. As the pace of model releases and improvements slowed in the latter half of 2024, however, we began to see widespread questioning of model scaling, seeming to indicate that AI research—and scaling laws in particular—could be hitting a wall. [...] What do scaling laws tell us? First, we need to recall the technical definition of a scaling law. Scaling laws define a relationship, based upon a power law, between training compute (or model / dataset size) and the test loss of an LLM. However, the nature of this relationship is often misunderstood. The idea of getting exponential performance improvements from logarithmic increases in compute is a myth. Scaling laws look more like an exponential decay, meaning that we will have to work harder
- The three AI scaling laws and what they mean for AI infrastructure
The second (or fifth) AI scaling law is test-time scaling which refers to techniques applied after training and during inference meant to enhance performance and drive efficiency without retraining the model. Some of the core concepts here are: [...] In January 2020, a team of OpenAI researchers led by Jared Kaplan, who moved on to co-found Anthropic, published a paper titled “Scaling Laws for Neural Language Models.” The researchers observed “precise power-law scalings for performance as a function of training time, context length, dataset size, model size and compute budget.” Essentially, the performance of an AI model improves as a function of increasing scale in model size, dataset size and compute power. While the commercial trajectory [...] The evolution of AI scaling laws—from the foundational trio identified by OpenAI to the more nuanced concepts of post-training and test-time scaling championed by NVIDIA—underscores the complexity and dynamism of modern AI. These laws not only guide researchers and practitioners in building better models but also drive the design of the AI infrastructure needed to sustain AI’s growth.
- Scaling: The State of Play in AI - by Ethan Mollick - One Useful Thing
The existence of two scaling laws - one for training and another for "thinking" - suggests that AI capabilities are poised for dramatic improvements in the coming years. Even if we hit a ceiling on training larger models (which seems unlikely for at least the next couple of generations), AI can still tackle increasingly complex problems by allocating more computing power to "thinking." This dual-pronged approach to scaling virtually guarantees that the race for more powerful AI will continue [...] To understand where we are with LLMs you need to understand scale. As I warned, I am going to oversimplify things quite a bit, but there is a "scaling law" (really more of an observation) in AI that suggests the larger your model, the more capable it is. Larger models mean they have a greater number of parameters, which are the adjustable values the model uses to make predictions about what to write next. These models are typically trained on larger amounts of data, measured in tokens, which
- Language Model Scaling Laws: Beyond Bigger AI Models in 2024
At the heart of this evolution are scaling laws, which describe the relationships between a model’s performance and its key attributes, i.e. size (number of parameters), training data volume, and computational resources. [...] The o1 model introduced new scaling laws that apply to inference rather than training. These laws suggest that allocating additional computing resources at inference time can lead to more accurate results, challenging the previous paradigm of optimising for fast inference. OpenAI’s report shows similar empirical scaling laws for inference. It demonstrates that o1’s performance consistently improves with more computing time spent on inference. [...] Initially, these laws focused primarily on the relationship between model size, training data volume, and performance, as shown in the work of OpenAI’s Kaplan et al. in 2020. This led to a “bigger is better” approach, with researchers and companies racing to create ever-larger models. The consistent and predictable improvements from scaling have led to an aggressive expansion in the scale of AI training, with compute resources growing at a staggering rate of approximately 4x per year.