The Bitter Lesson
A principle in AI research, articulated by Rich Sutton, which holds that general-purpose methods leveraging massive computational power will ultimately outperform more specialized, human-curated approaches.
Created: 7/12/2025, 4:40:58 AM
Last Updated: 7/26/2025, 6:41:59 AM
Research Retrieved: 7/12/2025, 5:01:09 AM
Summary
The Bitter Lesson is a foundational concept in artificial intelligence, articulated by Canadian computer scientist Richard Sutton in 2019. It posits that in AI development, scalable computational power combined with general-purpose learning methods consistently outperforms systems that rely on human-crafted, domain-specific knowledge or extensive human-labeled data. Recent examples include xAI's Grok 4, trained on the Colossus supercomputer, Tesla's Full Self-Driving, and the increasing use of synthetic data for training large language models, all of which underscore the diminishing long-term value of human-labeled data.
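As a concrete illustration of the principle (this is not an example from Sutton's essay; the game, the player policies, and the depth budget are invented for this sketch), the Python snippet below pits a hand-crafted tic-tac-toe heuristic against a generic minimax search whose only lever is how much computation it is allowed to spend.

```python
"""Minimal sketch: hand-crafted knowledge vs. a general method that scales
with compute. The heuristic player encodes fixed human rules; the minimax
player knows nothing beyond the game rules and simply searches deeper as its
compute budget (depth) grows. All names here are illustrative assumptions."""

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),    # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),    # columns
         (0, 4, 8), (2, 4, 6)]               # diagonals


def winner(board):
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def moves(board):
    return [i for i, cell in enumerate(board) if cell == " "]


def heuristic_move(board, player):
    """Hand-crafted domain knowledge: win if possible, block if necessary,
    otherwise prefer centre, then corners, then edges."""
    opponent = "O" if player == "X" else "X"
    for target in (player, opponent):          # 1) take a win, 2) block one
        for m in moves(board):
            if winner(board[:m] + target + board[m + 1:]) == target:
                return m
    for m in (4, 0, 2, 6, 8, 1, 3, 5, 7):      # 3) fixed positional preference
        if board[m] == " ":
            return m


def minimax_move(board, player, depth):
    """General method: plain minimax search, limited only by a depth budget."""
    opponent = "O" if player == "X" else "X"

    def score(b, to_move, d):
        w = winner(b)
        if w == player:
            return 1
        if w == opponent:
            return -1
        if not moves(b) or d == 0:
            return 0
        vals = [score(b[:m] + to_move + b[m + 1:],
                      player if to_move == opponent else opponent, d - 1)
                for m in moves(b)]
        return max(vals) if to_move == player else min(vals)

    return max(moves(board),
               key=lambda m: score(board[:m] + player + board[m + 1:],
                                   opponent, depth - 1))


def play(x_policy, o_policy):
    board, turn = " " * 9, "X"
    while winner(board) is None and moves(board):
        m = (x_policy if turn == "X" else o_policy)(board, turn)
        board = board[:m] + turn + board[m + 1:]
        turn = "O" if turn == "X" else "X"
    return winner(board) or "draw"


if __name__ == "__main__":
    # With a full-depth budget the general method plays perfectly and cannot
    # lose; shrinking the budget is the only thing that weakens it.
    for depth in (1, 2, 9):
        result = play(heuristic_move, lambda b, p: minimax_move(b, p, depth))
        print(f"search depth {depth}: heuristic (X) vs minimax (O) -> {result}")
```

The point of the sketch is the asymmetry: the heuristic's strength is frozen at whatever its author encoded, while the search improves whenever more computation is available, which is the dynamic Sutton's essay describes at a much larger scale.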
Referenced in 2 Documents
Research Data
Extracted Attributes
Field: Artificial Intelligence
Author: Richard S. Sutton
Concept Type: Scientific Concept
Core Principle: Scalable computation and general-purpose methods consistently outperform human-crafted, domain-specific knowledge and human-labeled data in AI.
Key Observation: AI researchers often try to build human knowledge into their agents; this helps in the short term but is ultimately surpassed by methods that leverage brute-force computation, chiefly search and learning.
Underlying Enabler: The continued exponential fall in the cost per unit of computation (a generalization of Moore's law); see the sketch after this list.
Applications/Examples: Chess, Go, speech recognition, computer vision, autonomous driving, large language models.
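The "Underlying Enabler" above is quantitative at heart: if the cost of a unit of computation falls exponentially, any method whose performance scales with computation gets a compounding advantage over one that does not. A back-of-the-envelope sketch, using a purely illustrative doubling time rather than a measured figure:

```python
# Back-of-the-envelope sketch of the "exponentially falling cost per unit of
# computation" enabler. The doubling time is an illustrative assumption.

DOUBLING_TIME_YEARS = 2.0  # hypothetical: compute per dollar doubles every two years


def compute_per_dollar_growth(years, doubling_time=DOUBLING_TIME_YEARS):
    """Factor by which computation per dollar grows over the given horizon."""
    return 2.0 ** (years / doubling_time)


if __name__ == "__main__":
    for horizon in (10, 20, 30):
        factor = compute_per_dollar_growth(horizon)
        print(f"after {horizon} years: ~{factor:,.0f}x more computation per dollar")
```

Under that assumption, a method that simply keeps pace with available computation gets roughly a 30x boost per decade without any new ideas, which is the asymmetry this attribute refers to.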
Timeline
- 2019: Richard Sutton publishes his influential essay "The Bitter Lesson". (Source: Web Search)
- Ongoing: The principle of "The Bitter Lesson" is observed in various AI advancements, including the development of xAI's Grok 4 and Tesla FSD, and the increasing use of synthetic data for LLM training. (Source: Summary, Related Documents)
Wikipedia
Richard S. Sutton
Richard Stuart Sutton (born 1957 or 1958) is a Canadian computer scientist. He is a professor of computing science at the University of Alberta, fellow & Chief Scientific Advisor at the Alberta Machine Intelligence Institute, and a research scientist at Keen Technologies. Sutton is considered one of the founders of modern computational reinforcement learning, having made several significant contributions to the field, including temporal difference learning and policy gradient methods.
Web Search Results
- Rich Sutton's bitter lesson of AI - Applied Mathematics Consulting
The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. Clear enough, except you have to read further to get the “compared to what.” I asked Grok 3 to read the article and tell me exactly what the bitter lesson is. It begins with this explanation: The “Bitter Lesson” is the principle that, in the field of artificial intelligence, the most effective and lasting progress has historically come from methods that rely on leveraging ever-increasing computational power rather than depending on human ingenuity to craft domain-specific knowledge or shortcuts. … and concludes with the following pithy summary of its summary: In essence, the Bitter Lesson is: computation trumps human-crafted specialization in AI, and embracing this, though humbling, is key to future success. Sutton supports his thesis with examples from chess, go, speech recognition, and computer vision.
- Reflections on 'The Bitter Lesson' - Cognitive Medium
Rich Sutton is an expert on neural networks at the University of Alberta and DeepMind. He’s written a stimulating essay describing what he calls “the bitter lesson”: in AI research it’s extremely seductive to try to build expert domain knowledge into the systems you’re creating, but, according to Sutton, this approach gets beaten again and again by methods leveraging brute force computation, notably search and learning. Here’s Sutton’s basic description, emphases mine: [...] are continuing to make the same kind of mistakes. To see this, and to effectively resist it, we have to understand the appeal of these mistakes. We have to learn the bitter lesson that building in how we think we think does not work in the long run. The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the [...] I think it’s a mistake to expect to reason about this from first principles and arrive at reliable conclusions. The Bitter Lesson is a heuristic model and set of arguments to keep in mind, not a reliable argument that applies in all circumstances. You need to proceed empirically. Keep the Bitter Lesson in mind, yes, but also keep in mind that your OS wasn’t produced by training TPUs for a decade.
- Learning the Bitter Lesson: Empirical Evidence from 20 Years ... - arXiv
Rich Sutton’s influential essay "The Bitter Lesson" argues that the most significant advancements in artificial intelligence (AI) have come from focusing on general methods that leverage computation rather than human-designed representations and knowledge. This principle has been particularly evident in the field of Computer Vision (CV), which has witnessed a notable shift from hand-crafted features to deep learning models. [...] The field of artificial intelligence (AI) has witnessed a paradigm shift, eloquently articulated in Rich Sutton’s influential essay "The Bitter Lesson" (Sutton, 2019). Sutton’s thesis emphasizes the primacy of general methods that harness computational power over human-designed representations and domain-specific knowledge. This perspective echoes the seminal work of Leo Breiman, who, two decades earlier, delineated the dichotomy between statistical and algorithmic approaches in his paper [...] We want to evaluate this abstract in terms of alignment with "The Bitter Lesson". The main idea of Rich Sutton’s "The Bitter Lesson" is that the most effective AI approaches in the long run are those that leverage computation and general-purpose methods like search and learning, rather than human-designed systems that try to build in human knowledge. Evaluate the alignment of the abstract with the following principles, assigning a score from 0 to 10 for each. (A hypothetical sketch of this kind of rubric scoring appears after these search results.)
- The "Bitter Lesson" post from Rich Sutton from earlier this year [1 ...
m3at (Nov 11, 2019), commenting on "The post-exponential era of AI and Moore's Law": The "Bitter Lesson" post from Rich Sutton from earlier this year seems a very good complement to this article: he explains how all of the big improvements in the field came from new methods that leveraged the much larger compute available from Moore's law, instead of progressive buildup over existing methods. A great quote from McCarthy also regularly referenced by Sutton is [...] KKKKkkkk1 (Nov 11, 2019): Sutton argues that algorithm development needs to be shaped by the assumption that compute power will continue growing exponentially into the future. At this point, it is commonly believed that this is not the case. One thing that should be learned from the bitter lesson is the great power of general purpose methods, of methods that continue to scale with increased computation even as the available computation becomes very [...]
- The Bitter Lesson (2019) - Hacker News
coldtea (Dec 15, 2022): >The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin. The ultimate reason for this is Moore's law, or rather its generalization of continued exponentially falling cost per unit of computation. That doesn't sound right. If there's an "ultimate reason" for computation AI success, is that the problem, of [...] g42gregory (Dec 15, 2022): I think this bitter lesson needs to be taken with several grains of salt. Number one, the progress in a particular AI field tends to go, at first, from custom to more general algorithms, exactly as Professor Richard Sutton described. However, there is a second part to this progress, where, once we "understood" (which we never really do) the new level of general algorithms (say Transformers [...]
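Finally, the arXiv result above describes scoring an abstract's alignment with "The Bitter Lesson" by assigning a 0-10 score per principle. The sketch below is a hypothetical reconstruction of that kind of rubric scorer, not the paper's actual prompt or code: the principle wording, the function names, and the stand-in fake_model are all assumptions, and the real LLM call is left abstract.

```python
# Hypothetical rubric scorer in the spirit of the arXiv snippet above: build a
# prompt asking for one 0-10 score per principle, then average the replies.
# Principle wording and all names are assumptions, not the paper's own.

from statistics import mean
from typing import Callable

PRINCIPLES = [
    "Relies on general-purpose methods (search, learning) rather than hand-built knowledge",
    "Performance is expected to keep improving as more computation becomes available",
    "Avoids baking human-designed representations of the domain into the system",
]


def build_rubric_prompt(abstract: str) -> str:
    """Format a prompt asking a model to score the abstract 0-10 on each principle."""
    bullets = "\n".join(f"- {p}" for p in PRINCIPLES)
    return ("Evaluate the alignment of the abstract below with 'The Bitter Lesson'.\n"
            "Assign a score from 0 to 10 for each principle, one integer per line.\n\n"
            f"Principles:\n{bullets}\n\nAbstract:\n{abstract}\n")


def alignment_score(abstract: str, ask_model: Callable[[str], str]) -> float:
    """Average the per-principle scores returned by `ask_model` (one per line)."""
    reply = ask_model(build_rubric_prompt(abstract))
    scores = [int(line) for line in reply.splitlines() if line.strip().isdigit()]
    if len(scores) != len(PRINCIPLES):
        raise ValueError(f"expected {len(PRINCIPLES)} scores, got {scores!r}")
    return mean(scores)


if __name__ == "__main__":
    fake_model = lambda prompt: "8\n7\n9"  # stand-in so the sketch runs end to end
    print(alignment_score("We replace hand-crafted features with end-to-end learning.",
                          fake_model))
```

The design keeps the model call abstract (any callable from prompt string to reply string), so the same scoring logic works whether the scores come from a person, a local model, or an API.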