Lossy Compression (in LLMs)
A concept from computer science used to describe how LLMs work: they compress vast amounts of training data into a model. The framing is relevant to the fair use debate, since it casts models as learning from their training data rather than copying it.
Created At
7/26/2025, 3:35:01 AM
Last Updated
7/26/2025, 4:05:43 AM
Research Retrieved
7/26/2025, 3:51:22 AM
Summary
Lossy compression in Large Language Models (LLMs) is a concept that views these models as a form of data compression where some information is intentionally removed to reduce size or complexity. This is analogous to general data compression techniques, which aim to reduce the number of bits needed to represent information, either losslessly by eliminating redundancy or lossily by discarding less critical data. The effectiveness of lossy compression in LLMs involves a trade-off between the degree of compression, the amount of distortion introduced, and the computational resources required for processing. This idea was discussed in the context of the rapid advancements and competitive landscape of AI, particularly the "AI Race" between the U.S. and other nations, and the debate around AI regulation and open-source development.
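The trade-off described above can be made concrete with a small, purely illustrative sketch. The Python below (all function and variable names are invented for this example and do not come from any referenced paper) quantizes a float32 weight matrix to 8-bit integers, the kind of lossy step applied to LLM weights, and measures the distortion it introduces.

```python
import numpy as np

def quantize_int8(weights):
    """Lossy step: map float32 weights onto 256 integer levels (symmetric, per-tensor)."""
    scale = np.abs(weights).max() / 127.0                      # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruction: approximate, not identical to the original weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(512, 512)).astype(np.float32)  # stand-in for one weight matrix

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print("bytes before:", w.nbytes, "bytes after:", q.nbytes)       # 4x smaller
print("max abs error:", float(np.abs(w - w_hat).max()))          # nonzero: information was lost
```

The tensor shrinks to a quarter of its size, but the reconstruction error is nonzero; that residual distortion is exactly what distinguishes lossy from lossless compression in this context.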
Referenced in 1 Document
Research Data
Extracted Attributes
Field: Artificial Intelligence, Machine Learning, Information Theory
Purpose: Reduce the number of bits needed to represent information in LLMs
Trade-offs: Degree of compression, amount of distortion introduced, computational resources required
Characteristics: Intentionally removes some information to reduce size or complexity; discards less critical data; introduces distortion
Application in LLMs: Used to reduce computational and memory demands, often via techniques like quantization or pruning (a minimal pruning sketch follows this list)
Consequence in LLMs: Can change model behavior in unpredictable manners
Type of Compression: Data compression
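The attributes above name pruning alongside quantization as a lossy route to smaller models. The sketch below is a toy illustration only (the helper and its parameters are invented, not taken from the cited works): magnitude pruning zeroes out the smallest weights, discarding the information they carried.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the fraction `sparsity` of weights with the smallest magnitudes (lossy)."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

rng = np.random.default_rng(1)
w = rng.normal(scale=0.02, size=(256, 256)).astype(np.float32)

w_sparse = magnitude_prune(w, sparsity=0.9)
print("nonzero before:", int(np.count_nonzero(w)), "after:", int(np.count_nonzero(w_sparse)))
# The zeroed entries can be stored sparsely, but the original values are unrecoverable:
print("discarded magnitude (sum of |dropped weights|):", float(np.abs(w - w_sparse).sum()))
```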
Timeline
- 2024-03-15: The paper 'Understanding The Effectiveness of Lossy Compression in Machine Learning Training Sets' was published on arXiv, discussing traditional lossy compression methods and their relevance to machine learning and LLMs. (Source: Web Search Results)
- 2024-05-07: The paper 'SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression' was presented at The Twelfth International Conference on Learning Representations (ICLR 2024), detailing a specific lossy compression technique for LLMs. (Source: Web Search Results)
- 2025-02-01: The paper 'End-to-End Lossless Compression for Efficient LLM Inference' was published on arXiv, discussing lossless compression for LLMs and contrasting it with lossy methods. (Source: Web Search Results)
- 2025-05-01: The article 'Lossless data compression by large models' was published in Nature Machine Intelligence, further exploring compression techniques for large models. (Source: Web Search Results)
Wikipedia
Data compression
In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. Typically, a device that performs data compression is referred to as an encoder, and one that performs the reversal of the process (decompression) as a decoder. The process of reducing the size of a data file is often referred to as data compression. In the context of data transmission, it is called source coding: encoding is done at the source of the data before it is stored or transmitted. Source coding should not be confused with channel coding, for error detection and correction, or line coding, the means for mapping data onto a signal. Data compression algorithms present a space-time complexity trade-off between the bytes needed to store or transmit information and the computational resources needed to perform the encoding and decoding. The design of data compression schemes involves balancing the degree of compression, the amount of distortion introduced (when using lossy data compression), and the computational resources or time required to compress and decompress the data.
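To make the lossless/lossy distinction in the excerpt concrete, the short Python sketch below (illustrative only; zlib and a float16 downcast are generic stand-ins, nothing here is specific to LLMs) compresses a byte string losslessly and then simulates a lossy scheme by discarding low-order floating-point precision. Only the first round trip reproduces the input exactly.

```python
import zlib
import numpy as np

# Lossless: statistical redundancy is removed, and the exact original bytes come back.
original = b"the quick brown fox jumps over the lazy dog " * 100
compressed = zlib.compress(original, 9)
assert zlib.decompress(compressed) == original             # bit-exact reconstruction
print("lossless:", len(original), "->", len(compressed), "bytes")

# Lossy: less important information (low-order float precision) is thrown away.
x = np.random.default_rng(2).normal(size=10_000).astype(np.float32)
x_lossy = x.astype(np.float16)                              # half the bytes, reduced precision
print("lossy   :", x.nbytes, "->", x_lossy.nbytes, "bytes")
print("max round-trip error:", float(np.abs(x - x_lossy.astype(np.float32)).max()))  # > 0
```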
Web Search Results
- End-to-End Lossless Compression for Efficient LLM Inference - arXiv
As they become more capable, large language models (LLMs) have continued to rapidly increase in size. This has exacerbated the difficulty of running state-of-the-art LLMs on small, edge devices. Standard techniques advocate solving this problem through lossy compression techniques such as quantization or pruning. However, such compression techniques are lossy and have been shown to change model behavior in unpredictable manners. We propose Huff-LLM, an end-to-end, lossless model compression [...] Lossless compression methods (such as Huffman coding and arithmetic coding) offer a solution. Just as a Huffman-compressed image can be reconstructed exactly in its original form, a losslessly compressed LLM would behave identically to the original model after decompression. However, despite widespread use in other domains, lossless compression has found surprisingly little application in LLM compression. One main reason is that [...] Benchmarks and Evaluated LLMs: Benchmarks such as MMLU (Hendrycks et al., 2020) are typically used to measure LLM capabilities. Works that focus on lossy compression (such as quantization) often use benchmark performance to show how much information was lost in the compression process. However, since Huff-LLM is a lossless compression method, the compressed LLM maintains exactly the same accuracy as the original model by construction.
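The Huff-LLM snippet above leans on Huffman coding as the canonical lossless entropy coder. As a rough illustration of that building block only (this is not the Huff-LLM method; the helpers below are a generic textbook construction with made-up names), the sketch builds a Huffman code over a stream of already-quantized weight bytes and checks that decoding reproduces the input exactly.

```python
import heapq
import random
from collections import Counter

def build_huffman_code(symbols):
    """Return {symbol: bitstring} for an iterable of hashable symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:                        # degenerate case: only one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap items: (total frequency, tie-breaker, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}       # prefix merged subtrees with 0 / 1
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(symbols, code):
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    inverse = {c: s for s, c in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:                    # the code is prefix-free, so the first match is correct
            out.append(inverse[buf])
            buf = ""
    return out

# Pretend these bytes are 8-bit quantized weights; a skewed distribution compresses well.
random.seed(0)
data = [random.choice([0] * 6 + [1, 2, 3, 255]) for _ in range(10_000)]

code = build_huffman_code(data)
bits = encode(data, code)
assert decode(bits, code) == data             # lossless: exact reconstruction
print(f"{len(data) * 8} bits fixed-width -> {len(bits)} bits Huffman-coded")
```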
- Understanding The Effectiveness of Lossy Compression in Machine ...
As shown in Figure 1(a), SZ can target many levels of quality as it transitions from nearly lossless to very lossy. [...] Downloads of the Pile dataset (800GB) used to train LLMs generated nearly 320TB of traffic last month alone on HuggingFace, likely costing many thousands of dollars in bandwidth costs alone. Beyond the cost, in developing countries it would take nearly 2 days, assuming no contention or failures, to download the Pile once. [...] There are also specialized lossless compressors for floating-point data, such as fpzip. Traditional lossy compression methods allow for data distortion in the reconstructed data compared with the original input, yet closely approximate the original data because many practical use cases do not need a "complete facsimile". Many of these methods, however, are not designed for floating-point numeric data compression but rather for images with integral pixels.
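The snippet above describes error-bounded lossy compressors such as SZ moving between nearly lossless and very lossy settings. As a toy stand-in for that idea only (this is not the SZ algorithm; the function names and bounds are invented for illustration), the sketch below quantizes floating-point training data under a user-chosen absolute error bound, which is the knob such compressors expose.

```python
import numpy as np

def compress_with_abs_bound(x, bound):
    """Map floats to integer bin indices of width 2*bound, so that |x - x_hat| <= bound."""
    return np.round(x / (2.0 * bound)).astype(np.int64)

def decompress(bins, bound):
    return bins.astype(np.float64) * (2.0 * bound)

x = np.random.default_rng(3).normal(size=100_000)

for bound in (1e-1, 1e-3, 1e-6):                        # "very lossy" down to "nearly lossless"
    bins = compress_with_abs_bound(x, bound)
    x_hat = decompress(bins, bound)
    assert np.abs(x - x_hat).max() <= bound * (1 + 1e-9)  # error bound holds, up to float rounding
    # Fewer distinct bins => more redundancy for a lossless entropy coder to exploit afterwards.
    print(f"bound={bound:>7}: distinct bins = {len(np.unique(bins))}")
```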
- Efficient Compressing and Tuning Methods for Large Language ...
Tim Dettmers et al. 2024. SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. [...] Efficient compression and tuning techniques have become indispensable in addressing the increasing computational and memory demands of large language models (LLMs). While these models have demonstrated exceptional performance across a wide range of natural language processing tasks, their growing size and resource requirements pose significant challenges to accessibility and sustainability. This survey systematically reviews state-of-the-art methods in model compression, including compression [...] techniques such as knowledge distillation, low-rank approximation, parameter pruning, and quantization, as well as tuning techniques such as parameter-efficient fine-tuning and inference optimization. Compression techniques, though well-established in traditional deep learning, require updated methodologies tailored to the scale and dynamics of LLMs. Simultaneously, parameter-efficient fine-tuning, exemplified by techniques like Low-Rank Adaptation (LoRA) and query tuning, emerges as a [...]
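Low-rank approximation and LoRA, both named in the survey snippet above, rely on the same arithmetic: a large weight update is represented as the product of two thin matrices. The sketch below is a generic numerical illustration with invented dimensions, not code from the survey or the LoRA paper; it counts the parameters saved and shows how the low-rank update is applied.

```python
import numpy as np

d, k, r = 1024, 1024, 8                       # layer dimensions and a small rank (illustrative)

rng = np.random.default_rng(4)
W = rng.normal(scale=0.02, size=(d, k))       # frozen pretrained weight matrix (stand-in)
A = rng.normal(scale=0.01, size=(r, k))       # trainable low-rank factors: only A and B are tuned
B = np.zeros((d, r))                          # zero-initialized, so the update starts as a no-op

full_update_params = d * k
lora_params = d * r + r * k
print(f"full fine-tune params per layer: {full_update_params:,}")
print(f"low-rank params per layer      : {lora_params:,} "
      f"({lora_params / full_update_params:.2%} of full)")

x = rng.normal(size=(k,))
y = (W + B @ A) @ x                           # adapted forward pass: W is never modified in place
```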
- Lossless Compression of Large Language Model-Generated Text ...
Title: Lossless Compression of Large Language Model-Generated Text via Next-Token Prediction. Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL). Cite as: arXiv:2505.06297 [cs.LG] (or arXiv:2505.06297v1 [cs.LG] for this version).
- Lossless data compression by large models - Nature
Li, Z., Huang, C., Wang, X. et al. Lossless data compression by large models. Nature Machine Intelligence 7, 794-799 (2025). Received: 13 July 2024; Accepted: 04 April 2025; Published: 01 May 2025; Issue Date: May 2025. A preprint version of the article is available at arXiv.