Positional Encoding
A core technique in transformer-based AI models that allows them to capture the relationships and order of words in a sequence. The process is described as 'transformational' and is central to the legal argument for fair use in AI training.
Created: 7/13/2025, 5:56:24 PM
Last Updated: 7/22/2025, 4:45:35 AM
Research Retrieved: 7/13/2025, 6:09:50 PM
Summary
Positional encoding is a fundamental technique in deep learning, especially within transformer architectures, designed to inject information about the relative or absolute position of tokens in a sequence. This is necessary because transformers, unlike recurrent neural networks, process tokens in parallel and therefore have no inherent sense of sequential order. By adding a positional vector to each token's embedding, positional encoding lets the model retain crucial word order information while keeping the parallel, efficient training that transformers are known for. The technique's role in distinguishing input from output also makes it relevant to discussions around AI copyright, as highlighted by a ruling that affirmed fair use for training AI on legally acquired data, a distinction seen as vital for global AI competition.
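As a concrete illustration of the mechanism described above, here is a minimal sketch of the sinusoidal positional encoding from "Attention Is All You Need", written with NumPy. The function name, dimensions, and the random stand-in embeddings are choices made for the example, not details taken from the referenced document.

```python
# Minimal sketch of sinusoidal positional encoding (Vaswani et al., 2017), NumPy only.
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of positional encodings."""
    positions = np.arange(seq_len)[:, np.newaxis]           # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]          # (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)  # one frequency per pair of dims
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions use cosine
    return pe

# The positional vectors are simply added to the token embeddings, so each token
# carries both "what it means" and "where it is".
seq_len, d_model = 6, 8
token_embeddings = np.random.randn(seq_len, d_model)  # stand-in for learned embeddings
model_input = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)
print(model_input.shape)  # (6, 8)
```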
Referenced in 1 Document
Research Data
Extracted Attributes
Field: Deep Learning
Types: Absolute Positional Encoding, Relative Positional Encoding
Benefit: Enables parallelization and efficient model training while retaining word order information
Purpose: To provide information about the relative or absolute position of tokens in a sequence
Mechanism: Adds a unique positional vector to the input embedding of each token
Necessity: Transformers process tokens in parallel and do not inherently maintain sequential order
Primary Application: Transformer architectures
Relevance to AI Copyright: Technically supports distinguishing input from output, relevant to fair use rulings for AI training data
Timeline
- 2017-06-12: The modern version of the Transformer architecture, which incorporates positional encoding as a key component, was proposed in the paper 'Attention Is All You Need' by researchers at Google. (Source: Wikipedia)
Wikipedia
Transformer (deep learning architecture)
In deep learning, transformer is an architecture based on the multi-head attention mechanism, in which text is converted to numerical representations called tokens, and each token is converted into a vector via lookup from a word embedding table. At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism, allowing the signal for key tokens to be amplified and less important tokens to be diminished. Transformers have the advantage of having no recurrent units, therefore requiring less training time than earlier recurrent neural architectures (RNNs) such as long short-term memory (LSTM). Later variations have been widely adopted for training large language models (LLMs) on large (language) datasets. The modern version of the transformer was proposed in the 2017 paper "Attention Is All You Need" by researchers at Google. Transformers were first developed as an improvement over previous architectures for machine translation, but have found many applications since. They are used in large-scale natural language processing, computer vision (vision transformers), reinforcement learning, audio, multimodal learning, robotics, and even playing chess. It has also led to the development of pre-trained systems, such as generative pre-trained transformers (GPTs) and BERT (bidirectional encoder representations from transformers).
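To show where positionally encoded embeddings enter the computation this excerpt describes, here is a minimal single-head scaled dot-product attention sketch in NumPy with random stand-in weights. Real transformers use multiple heads, masking, and learned parameters, so this is only a schematic, not the library-level implementation.

```python
# Minimal single-head scaled dot-product attention sketch; random weights, NumPy only.
import numpy as np

def scaled_dot_product_attention(x: np.ndarray, d_k: int = 16) -> np.ndarray:
    """x: (seq_len, d_model) token embeddings (positional encoding assumed already added)."""
    rng = np.random.default_rng(0)
    d_model = x.shape[1]
    # Learned projection matrices in a real model; random stand-ins here.
    W_q = rng.normal(size=(d_model, d_k))
    W_k = rng.normal(size=(d_model, d_k))
    W_v = rng.normal(size=(d_model, d_k))
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(d_k)                       # pairwise token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over each row
    return weights @ V                                    # contextualized token representations

x = np.random.randn(5, 32)   # 5 tokens, embedding size 32
print(scaled_dot_product_attention(x).shape)  # (5, 16)
```

Because this computation treats its inputs as a set, permuting the rows of `x` merely permutes the rows of the output, which is exactly why the positional signal must be added to the embeddings beforehand.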
Web Search Results
- What is Positional Encoding? - IBM
Positional encoding is a technique that injects information about the position of the words in a sequence to transformer architectures. The order of words plays a fundamental part in understanding the semantic meaning of a sentence. For example, “Allen walks dog” and “dog walks Allen” have entirely different meanings despite having the same words, or tokens. When implementing natural language processing (NLP) applications by using deep learning and neural networks, we need to create a mechanism [...] transformers (Vaswani et al. 2017) do not retain word order and process tokens in parallel. Therefore, we need to implement a mechanism that can explicitly represent the order of words in a sequence—a technique known as positional encoding. Positional encoding allows the transformer to retain information of the word order, enabling parallelization and efficient model training. You can often find implementations of positional encoding on GitHub. [...] We achieve this objective by first processing each word as a vector that represents its meaning—for example, “dog” will be encoded in a high dimensional array that encodes its concept. In technical terms, each word or sub word is mapped to an input embedding of varying lengths. However, on its own, the meaning vector does not tell us where in the sentence dog appears. Positional encoding adds a second vector—one that encodes the position index, such as “first word”, or “second word”, and so on.
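The "Allen walks dog" example above lends itself to a tiny demonstration. The toy vocabulary, embedding table, and four-dimensional vectors below are invented for the sketch; only the underlying point (same tokens, different order, different representations once positions are encoded) comes from the excerpt.

```python
# Toy illustration: "Allen walks dog" vs "dog walks Allen".
import numpy as np

rng = np.random.default_rng(42)
vocab = {"Allen": 0, "walks": 1, "dog": 2}
emb = rng.normal(size=(len(vocab), 4))          # toy word-embedding table

def encode(sentence, use_positions):
    ids = [vocab[w] for w in sentence.split()]
    x = emb[ids]                                 # meaning vectors only
    if use_positions:
        pos = np.arange(len(ids))[:, None]
        dims = np.arange(0, 4, 2)[None, :]
        angles = pos / np.power(10000.0, dims / 4)
        pe = np.zeros_like(x)
        pe[:, 0::2], pe[:, 1::2] = np.sin(angles), np.cos(angles)
        x = x + pe                               # meaning + position
    return x

a, b = "Allen walks dog", "dog walks Allen"
# Without positions, the two sentences contain exactly the same set of vectors.
print(sorted(map(tuple, encode(a, False).round(3))) ==
      sorted(map(tuple, encode(b, False).round(3))))     # True
# With positional encoding, "dog" at position 0 differs from "dog" at position 2.
print(np.allclose(encode(a, True), encode(b, True)))     # False
```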
- A Gentle Introduction to Positional Encoding in Transformer Models ...
information regarding the order of words in a sentence. Positional encoding is the scheme through which the knowledge of the order of objects in a sequence is maintained. [...] Positional encoding describes the location or position of an entity in a sequence so that each position is assigned a unique representation. There are many reasons why a single number, such as the index value, is not used to represent an item’s position in transformer models. For long sequences, the indices can grow large in magnitude. If you normalize the index value to lie between 0 and 1, it can create problems for variable length sequences as they would be normalized differently. [...] 2. Positional Encoding: Since the transformer architecture does not inherently process sequential data in order (unlike RNNs or LSTMs), it requires a method to understand the order of words in a sentence. Positional encoding is added to solve this problem. It involves generating another set of vectors that encode the position of each word in the sentence. Each positional vector is unique to its position, ensuring that the model can recognize the order of words. The positional encoding is
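A few throwaway numbers make the excerpt's objections concrete; the specific values below are illustrative, not taken from the article.

```python
# Why a single index value is awkward as a positional signal.
import numpy as np

# 1) Raw indices grow without bound: position 5000 contributes a value of 5000,
#    which dwarfs typical embedding entries.
print(list(range(0, 5001, 1000)))             # [0, 1000, 2000, 3000, 4000, 5000]

# 2) Normalizing indices to [0, 1] ties the value to sequence length:
#    the 5th token gets a different "position" in a 10-token vs 100-token sentence.
print(5 / 10, 5 / 100)                        # 0.5 vs 0.05

# 3) Sinusoidal encodings stay bounded in [-1, 1] and depend only on the
#    position itself, not on the sequence length.
def pe(pos, d_model=8):
    dims = np.arange(0, d_model, 2)
    angles = pos / np.power(10000.0, dims / d_model)
    return np.ravel(np.column_stack([np.sin(angles), np.cos(angles)]))

print(pe(5).round(3))                         # same vector regardless of sentence length
print(np.abs(pe(5000)).max() <= 1.0)          # True: values never blow up
```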
- A Guide to Understanding Positional Encoding for Deep Learning ...
What is Positional Encoding? Positional encoding, simply put, provides information about the position of elements in a sequence, in a way the model can understand. As humans, we can look at a sequence of events, and intuitively understand that one step comes after the next, after the next… there is an order of things. [...] Positional Encoding is a key component of transformer models and other sequence-to-sequence models. The term itself can sound intimidating, and understanding how it fits into the grand scheme of model development can be a daunting task. But fear not, for this guide is designed to unravel the mysteries of positional encoding in a simple and comprehensible manner. [...] In many tasks such as language translation, treating each word identically is problematic. For a sentence to make sense, the position of each word is crucial. With positional encoding, we insert positional information into our input sequence (in this example, the sentence we’d like to translate), and the model can differentiate the elements based on where they are located in the sequence.
- Positional Encoding in Transformer-Based Time Series Models - arXiv
Transformers lack an inherent notion of sequence order, as self-attention operates on sets of inputs without considering their positions. To address this, positional encoding is introduced to incorporate positional information into the model, allowing it to understand the order of elements in a sequence. [...] Figure 2: Comparison of Absolute and Relative Positional Encoding in Self-Attention. Absolute positional encoding is typically applied by adding fixed positional vectors to the input embeddings and modifying the queries and keys. In contrast, relative positional encoding directly incorporates relative positional information into the attention score calculation. The function A represents the attention mechanism, where absolute modifies the input embeddings, while relative positional [...] Formulation and Analysis of Positional Encoding Methods: This section offers a fundamental explanation of self-attention and a summary of existing positional encoding models. It’s important to distinguish between positional encoding, which refers to the approach used to incorporate positional information (e.g., absolute or relative), and positional embedding, which denotes the numerical vector associated with that encoding.
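The contrast between the two families can be sketched in a few lines. The random weights and the per-offset scalar bias used for the relative variant are simplifications chosen for illustration, not the formulation of any particular published model.

```python
# Schematic contrast: absolute PE folds positions into the inputs before computing
# queries/keys; relative PE adds a bias b[i - j] directly to each attention score.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 4, 8
x = rng.normal(size=(seq_len, d))                 # token embeddings
W_q, W_k = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def absolute_scores(x):
    pos = rng.normal(size=x.shape)                # stand-in for absolute positional vectors
    h = x + pos                                   # positions folded into the inputs
    Q, K = h @ W_q, h @ W_k
    return Q @ K.T / np.sqrt(d)

def relative_scores(x):
    Q, K = x @ W_q, x @ W_k                       # inputs left unchanged
    scores = Q @ K.T / np.sqrt(d)
    rel_bias = rng.normal(size=2 * seq_len - 1)   # one learned scalar per offset i - j
    offsets = np.subtract.outer(np.arange(seq_len), np.arange(seq_len))
    return scores + rel_bias[offsets + seq_len - 1]

print(absolute_scores(x).shape, relative_scores(x).shape)   # (4, 4) (4, 4)
```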
- 11.6. Self-Attention and Positional Encoding - Dive into Deep Learning
Besides capturing absolute positional information, the above positional encoding also allows a model to easily learn to attend by relative positions. This is because for any fixed position offset δ, the positional encoding at position i + δ can be represented by a linear projection of that at position i.
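That property is easy to verify numerically for the sinusoidal encoding: for a fixed offset δ, each (sin, cos) pair at position i + δ is a 2x2 rotation of the pair at position i, so PE(i + δ) is a block-diagonal linear projection of PE(i). The dimensions, position, and offset below are arbitrary choices for the check.

```python
# Numerical check that PE(i + delta) is a linear projection of PE(i).
import numpy as np

d_model = 8
freqs = 1.0 / np.power(10000.0, np.arange(0, d_model, 2) / d_model)

def pe(pos):
    angles = pos * freqs
    return np.ravel(np.column_stack([np.sin(angles), np.cos(angles)]))

i, delta = 7, 3
projected = np.empty(d_model)
for k, w in enumerate(freqs):
    # Rotation by the angle delta * w maps (sin(i*w), cos(i*w)) to
    # (sin((i+delta)*w), cos((i+delta)*w)); the matrix depends only on delta.
    c, s = np.cos(delta * w), np.sin(delta * w)
    rotation = np.array([[c, s], [-s, c]])
    projected[2 * k:2 * k + 2] = rotation @ pe(i)[2 * k:2 * k + 2]

print(np.allclose(projected, pe(i + delta)))   # True
```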