Large language models need to handle increasingly longer sequences, but current methods are either too slow or too limited, making it difficult to go beyond a certain length. To help overcome this hurdle, Microsoft presents LongNet, a new type of Transformer that can handle sequences of over 1 billion tokens, without losing performance on shorter ones.
Microsoft researchers have published a paper proposing a new Transformer variant called LongNet. Transformer models are a type of neural network architecture that can process sequential data, such as natural language or speech. Large language models (LLMs) such as OpenAI’s GPT-4, Meta’s LLaMA, and Google’s PaLM 2 are based on Transformer models that have been trained on extensive text data.
Microsoft LongNet’s main innovation is dilated attention, which expands the attentive field as the distance between tokens grows, yielding linear computational complexity and a logarithmic dependency between any two tokens. The paper demonstrates that LongNet performs well on both long-sequence modeling and general language tasks, and that it can be easily integrated with existing Transformer-based optimizations. The paper also discusses the potential of LongNet for modeling very long sequences, such as treating an entire corpus, or even the entire Internet, as a single sequence.
AI expert and documentarian David Shapiro posts YouTube videos discussing why Microsoft LongNet is such a big breakthrough. To give a sense of what a billion tokens means, Shapiro offers the example of a 3GB image. He explains that humans can see the whole picture and understand it, while also noticing the small details in the picture and understanding those too.
AI is not always good at seeing small details. It can often see the bigger picture but get lost in the smaller aspects of what it processes. A good example is the current generation of large language models that power chatbots like Google Bard or Bing Chat. These sophisticated AI tools can often present an overview of a topic, but frequently provide incorrect information when they get into the smaller details.
What is Tokenization and Why it Matters to AI
Natural Language Processing (NLP) is a field of AI that deals with understanding and generating human language. But before we can feed text into a computer, we need to break it down into smaller parts that the computer can handle. This process is called tokenization, and it is one of the most basic and important steps in NLP.
Tokenization is like cutting a cake into slices: you take large, complex text and divide it into smaller, simpler units, such as words, sentences, or characters. For example, the sentence “I love NLP” can be split into three words: “I”, “love”, and “NLP”.
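The splitting described above can be sketched in a few lines. This is a minimal word-level tokenizer using a regular expression, purely for illustration; real NLP libraries use far more sophisticated schemes (subwords, byte-pair encoding, and so on):

```python
import re

def word_tokenize(text):
    """Split text into word and punctuation tokens with a simple regex.

    A minimal sketch of word-level tokenization, not the exact scheme
    any particular NLP library uses.
    """
    # \w+ matches runs of letters/digits; [^\w\s] matches single
    # punctuation marks, so "world!" becomes ["world", "!"]
    return re.findall(r"\w+|[^\w\s]", text)

print(word_tokenize("I love NLP"))  # ['I', 'love', 'NLP']
```

Notice that punctuation comes out as its own token, which matters for tasks where a comma or exclamation mark carries meaning.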
But tokenization is not just a simple splitting operation. It is also an art and a science, as different languages and tasks require different ways of tokenizing text. For example, some languages, such as Chinese or Japanese, don’t have spaces between words, so we need special algorithms to find word boundaries. Some tasks, such as sentiment analysis or text summarization, can treat punctuation or emoticons as tokens, as they convey important information.
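For languages without spaces, one classic (if simplistic) approach is greedy longest-match segmentation against a dictionary. The sketch below uses a tiny hand-picked vocabulary purely for illustration; real Chinese or Japanese tokenizers rely on statistical or neural models instead:

```python
def max_match(text, vocab):
    """Greedy longest-match word segmentation for spaceless text.

    A minimal sketch of the classic MaxMatch algorithm. The toy
    vocabulary passed in below is purely illustrative.
    """
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a vocab hit;
        # a single character is always accepted as a fallback.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

vocab = {"我", "爱", "自然", "语言", "自然语言", "处理"}
print(max_match("我爱自然语言处理", vocab))
# ['我', '爱', '自然语言', '处理']
```

Because the algorithm prefers the longest match, “自然语言” (“natural language”) is kept as one token rather than being split into “自然” and “语言”.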
Tokenization is also a key component of both traditional and modern NLP methods. In traditional methods, such as count vectorization, we use tokenization to create a numeric representation of text based on the frequency of each token. In modern methods, such as Transformers, we use tokenization to create a sequence of tokens that can be processed by a neural network.
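To make the traditional side concrete, here is a minimal pure-Python sketch of count vectorization; tools like scikit-learn’s CountVectorizer do essentially this, plus lowercasing, n-grams, and sparse matrices:

```python
from collections import Counter

def count_vectorize(docs):
    """Turn a list of documents into count vectors over a shared vocabulary.

    A minimal sketch of count-based text vectorization, not the actual
    implementation used by any particular library.
    """
    tokenized = [doc.split() for doc in docs]
    # Shared vocabulary: every distinct token across all documents.
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # One count per vocabulary entry, in a fixed order.
        vectors.append([counts[tok] for tok in vocab])
    return vocab, vectors

vocab, vectors = count_vectorize(["I love NLP", "I love I"])
print(vocab)    # ['I', 'NLP', 'love']
print(vectors)  # [[1, 1, 1], [2, 0, 1]]
```

Each document becomes a row of token frequencies, which is exactly the kind of numeric representation older NLP models consumed.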
Therefore, tokenization is an important step in NLP, because it determines how text is represented and understood by computers. This is also an interesting topic, because it reveals the diversity and complexity of natural languages.
What Expanding the Number of Tokens Means for AI Development
By expanding the number of tokens, the AI model can essentially see the bigger picture while still being able to focus on the smaller details. Microsoft LongNet’s idea is dilated attention, which attends to tokens more sparsely as distance increases, so the attentive field can grow exponentially without a quadratic cost.
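The sparsification pattern behind this idea can be sketched as follows. This is a simplified illustration of the index pattern only, assuming a single segment length and dilation rate; the actual LongNet mixes several (segment, dilation) pairs across attention heads and is implemented very differently:

```python
def dilated_indices(seq_len, segment_len, dilation):
    """For each segment, keep every `dilation`-th position.

    A minimal sketch of dilated attention's sparsification: the
    sequence is split into segments, and within each segment only
    every `dilation`-th token is kept, so attention inside a segment
    costs far less than full quadratic attention.
    """
    kept = []
    for start in range(0, seq_len, segment_len):
        end = min(start + segment_len, seq_len)
        kept.append(list(range(start, end, dilation)))
    return kept

# 16 tokens, segments of 8, keeping every 2nd token per segment:
print(dilated_indices(16, 8, 2))
# [[0, 2, 4, 6], [8, 10, 12, 14]]
```

By pairing longer segments with proportionally larger dilations, the number of kept positions per segment stays roughly constant while the attentive field grows, which is how the overall cost stays near linear in sequence length.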
LongNet has several benefits:
It has linear computational complexity and only a logarithmic dependency between tokens; it can serve as a distributed trainer for extremely long sequences; and its dilated attention can be dropped into existing Transformer-based optimizations with ease.
This means LongNet can model long sequences as well as general language tasks. David Shapiro argues that Microsoft’s paper signals a push towards artificial general intelligence (AGI). He points out that the ability to process more tokens means being able to cover large tasks accurately and almost instantly. Shapiro offers medical research as an example, where an AI could read thousands of journal articles at once.
That would amount to the ability to read the entire internet at once, in seconds. It should also be noted that LongNet is only the beginning. As the approach matures, Shapiro says it will eventually be able to handle trillions of tokens, and perhaps one day the entire internet. Once that happens, its growth will outpace human capabilities and AI can move towards AGI.
LongNet is still in the research phase, and Shapiro predicts we probably won’t see what it is capable of for at least a year. Even so, with the rapid development of AI, very strong intelligence could be closer than many people think. Some forecasts put the development of superintelligence at least 20 years away, while others believe we will never reach it.
That concludes this article on how Microsoft LongNet unveils a billion-token Transformer, demonstrating the potential of superintelligence.
I hope the information in this article is useful to you. Thank you for taking the time to visit this blog. If you have suggestions or criticism, please contact us: firstname.lastname@example.org