Understanding Long Context Windows: Key Insights
Yesterday, we announced Gemini 1.5, our latest AI model. The new version is faster and more efficient, but its most significant change is a long context window, which determines how many tokens (the smallest units of data a model processes, such as part of a word, image, or video) the model can take in at once. To explain the advance, we asked the Google DeepMind project team what long context windows are and how they could change the way developers work.
Context windows matter because they determine how much information an AI model can retain and recall during a session. Think of struggling to recall a name a few minutes after hearing it in conversation, or rushing to write down a phone number before it slips your mind. AI models have a similar problem and often "forget" details after a few turns of a conversation. A long context window addresses this by letting the model keep more information in its "memory."
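The "forgetting" described above can be illustrated with a minimal sketch. It is not how Gemini manages context; it assumes a simple chat client that drops the oldest messages once a fixed token budget is exceeded, and it assumes one whitespace-separated word counts as one token, which real tokenizers do not guarantee. The names `count_tokens` and `fit_to_window` are hypothetical.

```python
from collections import deque


def count_tokens(text: str) -> int:
    # Assumption: one whitespace-separated word stands in for one token.
    # Real tokenizers split text differently.
    return len(text.split())


def fit_to_window(messages: list[str], window: int) -> list[str]:
    """Keep the most recent messages whose combined token count fits the window."""
    kept: deque[str] = deque()
    used = 0
    # Walk backwards so the newest messages are kept first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > window:
            break  # everything older than this point is "forgotten"
        kept.appendleft(message)
        used += cost
    return list(kept)


conversation = [
    "My name is Ada",
    "I work on compilers",
    "What is my name",
]

small = fit_to_window(conversation, window=8)    # the first message no longer fits
large = fit_to_window(conversation, window=100)  # the whole conversation fits
```

With the small window, the message containing the name is dropped before the question about it arrives. With the larger window, it is still available.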
Previously, Gemini could process up to 32,000 tokens at once. With 1.5 Pro, now released for early testing, that limit rises to 1 million tokens, the longest context window of any large-scale foundation model to date. In research, we have gone further and successfully tested up to 10 million tokens. The longer the context window, the more text, images, audio, code, or video the model can take in and process in a single prompt.
Nikolay Savinov, a Google DeepMind Research Scientist and one of the leads on the long context project, shared, "Our initial goal was to reach 128,000 tokens, but I thought aiming higher would be beneficial, so I proposed 1 million tokens. And now, our research has exceeded that by 10 times."
Achieving this leap required a series of deep learning innovations. Pranav Shyam's early explorations provided crucial insights that guided our research. Denis Teplyashin, a Google DeepMind Engineer, explained, "Each breakthrough led to another, opening up new possibilities. When these innovations combined, we were amazed at the results, scaling from 128,000 tokens to 512,000, then 1 million, and recently, 10 million tokens in our internal research."
This expanded capacity opens up new applications for 1.5 Pro. Where the previous model could summarize documents dozens of pages long, 1.5 Pro can summarize documents thousands of pages long. Where the previous model could analyze thousands of lines of code, 1.5 Pro can analyze tens of thousands of lines at once.
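A back-of-the-envelope calculation shows how the jump from dozens to thousands of pages follows from the window sizes quoted in this article. The window sizes come from the text. The figure of 500 tokens per page is an assumption chosen for illustration, since the real ratio depends on the tokenizer, language, and page layout. The function name `pages_that_fit` is hypothetical.

```python
# Assumption: an average page holds roughly 500 tokens.
TOKENS_PER_PAGE = 500

# Context window sizes quoted in the article.
CONTEXT_WINDOWS = {
    "previous Gemini": 32_000,
    "1.5 Pro standard": 128_000,
    "1.5 Pro private preview": 1_000_000,
    "internal research": 10_000_000,
}


def pages_that_fit(window_tokens: int, tokens_per_page: int = TOKENS_PER_PAGE) -> int:
    """Estimate how many whole pages fit in a context window."""
    return window_tokens // tokens_per_page


estimates = {name: pages_that_fit(size) for name, size in CONTEXT_WINDOWS.items()}

for name, pages in estimates.items():
    print(f"{name}: about {pages:,} pages")
```

Under this assumption, a 32,000-token window holds about 64 pages and a 1-million-token window about 2,000, which matches the "dozens" versus "thousands" comparison above.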
Machel Reid, another Google DeepMind Research Scientist, shared some test results: "In one test, we fed an entire codebase into the model, and it generated comprehensive documentation for it, which was incredible. In another, it accurately answered questions about the 1924 film Sherlock Jr. after 'watching' the entire 45-minute movie."
1.5 Pro also excels at reasoning across data within a prompt. Machel highlighted an example involving the rare language Kalamang, spoken by fewer than 200 people worldwide. "The model can't translate into Kalamang on its own, but with the long context window, we could include the entire grammar manual and example sentences. The model then learned to translate from English to Kalamang at a level comparable to someone learning from the same material."
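The Kalamang example amounts to placing all the reference material in the prompt ahead of the request. The sketch below shows one way such a prompt might be assembled and checked against a token budget. It is an illustration under assumptions, not the team's method: the section layout, the word-based token count, and the names `count_tokens` and `build_prompt` are all hypothetical, and the placeholder strings stand in for the real grammar manual and sentences. No model is called.

```python
def count_tokens(text: str) -> int:
    # Assumption: one whitespace-separated word stands in for one token.
    return len(text.split())


def build_prompt(reference_docs: dict[str, str], task: str, window: int) -> str:
    """Concatenate reference material and the task, failing if it exceeds the window."""
    sections = [f"## {title}\n{body}" for title, body in reference_docs.items()]
    sections.append(f"## Task\n{task}")
    prompt = "\n\n".join(sections)
    needed = count_tokens(prompt)
    if needed > window:
        raise ValueError(f"prompt needs {needed} tokens but the window is {window}")
    return prompt


# Placeholder content only; the real material would be far longer.
docs = {
    "Grammar manual": "<full text of the grammar manual>",
    "Example sentences": "<parallel English and Kalamang sentences>",
}
task = "Translate the following English sentence into Kalamang: <sentence>"

prompt = build_prompt(docs, task, window=1_000_000)
```

With a 32,000-token window, a book-length manual would likely trigger the `ValueError` here. A 1-million-token window leaves room for the manual, the examples, and the request together.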
Gemini 1.5 Pro comes with a standard 128,000-token context window, but a limited group of developers and enterprise customers can try a 1-million-token context window through AI Studio and Vertex AI in private preview. Processing a context window this large is computationally intensive, and we're actively working on optimizations to reduce latency as we roll it out more widely.
Looking ahead, the team is focused on making the model faster and more efficient, with safety as a priority. They're also exploring ways to further expand the long context window, enhance underlying architectures, and leverage new hardware improvements. Nikolay noted, "10 million tokens at once is nearing the thermal limit of our Tensor Processing Units. We're not sure where the limit lies yet, and the model might be capable of even more as hardware continues to evolve."
The team is eager to see the innovative applications that developers and the broader community will create with these new capabilities. Machel reflected, "When I first saw we had a million tokens in context, I wondered, 'What do you even use this for?' But now, I believe people's imaginations will expand, leading to more creative uses of these new capabilities."

Comments (30)
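Wow! Once long-context capabilities are put to practical use, analyzing research and business documents should get much easier 🤩. But what about the ethics? Reading in huge amounts of data could also raise privacy issues, which makes me a little uneasy... I wonder how other companies will follow. Development is moving so fast that I feel like I'll be left behind!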
Super cool to see Gemini 1.5's long context window in action! 😎 Makes me wonder how it'll handle massive datasets compared to older models.
Wow, the long context window in Gemini 1.5 sounds like a game-changer! I'm curious how it'll handle massive datasets in real-world apps. Excited to see where this takes AI! 🚀
The long context window in Gemini 1.5 sounds like a game-changer! I'm curious how it'll handle massive datasets in real-world apps. Any cool examples out there yet? 🤔











