Grok by xAI Gains Strong Capabilities for Baldur’s Gate Queries

Different AI labs pursue distinct goals. For example, OpenAI has historically centered its efforts on consumer applications, whereas its competitor Anthropic typically targets the enterprise market. According to recent reporting, Elon Musk's xAI has put a notable priority on video game walkthroughs.
On Friday, Grace Kay of Business Insider released a comprehensive investigation into xAI, the artificial intelligence startup recently acquired by SpaceX, highlighting the challenges Musk reportedly creates for his staff. One specific detail was particularly striking:
According to sources familiar with the situation, a model launch was postponed for several days last year because Musk was unhappy with the chatbot's answers to intricate questions about the video game "Baldur's Gate." Senior engineers were reportedly reassigned from other initiatives to refine these responses prior to the release.
Naturally, one can empathize with a skilled engineer expecting to solve profound challenges in machine intelligence, only to be diverted into helping a 54-year-old man progress in a video game. However, this story prompts a more immediate question: did Musk ultimately get the gaming expertise he sought?
To find out, our in-house RPG expert Ram Iyer compiled five general questions about Baldur's Gate. We posed these to xAI's Grok and three other leading AI models in an informal test we've dubbed BaldurBench.
In the spirit of transparency, all chat transcripts are publicly available for review: Grok, ChatGPT, Claude, and Gemini.
First, the positive outcome: Grok actually provides quite solid information. Its answers were somewhat heavy on gaming terminology—using "save-scumming" instead of simply saving and "DPS" for damage—but the guidance was both helpful and knowledgeable, assuming you understood the jargon. As one might anticipate, Grok also shows a strong preference for tables and theorycrafting.
Numerous Baldur's Gate guides exist, and the models generally pull from similar sources, making stylistic differences the primary distinction. ChatGPT favors bulleted lists and concise phrases, while Gemini emphasizes key terms by making them bold.
The most unexpected response came from Claude, which was especially cautious about sharing details that could ruin the game's surprises. When asked about optimal party compositions, it concluded its advice with, "don't stress too much and just play what sounds fun to you." Thanks, Claude!
It's crucial to remember that, according to Business Insider's report, this is a specific area where xAI concentrated its efforts to match competitors. Therefore, we shouldn't overinterpret the fact that, after the reported intensive work, Grok's advice ended up being comparable to the other models. Nevertheless, it's reassuring to see that xAI can deliver when it focuses its resources.







