Meta Defends Llama 4 Release, Cites Bugs as Cause of Mixed Quality Reports

April 22, 2025
BillyAdams

Over the weekend, Meta, the powerhouse behind Facebook, Instagram, WhatsApp, and Quest VR, surprised everyone by unveiling its latest AI language model family, Llama 4. Not just one, but three new models were introduced, each boasting enhanced capabilities thanks to a "Mixture-of-Experts" architecture and a new training technique called MetaP for setting model hyperparameters. What's more, all three models come with expansive context windows, allowing them to process far more information in a single interaction.
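
For readers who haven't encountered the term, the sketch below illustrates the basic idea behind a Mixture-of-Experts layer: a lightweight router sends each token to only a small subset of "expert" feed-forward networks, so total parameter count can grow without every token paying the full compute cost. This is a toy PyTorch illustration under assumed names and sizes (ToyMoELayer, n_experts, top_k), not Meta's actual Llama 4 implementation.

# Toy top-k Mixture-of-Experts layer (illustrative only, not Meta's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        # Router scores each token against every expert.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        weights = F.softmax(self.router(x), dim=-1)         # routing probabilities
        top_w, top_idx = weights.topk(self.top_k, dim=-1)   # keep only the top-k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e                 # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += top_w[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

x = torch.randn(10, 64)            # 10 tokens with 64-dim embeddings
print(ToyMoELayer()(x).shape)      # torch.Size([10, 64])

In this toy setup each token activates only 2 of 8 experts, which is the general trade-off Mixture-of-Experts models make: a large total parameter count with a much smaller active compute path per token.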

Despite the excitement of the release, the AI community's reaction has been lukewarm at best. Meta made two of the models, Llama 4 Scout and Llama 4 Maverick, available for download and use on Saturday, and the early response has been far from enthusiastic.

Llama 4 Sparks Confusion and Criticism Among AI Users

An unverified post on the 1point3acres forum, a popular Chinese-language community in North America, found its way to the r/LocalLlama subreddit on Reddit. The post, allegedly from a researcher at Meta’s GenAI organization, claimed that Llama 4 underperformed on third-party benchmarks. It suggested that Meta's leadership had manipulated the results by blending test sets during post-training to meet various metrics and present a favorable outcome. The post's authenticity was met with skepticism, and Meta has yet to respond to inquiries from VentureBeat.

Yet, the doubts about Llama 4's performance didn't stop there. On X, user @cto_junior expressed disbelief at the model's performance, citing an independent test where Llama 4 Maverick scored a mere 16% on the aider polyglot benchmark, which tests coding tasks. This score is significantly lower than that of older, similarly sized models like DeepSeek V3 and Claude 3.7 Sonnet.

AI PhD and author Andriy Burkov also took to X to question the model's advertised 10 million-token context window for Llama 4 Scout, stating that it's "virtual" because the model wasn't trained on prompts longer than 256k tokens. He warned that sending longer prompts would likely result in low-quality outputs.

On the r/LocalLlama subreddit, user Dr_Karminski shared their disappointment with Llama 4, noting that it performed worse than DeepSeek’s non-reasoning V3 model on tasks like simulating ball movements within a heptagon.

Nathan Lambert, a former Meta researcher and current Senior Research Scientist at AI2, criticized Meta's benchmark comparisons on his Interconnects Substack blog. He pointed out that the Llama 4 Maverick model used in Meta's promotional materials was different from the one publicly released, optimized instead for conversationality. Lambert noted the discrepancy, saying, "Sneaky. The results below are fake, and it is a major slight to Meta’s community to not release the model they used to create their major marketing push." He added that while the promotional model was "tanking the technical reputation of the release because its character is juvenile," the actual model available on other platforms was "quite smart and has a reasonable tone."

Meta Responds, Denying 'Training on Test Sets' and Citing Bugs in Implementation Due to Fast Rollout

In response to the criticism and accusations, Meta's VP and Head of GenAI, Ahmad Al-Dahle, took to X to address the concerns. He expressed enthusiasm for the community's engagement with Llama 4 but acknowledged reports of inconsistent quality across different services. He attributed these issues to the rapid rollout and the time needed for public implementations to stabilize. Al-Dahle firmly denied the allegations of training on test sets, emphasizing that the variable quality was due to implementation bugs rather than any misconduct. He reaffirmed Meta's belief in the significant advancements of the Llama 4 models and their commitment to working with the community to realize their potential.

However, the response did little to quell the community's frustrations, with many still reporting poor performance and demanding more technical documentation about the models' training processes. This release has faced more issues than previous Llama versions, raising questions about its development and rollout.

The timing of this release is notable, as it follows the departure of Joelle Pineau, Meta's VP of Research, who announced her exit on LinkedIn last week with gratitude for her time at the company. Pineau had also promoted the Llama 4 model family over the weekend.

As Llama 4 continues to be adopted by other inference providers with mixed results, it's clear that the initial release has not been the success Meta might have hoped for. The upcoming Meta LlamaCon on April 29, which will be the first gathering for third-party developers of the model family, is likely to be a hotbed of discussion and debate. We'll be keeping a close eye on developments, so stay tuned.
