
Meta's AI Model Benchmarks: Misleading?

Published: April 10, 2025
Author: TimothyMitchell
Views: 10


So Meta dropped their new AI model, Maverick, over the weekend, and it has already caused a stir by snagging second place on LM Arena. You know, the site where humans get to play judge and jury, comparing different AI models and picking their favorites. But hold on, there's a twist! It turns out the version of Maverick strutting its stuff on LM Arena isn't quite the same as the one developers can actually download and play with.

Some eagle-eyed AI researchers on X (yes, the platform formerly known as Twitter) spotted that Meta labels the LM Arena version an "experimental chat version." And if you dig around the Llama website, there's a chart that spills the beans, noting the test was run with "Llama 4 Maverick optimized for conversationality." Now, we've talked about this before, but LM Arena isn't exactly the gold standard for measuring AI performance. Even so, most AI companies haven't customized their models just to score better on this one test, or at least they haven't admitted to it.

The problem is that when you tune a model to ace a benchmark and then release a different, "vanilla" version to the public, developers have a hard time figuring out how the model will actually perform in the real world. Plus, it's a bit misleading, right? Benchmarks, flawed as they are, are supposed to give us a clear picture of what a model can and can't do across different tasks.

Researchers on X were quick to notice some glaring differences between the Maverick you can download and the one on LM Arena. The Arena version is apparently big on emoji and loves to give you long, drawn-out answers.

We've reached out to Meta and to Chatbot Arena, the folks who run LM Arena, to hear what they make of all this. Stay tuned!

Related Articles
Meta Defends Llama 4 Release, Cites Bugs as Cause of Mixed Quality Reports. Over the weekend, Meta, the powerhouse behind Facebook, Instagram, WhatsApp, and Quest VR, surprised everyone by unveiling their latest AI language model, Llama 4. Not just one, but three new versions were introduced, each boasting enhanced capabilities thanks to the "Mixture-of-Experts" architecture…
Law Professors Support Authors in AI Copyright Battle Against Meta. A group of copyright law professors has thrown their support behind authors suing Meta, alleging that the tech giant trained its Llama AI models on e-books without the authors' consent. The professors filed an amicus brief on Friday in the U.S. District Court for the Northern District of California…
Meta AI will soon train on EU users' data. Meta has recently revealed its plans to train its AI using data from EU users of its platforms, such as Facebook and Instagram. This initiative will tap into public posts, comments, and even chat histories with Meta AI, but rest assured, your private messages with friends and family are off-limits.
Comments (35)
JerryGonzalez · April 10, 2025 10:18:45

Meta's AI model benchmarks seem a bit off to me. Maverick got second place, but I've used it and it's not that great. The interface is clunky and the results are hit or miss. Maybe they're just trying to hype it up? I'd give it a pass for now.

KevinBaker · April 11, 2025 18:25:04

I tried Meta's Maverick and it's pretty good, but those benchmarks seem a bit off to me. It's not as smooth as they claim, and sometimes it's just plain wrong. I'm not sure if it's worth the hype. Maybe they need to tweak their testing methods?
