New AGI Test Proves Challenging, Stumps Majority of AI Models

2025-04-10

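The Arc Prize Foundation, co-founded by renowned AI researcher François Chollet, recently unveiled a new benchmark called ARC-AGI-2 in a blog post. The test aims to push the boundaries of AI's general intelligence, and so far it is proving a tough nut to crack for most AI models.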

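According to the Arc Prize leaderboard, even reasoning models such as OpenAI's o1-pro and DeepSeek's R1 score only between 1% and 1.3%. Meanwhile, powerful non-reasoning models such as GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Flash hover around the 1% mark.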

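The ARC-AGI tests challenge AI systems with puzzle-like problems, requiring them to identify visual patterns in grids of differently colored squares and generate the correct "answer" grid. The problems are designed to test an AI's ability to adapt to new challenges it has not seen before.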
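
To make the grid puzzles concrete, here is a minimal Python sketch. It assumes the JSON-style layout used by the public ARC datasets, where each task holds "train" and "test" pairs of integer-coded color grids. The toy task, the `mirror_rows` solver, and the all-or-nothing scoring helper are illustrative assumptions, not part of the foundation's tooling or of ARC-AGI-2 itself.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]  # each cell is an integer color code

# A hand-written toy task (NOT taken from ARC-AGI-2).
# Hidden rule: mirror every row left to right.
toy_task: Dict[str, list] = {
    "train": [
        {"input": [[1, 0, 0], [2, 2, 0]], "output": [[0, 0, 1], [0, 2, 2]]},
        {"input": [[3, 0], [0, 4]], "output": [[0, 3], [4, 0]]},
    ],
    "test": [
        {"input": [[5, 0, 0, 6]], "output": [[6, 0, 0, 5]]},
    ],
}


def mirror_rows(grid: Grid) -> Grid:
    """Hypothetical solver: reverse each row of the grid."""
    return [list(reversed(row)) for row in grid]


def identity(grid: Grid) -> Grid:
    """Baseline solver that returns the input unchanged."""
    return [list(row) for row in grid]


def fits_training_pairs(task: dict, solver: Callable[[Grid], Grid]) -> bool:
    """Check that a candidate rule reproduces every demonstration pair."""
    return all(solver(pair["input"]) == pair["output"] for pair in task["train"])


def score_task(task: dict, solver: Callable[[Grid], Grid]) -> float:
    """Fraction of test grids reproduced exactly (assumed all-or-nothing per grid)."""
    tests = task["test"]
    correct = sum(1 for pair in tests if solver(pair["input"]) == pair["output"])
    return correct / len(tests)


if __name__ == "__main__":
    print(fits_training_pairs(toy_task, mirror_rows), score_task(toy_task, mirror_rows))
    print(fits_training_pairs(toy_task, identity), score_task(toy_task, identity))
```

The point of the sketch is that a solver must infer the rule from a handful of demonstration pairs and then produce an entire grid correctly; a nearly right answer still counts as a miss under this assumed scoring.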

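To establish a human baseline, the Arc Prize Foundation had more than 400 people take the ARC-AGI-2 test. On average, "panels" of these people achieved a 60% success rate, significantly better than any of the AI models.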

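A sample question from ARC-AGI-2. Image credit: Arc Prize

In a post on X, François Chollet claimed that ARC-AGI-2 is a more accurate measure of an AI model's actual intelligence than its predecessor, ARC-AGI-1. The Arc Prize Foundation's tests aim to evaluate whether an AI can efficiently acquire new skills beyond the data it was trained on.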

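Chollet emphasized that ARC-AGI-2 prevents AI models from relying on "brute force" computing power to solve problems, a flaw he acknowledged in the first test. To address it, ARC-AGI-2 introduces an efficiency metric and requires models to interpret patterns on the fly rather than rely on memorization.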

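In the blog post, Arc Prize Foundation co-founder Greg Kamradt stressed that intelligence is not defined solely by the ability to solve problems or achieve high scores. "The efficiency with which those capabilities are acquired and deployed is a crucial, defining component," he wrote. "The core question being asked is not just, 'Can AI acquire the skill to solve a task?' but also, 'At what efficiency or cost?'"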

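ARC-AGI-1 stood unbeaten for roughly five years until December 2024, when OpenAI's advanced reasoning model, o3, surpassed all other AI models and matched human performance. That success came at a high cost, however. The version of the model known as o3 (low), which scored an impressive 75.7% on ARC-AGI-1, managed a meager 4% on ARC-AGI-2 while using $200 worth of computing power per task.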

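Comparison of frontier AI model performance on ARC-AGI-1 and ARC-AGI-2. Image credit: Arc Prize

ARC-AGI-2 arrives as many in the tech industry are calling for new, unsaturated benchmarks to measure AI progress. Hugging Face co-founder Thomas Wolf recently told TechCrunch that the AI industry lacks sufficient tests to measure key traits of so-called artificial general intelligence, such as creativity.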

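Alongside the new benchmark, the Arc Prize Foundation announced the Arc Prize 2025 contest, which challenges developers to reach 85% accuracy on the ARC-AGI-2 test while spending only $0.42 per task.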
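
The gap between that target and the o3 (low) result reported above can be checked with a little arithmetic. The Python sketch below uses only the figures quoted in this article; the `Result` class and helper names are hypothetical and do not reflect how the foundation actually computes its efficiency metric.

```python
from dataclasses import dataclass

# Contest targets as quoted in the article.
TARGET_ACCURACY = 0.85        # 85% accuracy on ARC-AGI-2
TARGET_COST_PER_TASK = 0.42   # US dollars per task


@dataclass
class Result:
    """Hypothetical record of one model's benchmark run."""
    name: str
    accuracy: float        # fraction of tasks solved, 0.0 to 1.0
    cost_per_task: float   # US dollars per task


def meets_target(result: Result) -> bool:
    """True only if the accuracy AND the cost threshold are both satisfied."""
    return (result.accuracy >= TARGET_ACCURACY
            and result.cost_per_task <= TARGET_COST_PER_TASK)


def cost_overrun(result: Result) -> float:
    """How many times more expensive per task than the contest budget."""
    return result.cost_per_task / TARGET_COST_PER_TASK


# Figures reported in the article for o3 (low) on ARC-AGI-2.
o3_low = Result(name="o3 (low)", accuracy=0.04, cost_per_task=200.0)

if __name__ == "__main__":
    print(meets_target(o3_low))
    print(f"{cost_overrun(o3_low):.0f}x over the per-task budget")
```

On these numbers, o3 (low) falls short on both axes: its accuracy is far below 85%, and its per-task cost is several hundred times the $0.42 budget.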

Comments (35)
StephenMartinez 2025-04-10 08:00:00

The new AGI test from the Arc Prize Foundation is seriously tough! It's great to see AI being pushed to its limits, but man, it's humbling to see how many models can't crack it. François Chollet's work is always pushing the envelope. Keep at it, AI devs!

StevenLopez 2025-04-11 08:00:00

This ARC-AGI-2 test is seriously tough! I tried it with a bunch of AI models and most of them just couldn't handle it. It's cool to see how it challenges the limits of AI, but man, it's frustrating when even the smart ones fail. Maybe next time, right?
