GitHub Copilot's AI Tested: Mixed Coding Success Leaves Me Baffled


April 21, 2025

GregoryAllen


Exploring the Inconsistencies in AI Coding Tools

It's truly puzzling how AI tools built on the same foundational large language model can produce such different results. ChatGPT, Perplexity, and GitHub Copilot, for instance, all use OpenAI's GPT-4 model. Yet my recent tests showed stark differences in performance: the pro plans of ChatGPT and Perplexity excelled, while GitHub Copilot passed only half of the tests.

I ran these tests with GitHub Copilot inside VS Code. I'll share a detailed guide to setting that up in an upcoming article. For now, let's dive into the specifics of the tests I ran.

If you're curious about my testing methodology and the prompts used, you can check out my detailed guide on evaluating an AI chatbot's coding capabilities.

TL;DR: GitHub Copilot managed to pass two out of the four tests I conducted.

Test 1: Writing a WordPress Plugin

This test was a complete disappointment. Because it was my first experiment, it left me unsure whether GitHub Copilot itself struggles with coding or whether the constraints of interacting with it inside VS Code limit what it can do.

Here's the context: I asked the AI to build a fully functional WordPress plugin with both an admin interface and the logic behind it. The plugin had to accept a list of names, sort them, and separate any duplicates so that identical names don't end up next to each other.

This task grew out of a real-world need in my wife's digital-goods e-commerce business, where she manages an active Facebook group.
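To make the plugin's core requirement concrete, here is a minimal Python sketch of just the arrangement logic. It is not the WordPress plugin itself (that would be PHP with an admin page), and it is not code from the original prompt or from any AI's answer. The greedy strategy is an assumption for illustration: always place the most frequent remaining name that differs from the one just placed, which keeps identical names apart whenever that is possible at all. The function name separate_duplicates is hypothetical, and ties are broken alphabetically only so the output is deterministic.

```python
import heapq
from collections import Counter


def separate_duplicates(names: list[str]) -> list[str]:
    """Arrange names so identical names are not adjacent whenever possible.

    Hypothetical helper: a greedy sketch, not the plugin's actual logic.
    """
    counts = Counter(names)
    # Max-heap on remaining count (stored negated); ties break alphabetically.
    heap = [(-count, name) for name, count in counts.items()]
    heapq.heapify(heap)
    result: list[str] = []
    while heap:
        neg_count, name = heapq.heappop(heap)
        if result and result[-1] == name:
            if not heap:
                # Only copies of the last-placed name remain: adjacency is
                # unavoidable, so append them rather than dropping any lines.
                result.extend([name] * -neg_count)
                break
            # Place the next most frequent name instead, then put the
            # first one back so it can be used on a later step.
            next_neg, next_name = heapq.heappop(heap)
            result.append(next_name)
            if next_neg + 1 < 0:
                heapq.heappush(heap, (next_neg + 1, next_name))
            heapq.heappush(heap, (neg_count, name))
        else:
            result.append(name)
            if neg_count + 1 < 0:
                heapq.heappush(heap, (neg_count + 1, name))
    return result


if __name__ == "__main__":
    print(separate_duplicates(["Ann", "Bob", "Ann", "Cy", "Ann"]))
    # ['Ann', 'Bob', 'Ann', 'Cy', 'Ann']
```

If one name accounts for more than half the list (rounded up), no arrangement can keep every copy apart; in that case the sketch appends the leftover copies at the end, so the output always has as many entries as the input.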

Of the ten AI models I've run through this test, five passed it entirely, three passed partially, and two, including Microsoft Copilot, failed completely. Given the same prompt, GitHub Copilot produced only PHP code. That alone needn't be a problem, since the task can be solved in PHP, but the code referenced JavaScript that GitHub Copilot never actually generated.

Screenshot by David Gewirtz/ZDNET

When I tried to prompt GitHub Copilot from within a JavaScript file to complete the task, it bizarrely responded with more PHP code, still referencing a non-existent JavaScript file.

Screenshot by David Gewirtz/ZDNET

Test 2: Rewriting a String Function

This test was relatively straightforward: I provided a function that was meant to validate dollars-and-cents amounts but only checked for whole dollars, and the AI's job was to correct it.

GitHub Copilot did modify the code, but the result was problematic. It assumed its input was always a valid string, so an empty string would cause errors. The updated regular expression also couldn't handle several edge cases, such as the inputs "3.", ".3", and "00.30". For a function meant to validate currency, such oversights are unacceptable, so this counts as another fail for GitHub Copilot.
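For comparison, here is a minimal Python sketch of a stricter validator. The original test function isn't reproduced in this article, and the acceptance rules below are assumptions for illustration: the sketch treats "3.", ".3", and "00.30" as valid amounts, allows at most two digits of cents, and rejects empty strings and non-string input up front instead of assuming the input is valid. The function name is_valid_dollar_amount is hypothetical.

```python
import re

# Assumed policy, for illustration only:
#   dollars with optional cents:       "3", "3.5", "3.50"
#   trailing decimal point tolerated:  "3."
#   cents with no dollar digits:       ".3", ".30"
#   leading zeros allowed:             "00.30"
#   at most two cents digits:          "3.505" is rejected
# [0-9] is used instead of \d so non-ASCII digits are not accepted.
_AMOUNT_PATTERN = re.compile(r"(?:[0-9]+(?:\.[0-9]{0,2})?|\.[0-9]{1,2})")


def is_valid_dollar_amount(value: object) -> bool:
    """Return True if value is a string holding a dollars-and-cents amount."""
    # Check the type first instead of assuming a valid string was passed in.
    if not isinstance(value, str):
        return False
    # Ignore surrounding whitespace (also an assumption), then reject empty input.
    text = value.strip()
    if not text:
        return False
    # fullmatch requires the pattern to cover the entire string.
    return _AMOUNT_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["3", "3.", ".3", "00.30", "3.505", "", "."]:
        print(repr(sample), is_valid_dollar_amount(sample))
```

Whether "3." or ".3" should count as valid is a policy decision; the point of the sketch is that each edge case is handled deliberately rather than by accident.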

Test 3: Finding an Annoying Bug

Here, GitHub Copilot shone. This test was based on a real coding problem I faced, where the error message didn't point directly to the actual issue. It's a bit like a coding riddle: solving it requires a deep understanding of WordPress API calls.

While Microsoft Copilot, Gemini, and Meta Code Llama stumbled on this test, GitHub Copilot nailed it, showcasing its capability to tackle complex, real-world problems.

Test 4: Writing a Script

GitHub Copilot also succeeded in this test, where Microsoft Copilot fell short. The task involved creating a script that needed to integrate AppleScript, the Chrome object model, and a Mac-specific utility called Keyboard Maestro.

To pass, the AI needed to recognize and address the nuances of all three environments, and GitHub Copilot did just that.

Final Thoughts

It's disheartening to see GitHub Copilot, which uses the advanced GPT-4 model, fail half of the tests. Given GitHub's status as a leading source-code management platform, one would expect its AI coding support to be more dependable.

However, the world of AI is ever-evolving, and I'm optimistic that GitHub Copilot's performance will improve over time. We'll revisit this in a few months to see how it's progressed.

Do you rely on AI for coding assistance? Which AI tool is your go-to? Have you given GitHub Copilot a try? Share your experiences in the comments below.

Stay updated with my daily project progress on social media. Don't forget to sign up for my weekly newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.


Comments (40)


HarryMartinez

May 28, 2026 at 2:00:14 AM EDT

Honestly, this doesn't surprise me. Even with the same underlying model, the way each tool fine-tunes prompts and handles context makes a huge difference. Copilot's mixed results probably come from its integration with IDE specifics. Still, it's baffling why the same model can give such inconsistent outputs for similar tasks. 🤔

EricAllen

May 18, 2026 at 10:00:12 PM EDT

I tried it too and find it really strange that the results differ so much even though the foundation is similar. Sometimes Copilot writes great code, sometimes total nonsense. Maybe it's down to the IDE integration? 🤔 Either way, a lot still needs to improve before I can fully rely on it.

ArthurJackson

March 11, 2026 at 4:00:47 PM EDT

Interesting: why do AI tools built on the same GPT-4 base model behave so differently? GitHub Copilot sometimes generates code that looks logical, but then produces complete nonsense 😅 Maybe it comes down to fine-tuning or context? It reminds me of a temperamental programmer colleague who is a genius one moment and helpless the next.

LarryMartin

November 27, 2025 at 7:30:43 AM EST

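Reading this article, the inconsistency of AI coding tools is really striking. The same technology, yet the results can differ this much... Working as a developer, I've often been caught off guard when Copilot writes perfect code one moment and then suddenly suggests something completely off. 🤔 I hope AI tools become more stable going forward!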

PaulRoberts

November 7, 2025 at 11:30:36 PM EST

I find it frustrating that tools like Copilot and ChatGPT use the same base model but perform so differently. It makes me wonder whether the implementation is really well done or whether they're just slapping a famous name on it to sell more. 🤔

EricDavis

November 1, 2025 at 12:30:33 PM EDT

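This test made me laugh 😂 They're all clearly using GPT-4, so why is Copilot's coding so erratic? Sometimes it's like a senior engineer, and sometimes it's just spouting gibberish. Could AI have its "time of the month" too? (Just kidding.) Still, it's making me hesitate about renewing my subscription...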