Gemini Pro 2.5: A Powerful Coding Assistant Posing Major Threat to ChatGPT
When it comes to evaluating AI for coding assistance, I've developed a set of four standardized tests. These tests are crucial for assessing how well an AI can support your programming efforts. After all, the last thing you need is an AI that adds more bugs to your code, right?
A while back, a reader questioned my approach, suggesting that AIs might perform better with different challenges. It's a valid point, but I stick to these tests because they're straightforward. I use PHP and JavaScript, which aren't the toughest languages out there, and run some scripting queries through the AIs. This consistency allows us to directly compare performance.
The tests include writing a simple WordPress plugin, rewriting a string function, finding a bug I once struggled with, and using programming tools to extract data from Chrome. It's like teaching someone to drive: you wouldn't let them loose on a highway if they can't even get out of the driveway.
Until now, only ChatGPT's GPT-4 (and later) models had passed all four tests. Perplexity Pro also passed, but only because it runs on the GPT-4 series LLM. Microsoft Copilot, despite using the same LLM, failed all of them.
Google's Gemini hasn't fared much better. Initially, Bard (the early name for Gemini) failed most tests, and even Gemini Advanced, which costs $20 per month, failed three out of four tests last year.
But now, Google has introduced Gemini Pro 2.5, and it's free for everyone, though with rate limits. I hit those limits after just two prompts during my testing, which is a bit restrictive. It's possible that the rate limiting is based on the complexity of the tasks rather than the number of prompts. My first two requests were to write a full WordPress plugin and fix some code, which might have consumed my limit faster than simpler queries would.
Despite the wait, the results were surprising and worth it.
Test 1: Write a simple WordPress plugin
This time around, Gemini Pro 2.5 knocked it out of the park. The challenge was to create a WordPress plugin that provides a user interface to randomize input lines and distribute duplicates so they're not adjacent.
Previously, Gemini Advanced didn't create a back-end dashboard but required a shortcode in the body text of a public page. It did create a basic UI, but clicking the button did nothing. No matter how I tweaked the prompts, it still failed.
But Gemini Pro 2.5 delivered a solid UI, and the code worked as intended. What really impressed me was the icon choice for the plugin. Most AIs ignore this detail, but Gemini Pro 2.5 picked a relevant icon from the WordPress Dashicon set without any prompting from me. The code was well-documented, with each major segment explained clearly.
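The core of this challenge, shuffling lines while keeping identical ones apart, can be sketched in a few lines. This is not the code Gemini produced, and it is in Python rather than the PHP a WordPress plugin would use. The function name and the greedy most-frequent-first strategy are my own assumptions for illustration.

```python
import heapq
import random
from collections import Counter


def shuffle_without_adjacent_duplicates(lines, rng=None):
    """Return the lines in random order, keeping identical lines apart when possible."""
    rng = rng or random.Random()
    counts = Counter(lines)
    # Max-heap on remaining count (negated); the random number breaks ties,
    # so lines with equal counts come out in random order.
    heap = [(-count, rng.random(), line) for line, count in counts.items()]
    heapq.heapify(heap)

    result = []
    held_back = None  # the line just used sits out one turn
    while heap:
        neg_count, _, line = heapq.heappop(heap)
        result.append(line)
        if held_back is not None:
            heapq.heappush(heap, held_back)
        held_back = None
        if neg_count + 1 < 0:  # copies of this line remain
            held_back = (neg_count + 1, rng.random(), line)

    if held_back is not None:
        # One line outnumbers all the others combined, so the leftover
        # copies cannot be separated; append them at the end.
        result.extend([held_back[2]] * (-held_back[0]))
    return result


if __name__ == "__main__":
    print(shuffle_without_adjacent_duplicates(["a", "a", "a", "b", "b", "c"]))
```

Always placing the most frequent remaining line first is what prevents duplicates from piling up at the end. The trade-off is that the result is less random than a plain shuffle when some lines repeat heavily.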

Screenshot by David Gewirtz/ZDNET
Test 2: Rewrite a string function
In the second test, I asked Gemini Pro 2.5 to modify some string processing code to handle dollars and cents, not just integers. ChatGPT got this right, while Bard eventually succeeded after initial failures.
Last time, Gemini Advanced failed in a subtle yet dangerous way. It didn't allow non-decimal inputs and incorrectly limited numbers to two digits before the decimal point, misunderstanding the concept of dollars and cents. This kind of error could lead to a flood of bug reports if not caught.
Gemini Pro 2.5, however, nailed it. It correctly checked input types, trimmed whitespace, fixed the regular expression to handle leading zeros and decimal-only inputs, and rejected negative inputs. The code was well-commented, with a full set of test examples. While it didn't allow grouping commas or leading currency symbols, these were controlled errors, not crashes, so I consider it a pass.
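To make those checks concrete, here is a minimal sketch of a validator with the same behavior. The original function and Gemini's rewrite are not reproduced in this article, so the function name, the regular expression, and the choice of Python are all my assumptions.

```python
import re

# Whole dollars with an optional one- or two-digit fraction ("12", "0005", "12.5"),
# or a bare fraction (".5", ".50"). No sign, no commas, no currency symbol.
_AMOUNT_PATTERN = re.compile(r"(?:[0-9]+(?:\.[0-9]{1,2})?|\.[0-9]{1,2})")


def is_valid_dollar_amount(value):
    """Return True if value is a string holding a non-negative dollars-and-cents amount."""
    if not isinstance(value, str):  # type check comes first
        return False
    trimmed = value.strip()  # ignore surrounding whitespace
    return _AMOUNT_PATTERN.fullmatch(trimmed) is not None


if __name__ == "__main__":
    for sample in ["12.50", "0005", ".5", "-1", "1,000", "$5"]:
        print(repr(sample), is_valid_dollar_amount(sample))
```

Inputs such as "1,000" or "$5" are rejected with a plain False rather than an exception, which is the kind of controlled failure described above.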
Test 3: Find a bug
Once, I struggled with a bug in my code that should have worked but didn't. The issue was tricky, and while I was focused on the number of parameters being passed, ChatGPT pointed out that I needed to change something in a hook.
Both Bard and Meta's AI missed the mark, following the same futile path I did. Gemini Advanced, back in February 2024, said the problem was "likely somewhere else in the plugin or WordPress," which was unhelpful.
With Gemini Pro 2.5, I hit the rate limit after the first two tests, so I had to wait until the next day. When I finally ran the test, Gemini Pro 2.5 not only found the bug but also showed me exactly where to fix it, complete with a helpful diagram.
Test 4: Write a script
The final test involves understanding Chrome's internal object model, AppleScript, and Keyboard Maestro, a macro-building tool. It's about opening Chrome tabs and setting the active tab based on a parameter.
Most AIs handle the Chrome and AppleScript parts well but often struggle with Keyboard Maestro. Gemini Pro 2.5, however, got it right. It wrote the necessary code to pass variables correctly, added error checking and user notifications, and even provided steps to set up Keyboard Maestro.
With all four tests passed, Gemini Pro 2.5 joins the elite group of AI tools that can truly assist with programming tasks.
It was only a matter of time before Google's AI caught up with OpenAI's offerings. Google's 2017 "Attention Is All You Need" paper introduced the transformer architecture behind the generative AI boom, so it's no surprise they've reached this point. Gemini Pro 2.5 is slower than ChatGPT Plus, taking between 15 seconds and a minute to respond, but accuracy is more important than speed.
Google has also made Gemini Code Assist free with generous limits, but that only matters if the generated code is of high quality. With Gemini Pro 2.5, that quality is now evident. The model is currently marked as "experimental," and I expect Google to refine it soon, potentially offering a paid version with looser rate limits.
It's clear that Gemini Pro 2.5 is set to challenge ChatGPT in the realm of coding assistance. I'll be keeping a close eye on this development and sharing more updates soon.
Comments (24)
So I tried it out with Python and have to say, the error analysis is really impressive. But is it really a 'threat' to ChatGPT? They both have their niches. The main thing is that competition keeps prices reasonable 😅
As a programmer, I'm always looking for reliable AI assistants. The four standardized tests you describe sound very useful; I should try them with Gemini and ChatGPT! If it really does better on bugs, it would be a game changer. 🤔 Will there be a cost analysis? These premium tools are sometimes expensive.
Just read about Gemini Pro 2.5 and wow, those coding tests sound intense! 😅 Curious if it’ll really outshine ChatGPT or just hype. Anyone tried it yet?
This AI coding battle is heating up! Gemini Pro 2.5 sounds like a beast, but I’m curious if it’s really outpacing ChatGPT or just hype. 🤔 Anyone tried it on real projects yet?
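Gemini Pro 2.5 is really powerful! It's much better than ChatGPT for coding help. I ran it through my own tests and it passed perfectly. The only downside is that it's a bit expensive. But if you're serious about coding, it's worth it! 💻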