DeepSeek's R1 and V3 Coding Skills Tested: We're Not Doomed Yet
Introducing DeepSeek: A New Player in the AI Arena
DeepSeek burst onto the scene over the weekend, capturing attention worldwide for three compelling reasons:
- It's an AI chatbot hailing from China, a notable departure from the usual U.S.-based offerings.
- It's open source, which is a big deal in the tech community.
- It runs on significantly less infrastructure than its heavyweight counterparts, making it an intriguing option for many.
Given the U.S. government's scrutiny of TikTok over possible Chinese government involvement in its code, DeepSeek's Chinese origin naturally draws similar attention. However, we're steering clear of politics here. Instead, let's look at how DeepSeek V3 and DeepSeek R1 stack up against other AI models in coding tasks.
According to DeepSeek's own guidance:
- Choose V3 for tasks that demand depth and accuracy, like solving complex math problems or generating intricate code.
- Opt for R1 when you need quick, high-volume applications, such as customer support automation or basic text processing.
You can toggle between R1 and V3 using a little button in the chat interface. If it's blue, you're using R1.

Screenshot by David Gewirtz/ZDNET
So, how did they fare? Both models showed promise but weren't flawless. Let's explore the results.
Test 1: Crafting a WordPress plugin
My first test, inspired by my wife's need for a WordPress plugin to manage an involvement device for her online group, is a classic. The plugin had to accept a list of names, sort them, and ensure duplicates weren't side by side. I've thrown this challenge at numerous AIs, and it's a tough one.
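The core of that task, ordering names so identical entries never sit next to each other, can be sketched in a few lines. What follows is a minimal Python sketch, not the plugin either model produced (a real WordPress plugin would not be written in Python). The function name `arrange_names` is hypothetical, and the sketch assumes that separating duplicates takes priority over strict alphabetical order, which is used only to break ties.

```python
import heapq
from collections import Counter
from typing import List


def arrange_names(names: List[str]) -> List[str]:
    """Order names so equal entries are not adjacent wherever possible."""
    # Normalise input: strip whitespace and drop blank entries.
    cleaned = [n.strip() for n in names if n.strip()]
    counts = Counter(cleaned)

    # Max-heap on remaining count; ties fall back to alphabetical order.
    heap = [(-count, name) for name, count in counts.items()]
    heapq.heapify(heap)

    result: List[str] = []
    held = None  # the name just placed, kept out of the heap for one round
    while heap:
        neg_count, name = heapq.heappop(heap)
        result.append(name)
        if held is not None:
            heapq.heappush(heap, held)
        # Hold this name back if copies remain, so it cannot repeat at once.
        held = (neg_count + 1, name) if neg_count + 1 < 0 else None

    # Copies still held could not be separated; append them at the end.
    if held is not None:
        result.extend([held[1]] * -held[0])
    return result


print(arrange_names(["Bob", "Alice", "Bob", "Carol", "Alice", "Bob"]))
```

When one name makes up more than about half the list, no arrangement can separate every copy, so this sketch places the unavoidable leftovers at the end rather than raising an error.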

DeepSeek V3 nailed it, creating a user interface and program logic that met the brief perfectly. R1 took a different approach, offering a whopping 4502 words of analysis before sharing the code. The UI was broader, but both the UI and logic worked, so R1 also passed.

So far, both V3 and R1 have passed one out of four tests.
Test 2: Rewriting a String Function
A user had trouble entering dollars and cents into a donation field, which my original code didn't allow. The task was to modify the routine to accept both. DeepSeek did generate functional code, but there's room for improvement.
V3's code was overly long and repetitive, while R1's reasoning before generating the code was also lengthy. Both models validated up to two decimal places, but they didn't handle very large numbers well. R1's use of JavaScript's Number conversion without checking for edge cases could lead to crashes.
Interestingly, R1 did provide a nice list of test cases.
I'm giving the point to V3 because its code wouldn't crash and would produce the expected results. R1 fails due to potential crashes from non-string inputs. That's two wins out of four for V3 and one for R1.
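The failure modes described in this test (non-string input, more than two decimal places, very large numbers) can all be guarded against up front. Here is a minimal Python sketch of that kind of validation, not either model's actual code. The name `parse_donation`, the nine-digit cap on whole dollars, and the decision to reject inputs such as ".50" are all assumptions made for illustration.

```python
import re
from decimal import Decimal
from typing import Optional

# Whole dollars (1 to 9 digits, an assumed cap), optionally followed by
# a decimal point and one or two digits of cents.
AMOUNT_PATTERN = re.compile(r"[0-9]{1,9}(\.[0-9]{1,2})?")


def parse_donation(value: object) -> Optional[Decimal]:
    """Return the amount as a Decimal, or None if the input is not valid."""
    # Reject non-string input instead of letting a conversion fail later.
    if not isinstance(value, str):
        return None
    text = value.strip()
    # fullmatch requires the whole string to fit the pattern.
    if AMOUNT_PATTERN.fullmatch(text) is None:
        return None
    # Decimal avoids the rounding surprises of binary floats for money.
    return Decimal(text)


for sample in ["25", "25.5", "25.50", "25.505", "abc", None, "1" * 40]:
    print(repr(sample), "->", parse_donation(sample))
```

Checking the type first and capping the digit count means the conversion step only ever sees input that is already known to be well formed.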
Test 3: Tracking Down a Pesky Bug
This test stemmed from a bug I struggled to find. The challenge was that the obvious answer based on the error message was wrong, which often tricks AIs. Solving it requires understanding WordPress API calls, seeing beyond the error message, and pinpointing the bug.
Both V3 and R1 passed this test with nearly identical answers, bringing V3 to three out of four wins and R1 to two out of four. DeepSeek is already outperforming Gemini, Copilot, Claude, and Meta.
Test 4: Crafting a Script
This test is tough because it involves three environments: AppleScript, the Chrome object model, and Keyboard Maestro. ChatGPT aced it, but DeepSeek V3 and R1 fell short. Neither model understood the need to split tasks between Keyboard Maestro and Chrome, and their AppleScript knowledge was weak.
R1 made incorrect assumptions, such as assuming that a front window always exists and that the frontmost application would always be Chrome. This left V3 with three passes and one failure, and R1 with two passes and two failures.
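For anyone keeping score, the pass/fail results reported in the four tests can be tallied mechanically. This is a small Python sketch of that arithmetic. The dictionary layout and the `tally` helper are illustrative assumptions, while the values are the outcomes stated in Tests 1 through 4.

```python
from typing import List

# Pass (True) or fail (False) per test, as reported in Tests 1 through 4.
results = {
    "V3": [True, True, True, False],
    "R1": [True, False, True, False],
}


def tally(outcomes: List[bool]) -> str:
    """Summarise a list of pass/fail outcomes."""
    passed = sum(outcomes)  # True counts as 1, False as 0
    return f"{passed} of {len(outcomes)} passed"


for model, outcomes in results.items():
    print(f"{model}: {tally(outcomes)}")
```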
Final Thoughts
DeepSeek's insistence that I sign up with a public cloud email address such as Gmail, rather than one on my corporate domain, was frustrating. There were also some responsiveness issues that made testing take longer than expected.
I initially struggled to sign up due to this error:
DeepSeek's online services have recently faced large-scale malicious attacks. To ensure continued service, registration is temporarily limited to +86 phone numbers. Existing users can log in as usual. Thanks for your understanding and support.
Once in, I was able to run the tests. DeepSeek tends to be verbose with its code. The AppleScript in Test 4 was both incorrect and unnecessarily long. The regular expression in Test 2 could have been more maintainable, though V3 got it right.
I'm impressed that V3 beat out Gemini, Copilot, and Meta, but it's still at the old GPT-3.5 level, suggesting there's room for growth. R1's performance was disappointing. Given the choice, I'd stick with ChatGPT for programming help.
That said, for a new tool running on much less infrastructure, DeepSeek is definitely one to keep an eye on.
What are your thoughts? Have you tried DeepSeek? Do you use any AIs for programming support? Let us know in the comments below.
Follow my daily project updates on social media, subscribe to my weekly newsletter, and connect with me on Twitter/X at @DavidGewirtz, Facebook at Facebook.com/DavidGewirtz, Instagram at Instagram.com/DavidGewirtz, Bluesky at @DavidGewirtz.com, and YouTube at YouTube.com/DavidGewirtzTV.
Comments (14)
As a developer, I think it's great that China is now entering the open-source AI market with DeepSeek. The coding tests sound promising; maybe the competition between the models will finally push prices down. Hopefully the project stays independent in the long run and isn't co-opted by some company. 🤔
DeepSeek's open-source approach is a game-changer! I'm stoked to see a Chinese AI shaking things up. The coding skills are solid, but I wonder how it’ll stack against giants like GPT in the long run. Exciting times! 🚀
DeepSeek's open-source approach is super cool! It's wild to see a Chinese AI shaking up the game like this. I wonder how it'll stack up against ChatGPT in real-world coding tasks. Excited to try it out! 😄
DeepSeek's R1 and V3 are pretty cool, but let's be real, they're not perfect. The coding skills are decent, but sometimes it feels like they're just guessing. Still, it's refreshing to see a new player from China in the AI space! Keep improving, DeepSeek! 👏