DeepSeek's R1 and V3 Coding Skills Tested: We're Not Doomed Yet
Introducing DeepSeek: A New Player in the AI Arena
DeepSeek burst onto the scene over the weekend, capturing attention worldwide for three compelling reasons:
- It's an AI chatbot hailing from China, a notable departure from the usual U.S.-based offerings.
- It's open source, which is a big deal in the tech community.
- It runs on significantly less infrastructure than its heavyweight counterparts, making it an intriguing option for many.
While the U.S. government's scrutiny over TikTok and potential Chinese government involvement in its code has raised eyebrows, DeepSeek's emergence from China naturally draws similar attention. However, we're steering clear of politics here. Instead, let's dive into how DeepSeek V3 and DeepSeek R1 stack up against other AI models in coding tasks.
According to DeepSeek's own guidance:
- Choose V3 for tasks that demand depth and accuracy, like solving complex math problems or generating intricate code.
- Opt for R1 when you need quick, high-volume applications, such as customer support automation or basic text processing.
You can toggle between R1 and V3 using a little button in the chat interface. If it's blue, you're using R1.

Screenshot by David Gewirtz/ZDNET
So, how did they fare? Both models showed promise but weren't flawless. Let's explore the results.
Test 1: Crafting a WordPress Plugin
My first test is a classic, inspired by my wife's need for a WordPress plugin to manage an involvement device for her online group. The plugin had to accept a list of names, sort them, and ensure duplicates weren't side by side. I've thrown this challenge at numerous AIs, and it's a tough one.
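To make the challenge concrete, here's a minimal JavaScript sketch of the core logic the test demands: sort the names, then nudge duplicates apart. This is my own illustration of the requirement, not code produced by either model, and the function name is made up.

```javascript
// Sort a list of names, then separate adjacent duplicates by swapping
// each repeat with the next differing name further along the list.
// Note: spreading duplicates necessarily bends strict sorted order,
// and if one name dominates the list, some adjacency is unavoidable.
function sortAndSeparate(names) {
  const sorted = [...names].sort((a, b) => a.localeCompare(b));
  for (let i = 1; i < sorted.length; i++) {
    if (sorted[i] === sorted[i - 1]) {
      // Find the next name that differs, and swap it into place.
      let j = i + 1;
      while (j < sorted.length && sorted[j] === sorted[i]) j++;
      if (j < sorted.length) {
        [sorted[i], sorted[j]] = [sorted[j], sorted[i]];
      }
    }
  }
  return sorted;
}
```

Even this toy version shows why the test is hard: the sorting and the no-adjacent-duplicates rules pull against each other, and a model has to notice that tension rather than just call `sort()`.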

Screenshot by David Gewirtz/ZDNET
DeepSeek V3 nailed it, creating a user interface and program logic that met the brief perfectly. R1 took a different approach, offering a whopping 4,502 words of analysis before sharing the code. Its UI sprawled a bit more, but both the interface and the logic worked, so R1 also passed.

Screenshot by David Gewirtz/ZDNET

Screenshot by David Gewirtz/ZDNET
So far, V3 and R1 have each passed one of the four tests.
Test 2: Rewriting a String Function
A user had trouble entering dollars and cents into a donation field, which my original code didn't allow. The task was to modify the routine to accept both. DeepSeek did generate functional code, but there's room for improvement.
V3's code was overly long and repetitive, while R1's reasoning before generating the code was also lengthy. Both models validated up to two decimal places, but they didn't handle very large numbers well. R1's use of JavaScript's Number conversion without checking for edge cases could lead to crashes.
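For context, here's a short JavaScript sketch of the kind of defensive validation this test rewards. It's my own illustration, not either model's output; the function name and exact acceptance rules are assumptions.

```javascript
// Accept whole dollars ("25") or dollars-and-cents ("25.50"); reject
// everything else. Returns the numeric value, or null on bad input.
function parseDonation(input) {
  // Guard against non-string input before calling any string methods --
  // the failure mode that cost R1 the point in this test.
  if (typeof input !== "string") return null;
  const trimmed = input.trim();
  // Digits, optionally followed by a decimal point and 1-2 digits.
  if (!/^\d+(\.\d{1,2})?$/.test(trimmed)) return null;
  const value = Number(trimmed);
  // Reject amounts too large to represent exactly as integer cents.
  if (!Number.isSafeInteger(Math.round(value * 100))) return null;
  return value;
}
```

The type guard and the safe-integer check are exactly the edge cases the paragraph above describes: without them, unchecked `Number` conversion on unexpected input can crash the routine or silently lose precision on very large amounts.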
Interestingly, R1 did provide a nice list of test cases:

Screenshot by David Gewirtz/ZDNET
I'm giving the point to V3 because its code wouldn't crash and would produce the expected results. R1 fails due to potential crashes from non-string inputs. That's two wins out of four for V3 and one for R1.
Test 3: Tracking Down a Pesky Bug
This test stemmed from a bug I struggled to find. The challenge was that the obvious answer based on the error message was wrong, which often tricks AIs. Solving it requires understanding WordPress API calls, seeing beyond the error message, and pinpointing the bug.
Both V3 and R1 passed this test with nearly identical answers, bringing V3 to three out of four wins and R1 to two out of four. DeepSeek is already outperforming Gemini, Copilot, Claude, and Meta.
Test 4: Crafting a Script
This test is tough because it involves three environments: AppleScript, the Chrome object model, and Keyboard Maestro. ChatGPT aced it, but DeepSeek V3 and R1 fell short. Neither model understood the need to split tasks between Keyboard Maestro and Chrome, and their AppleScript knowledge was weak.
R1 made incorrect assumptions, like assuming a front window always exists and that the frontmost application would always be Chrome. That left V3 with three passes and one fail, and R1 with two passes and two fails.
Final Thoughts
DeepSeek's insistence on using a public cloud email like Gmail rather than my corporate domain was frustrating. There were also some responsiveness issues that made testing take longer than expected.
I initially struggled to sign up due to this error:
DeepSeek's online services have recently faced large-scale malicious attacks. To ensure continued service, registration is temporarily limited to +86 phone numbers. Existing users can log in as usual. Thanks for your understanding and support.
Once in, I was able to run the tests. DeepSeek tends to be verbose with its code. The AppleScript in Test 4 was both incorrect and unnecessarily long. The regular expression in Test 2 could have been more maintainable, though V3 got it right.
I'm impressed that V3 beat out Gemini, Copilot, and Meta, but it's still at the old GPT-3.5 level, suggesting there's room for growth. R1's performance was disappointing. Given the choice, I'd stick with ChatGPT for programming help.
That said, for a new tool running on much less infrastructure, DeepSeek is definitely one to keep an eye on.
What are your thoughts? Have you tried DeepSeek? Do you use any AIs for programming support? Let us know in the comments below.
Follow my daily project updates on social media, subscribe to my weekly newsletter, and connect with me on Twitter/X at @DavidGewirtz, Facebook at Facebook.com/DavidGewirtz, Instagram at Instagram.com/DavidGewirtz, Bluesky at @DavidGewirtz.com, and YouTube at YouTube.com/DavidGewirtzTV.
Comments (11)
JoseGonzalez
August 7, 2025 at 2:33:00 AM EDT
DeepSeek's open-source approach is super cool! It's wild to see a Chinese AI shaking up the game like this. I wonder how it'll stack up against ChatGPT in real-world coding tasks. Excited to try it out! 😄
ArthurSanchez
April 23, 2025 at 4:48:34 AM EDT
DeepSeek's R1 and V3 are pretty cool, but let's be real, they're not perfect. The coding skills are decent, but sometimes it feels like they're just guessing. Still, it's refreshing to see a new player from China in the AI space! Keep improving, DeepSeek! 👏
NicholasAdams
April 23, 2025 at 2:36:41 AM EDT
DeepSeek's R1 and V3 are pretty cool, but to be honest, they're not perfect. The coding skills are so-so, and sometimes it feels like they're just guessing. Still, it's refreshing to see a new AI player emerge from China! Keep improving, DeepSeek! 👏
StephenGonzalez
April 21, 2025 at 12:47:37 AM EDT
DeepSeek's R1 and V3 are pretty cool, but they're not perfect. The coding skills are decent, but sometimes the responses are a bit off. Still, it's great to see a new player from China in the AI game. Keep improving, DeepSeek! 👀
BruceClark
April 20, 2025 at 2:54:30 PM EDT
DeepSeek's R1 and V3 are pretty cool, but they're not perfect. The coding skills are so-so, and sometimes the responses are off. Still, it's great to see a new player from China enter the AI world. Keep improving, DeepSeek! 👀
AnthonyHernández
April 20, 2025 at 5:41:17 AM EDT
DeepSeek's R1 and V3 are pretty cool, but not perfect. The coding skills are decent, but sometimes the responses are a bit off. Still, it's great to see a new player from China join the AI game. Keep improving, DeepSeek! 👀