DeepSeek's R1 and V3 Coding Skills Tested: We're Not Doomed Yet
Introducing DeepSeek: A New Player in the AI Arena
DeepSeek burst onto the scene over the weekend, capturing attention worldwide for three compelling reasons:
- It's an AI chatbot hailing from China, a notable departure from the usual U.S.-based offerings.
- It's open source, which is a big deal in the tech community.
- It runs on significantly less infrastructure than its heavyweight counterparts, making it an intriguing option for many.
While the U.S. government's scrutiny over TikTok and potential Chinese government involvement in its code has raised eyebrows, DeepSeek's emergence from China naturally draws similar attention. However, we're steering clear of politics here. Instead, let's dive into how DeepSeek V3 and DeepSeek R1 stack up against other AI models in coding tasks.
According to DeepSeek's own guidance:
- Choose V3 for tasks that demand depth and accuracy, like solving complex math problems or generating intricate code.
- Opt for R1 when you need quick, high-volume applications, such as customer support automation or basic text processing.
You can toggle between R1 and V3 using a little button in the chat interface. If it's blue, you're using R1.

Screenshot by David Gewirtz/ZDNET
So, how did they fare? Both models showed promise but weren't flawless. Let's explore the results.
Test 1: Crafting a WordPress Plugin
My first test is a classic, inspired by my wife's need for a WordPress plugin to manage an involvement device for her online group. The plugin had to accept a list of names, sort them, and ensure duplicates weren't side by side. I've thrown this challenge at numerous AIs, and it's a tough one.
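To make the challenge concrete, here's a minimal JavaScript sketch of the core logic the test demands: sort the names, then nudge duplicates apart. This is my own illustration of the requirement, not code produced by either model, and the function name is made up.

```javascript
// Sort a list of names, then separate adjacent duplicates by swapping
// each repeat with the next differing name further along the list.
// Note: spreading duplicates necessarily bends strict sorted order,
// and if one name dominates the list, some adjacency is unavoidable.
function sortAndSeparate(names) {
  const sorted = [...names].sort((a, b) => a.localeCompare(b));
  for (let i = 1; i < sorted.length; i++) {
    if (sorted[i] === sorted[i - 1]) {
      // Find the next name that differs, and swap it into place.
      let j = i + 1;
      while (j < sorted.length && sorted[j] === sorted[i]) j++;
      if (j < sorted.length) {
        [sorted[i], sorted[j]] = [sorted[j], sorted[i]];
      }
    }
  }
  return sorted;
}
```

Even this toy version shows why the test is hard: the sorting and the no-adjacent-duplicates rules pull against each other, and a model has to notice that tension rather than just call `sort()`.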

Screenshot by David Gewirtz/ZDNET
DeepSeek V3 nailed it, creating a user interface and program logic that met the brief perfectly. R1 took a different approach, offering a whopping 4,502 words of analysis before sharing the code. Its UI sprawled a bit more, but both the interface and the logic worked, so R1 also passed.

Screenshot by David Gewirtz/ZDNET

Screenshot by David Gewirtz/ZDNET
So far, V3 and R1 have each passed one of the four tests.
Test 2: Rewriting a String Function
A user had trouble entering dollars and cents into a donation field, which my original code didn't allow. The task was to modify the routine to accept both. DeepSeek did generate functional code, but there's room for improvement.
V3's code was overly long and repetitive, while R1's reasoning before generating the code was also lengthy. Both models validated up to two decimal places, but they didn't handle very large numbers well. R1's use of JavaScript's Number conversion without checking for edge cases could lead to crashes.
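For context, here's a short JavaScript sketch of the kind of defensive validation this test rewards. It's my own illustration, not either model's output; the function name and exact acceptance rules are assumptions.

```javascript
// Accept whole dollars ("25") or dollars-and-cents ("25.50"); reject
// everything else. Returns the numeric value, or null on bad input.
function parseDonation(input) {
  // Guard against non-string input before calling any string methods --
  // the failure mode that cost R1 the point in this test.
  if (typeof input !== "string") return null;
  const trimmed = input.trim();
  // Digits, optionally followed by a decimal point and 1-2 digits.
  if (!/^\d+(\.\d{1,2})?$/.test(trimmed)) return null;
  const value = Number(trimmed);
  // Reject amounts too large to represent exactly as integer cents.
  if (!Number.isSafeInteger(Math.round(value * 100))) return null;
  return value;
}
```

The type guard and the safe-integer check are exactly the edge cases the paragraph above describes: without them, unchecked `Number` conversion on unexpected input can crash the routine or silently lose precision on very large amounts.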
Interestingly, R1 did provide a nice list of test cases:

Screenshot by David Gewirtz/ZDNET
I'm giving the point to V3 because its code wouldn't crash and would produce the expected results. R1 fails due to potential crashes from non-string inputs. That's two wins out of four for V3 and one for R1.
Test 3: Tracking Down a Pesky Bug
This test stemmed from a bug I struggled to find. The challenge was that the obvious answer based on the error message was wrong, which often tricks AIs. Solving it requires understanding WordPress API calls, seeing beyond the error message, and pinpointing the bug.
Both V3 and R1 passed this test with nearly identical answers, bringing V3 to three out of four wins and R1 to two out of four. DeepSeek is already outperforming Gemini, Copilot, Claude, and Meta.
Test 4: Crafting a Script
This test is tough because it involves three environments: AppleScript, the Chrome object model, and Keyboard Maestro. ChatGPT aced it, but DeepSeek V3 and R1 fell short. Neither model understood the need to split tasks between Keyboard Maestro and Chrome, and their AppleScript knowledge was weak.
R1 made incorrect assumptions, like assuming a front window always exists and that the frontmost application would always be Chrome. That left V3 with three passes and one fail, and R1 with two passes and two fails.
Final Thoughts
DeepSeek's insistence on using a public cloud email like Gmail rather than my corporate domain was frustrating. There were also some responsiveness issues that made testing take longer than expected.
I initially struggled to sign up due to this error:
DeepSeek's online services have recently faced large-scale malicious attacks. To ensure continued service, registration is temporarily limited to +86 phone numbers. Existing users can log in as usual. Thanks for your understanding and support.
Once in, I was able to run the tests. DeepSeek tends to be verbose with its code. The AppleScript in Test 4 was both incorrect and unnecessarily long. The regular expression in Test 2 could have been more maintainable, though V3 got it right.
I'm impressed that V3 beat out Gemini, Copilot, and Meta, but it's still at the old GPT-3.5 level, suggesting there's room for growth. R1's performance was disappointing. Given the choice, I'd stick with ChatGPT for programming help.
That said, for a new tool running on much less infrastructure, DeepSeek is definitely one to keep an eye on.
What are your thoughts? Have you tried DeepSeek? Do you use any AIs for programming support? Let us know in the comments below.
Follow my daily project updates on social media, subscribe to my weekly newsletter, and connect with me on Twitter/X at @DavidGewirtz, Facebook at Facebook.com/DavidGewirtz, Instagram at Instagram.com/DavidGewirtz, Bluesky at @DavidGewirtz.com, and YouTube at YouTube.com/DavidGewirtzTV.
Comments (11)
JoseGonzalez
August 7, 2025 at 2:33:00 AM EDT
DeepSeek's open-source approach is super cool! It's wild to see a Chinese AI shaking up the game like this. I wonder how it'll stack up against ChatGPT in real-world coding tasks. Excited to try it out! 😄
ArthurSanchez
April 23, 2025 at 4:48:34 AM EDT
DeepSeek's R1 and V3 are pretty cool, but let's be real, they're not perfect. The coding skills are decent, but sometimes it feels like they're just guessing. Still, it's refreshing to see a new player from China in the AI space! Keep improving, DeepSeek! 👏
NicholasAdams
April 23, 2025 at 2:36:41 AM EDT
DeepSeek's R1 and V3 are pretty cool, but to be honest, they're not perfect. The coding skills are so-so, and sometimes it feels like they're just guessing. Still, it's refreshing to see a new AI player emerge from China! Keep improving, DeepSeek! 👏
StephenGonzalez
April 21, 2025 at 12:47:37 AM EDT
DeepSeek's R1 and V3 are pretty cool, but they're not perfect. The coding skills are decent, but sometimes the responses are a bit off. Still, it's great to see a new player from China in the AI game. Keep improving, DeepSeek! 👀
BruceClark
April 20, 2025 at 2:54:30 PM EDT
DeepSeek's R1 and V3 are pretty cool, but they're not perfect. The coding skills are so-so, and sometimes the responses are off. Still, it's great to see a new player from China enter the AI world. Keep improving, DeepSeek! 👀
AnthonyHernández
April 20, 2025 at 5:41:17 AM EDT
DeepSeek's R1 and V3 are pretty cool, but not perfect. The coding skills are decent, but sometimes the responses are a bit off. Still, it's great to see a new player from China join the AI game. Keep improving, DeepSeek! 👀