Daya Guo

DeepSeek-V3-0324

DeepSeek-V3 outperforms other open-source models such as Qwen2.5-72B and Llama-3.1-405B in multiple evaluations and matches the performance of top-tier closed-source models like GPT-4 and Claude-3.5-Sonnet.

iFlytek-Spark-X1-0720

The inference model Spark X1 released by iFlytek, on the basis of leading domestic mathematical tasks, benchmarks the performance of general tasks such as inference, text generation, and language understanding against OpenAI o series and DeepSeek R1.

DeepSeek-R1-0528

The latest version of Deepseek R1.

DeepSeek-V2-Chat-0628

DeepSeek-V2 is a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times.

Spark-X1

The inference model Spark X1 released by iFlytek, on the basis of leading domestic mathematical tasks, benchmarks the performance of general tasks such as inference, text generation, and language understanding against OpenAI o1 and DeepSeek R1.

DeepSeek-V2.5

DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions.

DeepSeek-V3-0324

DeepSeek-V3 outperforms other open-source models such as Qwen2.5-72B and Llama-3.1-405B in multiple evaluations and matches the performance of top-tier closed-source models like GPT-4 and Claude-3.5-Sonnet.

DeepSeek-V2-Lite-Chat

DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model presented by DeepSeek, DeepSeek-V2-Lite is a lite version of it.

DeepSeek-V2-Chat

DeepSeek-V2 is a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance, and meanwhile saves 42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum generation throughput to 5.76 times.

DeepSeek-R1

DeepSeek-R1 is a model trained through large-scale Reinforcement Learning (RL) without using Supervised Fine-Tuning (SFT) as an initial step. Its performance in mathematics, coding, and reasoning tasks is comparable to that of OpenAI-o1.

DeepSeek-V2.5

DeepSeek-V2.5 is an upgraded version that combines DeepSeek-V2-Chat and DeepSeek-Coder-V2-Instruct. The new model integrates the general and coding abilities of the two previous versions.

DeepSeek-V3

DeepSeek-V3 has achieved higher evaluation scores than other open-source models such as Qwen2.5-72B and Llama-3.1-405B, and its performance is on par with the world's top closed-source models like GPT-4o and Claude-3.5-Sonnet.

DeepSeek-R1

DeepSeek-R1 extensively utilized reinforcement learning techniques during the post-training phase, significantly enhancing the model's reasoning capabilities with only a minimal amount of annotated data. On tasks involving mathematics, coding, and natural language inference, its performance is on par with the official release of OpenAI's o1.

DeepSeek-V2-Lite-Chat

DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model presented by DeepSeek, DeepSeek-V2-Lite is a lite version of it.