DeepSeek R1 Just Got a HUGE Update! (o3 Level Model)

In the rapidly evolving landscape of artificial intelligence, breakthroughs happen almost daily. One of the most exciting recent developments comes from DeepSeek, a Chinese AI lab that has just released a significant update to their open-source model, DeepSeek R1. Matthew Berman, a respected AI commentator, dives deep into this update and reveals how DeepSeek R1 now rivals some of the most powerful closed-source models from leading US tech companies like OpenAI and Google. This article explores the details of this update, why it matters, and what it means for the future of AI development.

🚀 What’s New with DeepSeek R1?

DeepSeek surprised the AI community when it quietly dropped the weights for DeepSeek R1 0528 on Hugging Face without any accompanying documentation. However, Matthew Berman found that this supposedly “minor upgrade” is actually a substantial leap forward. The latest version of DeepSeek R1 significantly improves its depth of reasoning and inference capabilities by leveraging increased computational resources and introducing algorithmic optimization mechanisms during post-training.

One of the key points Matthew highlights is that this update did not involve a change in the model’s architecture. Instead, DeepSeek was able to extract more intelligence from the same base model through refined reinforcement learning techniques applied after the initial pre-training phase. This post-training refinement has led to a dramatic boost in the model’s capabilities without the need to train a new architecture from scratch.

DeepSeek R1 is a Mixture-of-Experts model with 671 billion total parameters, of which roughly 37 billion are active per token during inference. This scale, combined with the improved training methodology, allows it to perform exceptionally well across benchmarks in mathematics, programming, and logic.

📊 Benchmark Breakdowns: DeepSeek R1 vs. the Competition

To truly appreciate the significance of this update, let’s look at the benchmark improvements and how DeepSeek R1 compares with other leading models like OpenAI’s o3 and Google’s Gemini 2.5 Pro.

  • AIME 2024 Benchmark: DeepSeek’s score jumped from 79.8 to 91.4.
  • AIME 2025 Benchmark: Improved from 70 to 87.
  • GPQA Diamond: Increased from 71 to 81.
  • LiveCodeBench: Rose from 63 to 73.
  • Aider: Made a significant leap from 57 to 71.
  • Humanity’s Last Exam: More than doubled, from 8.5 to 17.7.

Comparing these numbers with OpenAI’s o3, DeepSeek R1 is now almost neck and neck. It matches o3 on AIME 2024 and is only slightly behind on other benchmarks like AIME 2025, GPQA Diamond, and LiveCodeBench. Interestingly, Gemini 2.5 Pro, which many consider the best coding model available, falls behind o3 on almost every benchmark.

Matthew also points out that DeepSeek R1 has made a significant leap in coding skill, now matching Gemini 2.5 Pro on the Artificial Analysis coding index and trailing only newer OpenAI models such as o4-mini (high) and o3.

🤖 The Importance of Token Usage and “Thinking” Time

A fascinating insight from Matthew’s analysis is the number of tokens the model uses during its reasoning process. The updated DeepSeek R1 0528 uses approximately 99 million tokens to complete the evaluations in the Artificial Analysis Intelligence Index, about 40% more than the original version. In other words, the model is “thinking” longer and more thoroughly before producing an answer.

For comparison, Gemini 2.5 Pro uses even more tokens—about 30% more than DeepSeek R1 0528—indicating that it spends more time processing and reasoning. This token usage correlates with deeper, more complex reasoning and better performance on challenging tasks.
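
To make the token-usage idea concrete, here is a minimal sketch of how you might measure how many tokens a single response spends on reasoning versus on the final answer. It assumes the tokenizer published alongside the deepseek-ai/DeepSeek-R1-0528 weights on Hugging Face and R1’s convention of wrapping its chain of thought in <think> tags; adjust both for whatever model you actually run.

```python
# Minimal sketch: counting how many tokens a reasoning trace consumes.
# Assumes the tokenizer from the deepseek-ai/DeepSeek-R1-0528 repo on Hugging Face;
# swap in whichever model/tokenizer you are actually using.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-0528")

# R1-style models emit their chain of thought between <think> ... </think>
# before the final answer, so the two parts can be measured separately.
response = (
    "<think>Let me work through the cube rotation step by step...</think>"
    "The final answer is 42."
)

thinking, _, answer = response.partition("</think>")
thinking_tokens = len(tokenizer.encode(thinking, add_special_tokens=False))
answer_tokens = len(tokenizer.encode(answer, add_special_tokens=False))

print(f"reasoning tokens: {thinking_tokens}, answer tokens: {answer_tokens}")
```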

🌍 The Shrinking Gap Between Open Source and Closed Source AI

One of the most exciting takeaways from this update is how it demonstrates the rapidly shrinking gap between open-source AI models and their closed-source counterparts. When DeepSeek R1 was first released, it was already a major leap forward in open-source AI, offering a highly capable and efficient model that could compete with closed models on many fronts.

Now, with this update, DeepSeek R1 not only competes with but in some benchmarks exceeds models from leading US labs such as Anthropic and Meta. According to Artificial Analysis, DeepSeek is now tied as the world’s number two AI lab and is the undisputed leader in open-weights models.

This is a huge milestone because it proves that open-source projects can keep pace with the biggest players in AI research and development. It also points to a more democratized future in AI, where powerful tools are accessible to a wider community rather than locked behind corporate walls.

🇨🇳 China’s Growing AI Powerhouse

Matthew also highlights the geopolitical implications of this update. Artificial Analysis shows that Chinese AI models are now essentially neck and neck with US models in terms of capability. DeepSeek’s progress exemplifies this trend, signaling that China is not only catching up but in some areas, leading AI innovation.

This competitive dynamic between China and the US is driving rapid advancements on both sides, benefiting the global AI ecosystem with better models and more innovation. DeepSeek’s update is an important milestone in this ongoing race.

🧠 DeepSeek R1’s Post-Training Optimization: How It Works

The secret sauce behind DeepSeek R1’s leap in performance lies in its post-training optimization. Unlike training a new model from scratch, DeepSeek focused on refining the existing model through enhanced reinforcement learning techniques. This process involved:

  1. Leveraging increased computational power: Using more resources to fine-tune the model’s reasoning and inference abilities.
  2. Algorithmic optimizations: Introducing new optimization mechanisms that improve how the model processes information during inference.
  3. Extended “thinking” time: Allowing the model to consider more tokens and deeper chains of thought before answering.

This approach allowed DeepSeek to extract significantly more intelligence from the same architecture, effectively pushing the model’s boundaries without the need for a complete overhaul.
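
DeepSeek has not published the exact recipe behind the 0528 update, so the snippet below is only an illustrative sketch of the general idea behind reinforcement learning with verifiable rewards: sample several answers to the same prompt, score them automatically against a known solution, and compute per-sample advantages that a policy-update step would then consume. All names here are hypothetical.

```python
# Illustrative sketch only: DeepSeek has not detailed the 0528 post-training recipe.
# This shows the general shape of RL with verifiable rewards for math problems:
# sample several answers per prompt, score each automatically, and favor the
# higher-reward samples relative to the group average when updating the policy.
import re
from statistics import mean

def verifiable_reward(completion: str, expected: str) -> float:
    """Reward 1.0 if the boxed final answer matches the known solution, else 0.0."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    return 1.0 if match and match.group(1).strip() == expected else 0.0

def score_group(completions: list[str], expected: str) -> list[float]:
    """Advantage of each sample relative to the group-mean baseline."""
    rewards = [verifiable_reward(c, expected) for c in completions]
    baseline = mean(rewards)
    return [r - baseline for r in rewards]

# Example: two sampled answers to the same math prompt.
samples = [r"Reasoning... \boxed{12}", r"Reasoning... \boxed{13}"]
print(score_group(samples, expected="12"))  # [0.5, -0.5]
```

The policy update that consumes these advantages (for example a PPO- or GRPO-style gradient step) is where the extra compute Matthew mentions would be spent; the sketch above only covers the reward side of that loop.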

🕹️ Real-World Tests: Rubik’s Cube and Advanced Snake Game Challenges

To put DeepSeek R1’s capabilities to the test, Matthew ran two demanding programming challenges: a fully interactive Rubik’s Cube simulation and an advanced snake game with complex mechanics like multiple food types, power-ups, and teleportation.

Rubik’s Cube Challenge

The prompt was to write a complete HTML and JavaScript program using Three.js to render a Rubik’s Cube of any size up to 20x20x20, with dynamic user input for cube size, proper color-coded faces, camera controls, and user interaction to rotate layers via mouse or UI buttons.
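
For readers who want to reproduce the test, here is a rough sketch of how such a prompt could be sent to the updated model programmatically. It assumes DeepSeek’s OpenAI-compatible API endpoint and the deepseek-reasoner model name; verify both against the current DeepSeek documentation before running it.

```python
# Rough sketch for reproducing the Rubik's Cube coding test.
# Assumes DeepSeek's OpenAI-compatible endpoint and the "deepseek-reasoner"
# model name; check the current docs, as both are assumptions here.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

prompt = (
    "Write a complete HTML + JavaScript program using Three.js that renders a "
    "Rubik's Cube of any size up to 20x20x20. Include a size input, correctly "
    "color-coded faces, orbit camera controls, and mouse/UI controls to rotate layers."
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": prompt}],
)

# Save the generated program and open it in a browser to judge the result.
with open("rubiks_cube.html", "w") as f:
    f.write(response.choices[0].message.content)
```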

Gemini 2.5 Pro nailed this challenge on the first try, producing a complex and functional program in just a few minutes. The new DeepSeek R1 also performed impressively, outputting a substantial amount of code and demonstrating deep reasoning about the problem. However, Matthew noted some issues with the cube’s rotation physics and scrambling behavior, which didn’t work perfectly.

These minor glitches aside, DeepSeek R1’s performance was on par with Gemini 2.5 Pro in terms of code complexity and depth of thought, which is remarkable given the difficulty of the task.

Advanced Snake Game Challenge

For the snake game, the goal was to create a sophisticated version featuring various food types, power-ups, teleportation, and other advanced mechanics. DeepSeek R1 started generating code quickly, showing its ability to handle complex programming tasks.

While Matthew’s detailed results on this challenge were not fully covered, the fact that DeepSeek R1 could even begin to tackle such a multi-faceted game is a testament to its enhanced coding capabilities.

📈 What Does This Mean for the Future of AI?

DeepSeek R1’s update signals a number of important trends and implications for AI enthusiasts, developers, and industry watchers:

  • Open-source AI is catching up fast: The gap between open and closed models is narrowing, making powerful AI tools more accessible to everyone.
  • Post-training optimization is a game-changer: Extracting more intelligence from existing architectures can yield massive gains without costly retraining.
  • Global AI competition drives innovation: Chinese AI labs like DeepSeek are pushing the envelope, challenging US dominance and accelerating progress worldwide.
  • Token usage and “thinking” time matter: Models that can process more tokens during reasoning tend to perform better on complex tasks.
  • Real-world coding challenges are key benchmarks: Tasks like simulating a Rubik’s Cube or building advanced games showcase practical AI programming capabilities.

For anyone interested in AI development, DeepSeek R1’s progress is an exciting development that suggests we will soon see even more powerful and versatile open-source models capable of competing with the best in the business.

❓ FAQ: DeepSeek R1 Update Explained

What is DeepSeek R1?

DeepSeek R1 is a large-scale open-source AI language model developed by the Chinese AI lab DeepSeek. It has 671 billion parameters and is designed for advanced reasoning, programming, and general AI tasks.

How does the latest update improve DeepSeek R1?

The update leverages increased computational resources and introduces algorithmic optimizations during post-training, allowing the model to “think” longer and perform better on benchmarks without changing its architecture.

How does DeepSeek R1 compare to OpenAI’s o3 and Gemini 2.5 Pro?

DeepSeek R1 now performs almost on par with OpenAI’s o3 across multiple benchmarks and matches Gemini 2.5 Pro on coding tasks. It leads all other open-weights models and outperforms models from some leading US labs such as Anthropic and Meta.

What benchmarks were used to measure DeepSeek R1’s performance?

Benchmarks include AIME 2024 and 2025, GPQA Diamond, LiveCodeBench, Aider, and Humanity’s Last Exam, covering a range of reasoning, math, logic, and programming challenges.

What is the significance of token usage in AI models?

Token usage reflects how much “thinking” or internal processing a model performs before generating an output. More token consumption usually indicates deeper reasoning, which often corresponds to better performance.

Is DeepSeek R1 truly open source?

Yes, DeepSeek R1’s weights are freely available and released under a permissive open-source license, making the model accessible to developers and researchers worldwide.

What are some real-world applications of DeepSeek R1?

DeepSeek R1 can be used for advanced coding tasks, math problem solving, logical reasoning, and potentially many other AI-driven applications that require deep understanding and inference.

Where can I access DeepSeek R1?

The model weights are available on Hugging Face, a popular platform for sharing AI models. Developers can download and experiment with the model freely.
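
As a practical starting point, here is a minimal sketch for fetching the weights with the huggingface_hub library, assuming the repository id deepseek-ai/DeepSeek-R1-0528 (verify the exact id on Hugging Face). Keep in mind the full 671-billion-parameter checkpoint is hundreds of gigabytes, so most people will serve it with a dedicated inference stack or use a smaller distilled variant.

```python
# Minimal sketch for fetching the open weights.
# Assumes the repo id deepseek-ai/DeepSeek-R1-0528; verify it on Hugging Face first.
# Note: the full checkpoint is hundreds of gigabytes and is normally served with a
# dedicated inference stack (e.g. vLLM) rather than loaded on a single machine.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="deepseek-ai/DeepSeek-R1-0528",
    local_dir="./deepseek-r1-0528",
)
print(f"Weights downloaded to {local_path}")
```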

🔗 Final Thoughts

DeepSeek R1’s recent update is a landmark moment in the AI world. By pushing the limits of an existing architecture through smart post-training optimizations and increased computational muscle, DeepSeek has positioned its open-source model firmly alongside the best closed-source AI systems.

This development not only underscores the rapid progress being made by Chinese AI labs but also highlights the growing strength and viability of open-source AI models. For developers, researchers, and AI enthusiasts, DeepSeek R1 offers a powerful, accessible tool that challenges the notion that cutting-edge AI must be proprietary.

As the AI arms race continues, with models like Grok 3.5 and others on the horizon, it will be fascinating to watch how DeepSeek R1 and similar models evolve. For now, the future looks bright for open-source AI innovation.

To stay updated on developments like this and more, consider following Matthew Berman and signing up for AI newsletters that cover the latest breakthroughs.

 
