    Putting DeepSeek To The Test: How Its Performance Compares Against Other AI Tools

    Simon Thorne
    February 5, 2025
    Fact checked by The Conversation

    Edited by Simon Thorne

    China’s new DeepSeek Large Language Model (LLM) has disrupted the US-dominated market, offering a relatively high-performance chatbot model at significantly lower cost.

    The reduced cost of development and lower subscription prices compared with US AI tools contributed to American chip maker Nvidia losing US$600 billion (£480 billion) in market value over one day. Nvidia makes the computer chips used to train the majority of LLMs, the underlying technology used in ChatGPT and other AI chatbots. DeepSeek uses cheaper Nvidia H800 chips over the more expensive state-of-the-art versions.

    ChatGPT developer OpenAI reportedly spent somewhere between US$100 million and US$1 billion on the development of a very recent version of its product called o1. In contrast, DeepSeek accomplished its training in just two months at a cost of US$5.6 million using a series of clever innovations.

    But just how well does DeepSeek’s AI chatbot, R1, compare with other, similar AI tools on performance?

    DeepSeek claims its models perform comparably to OpenAI’s offerings, even exceeding the o1 model in certain benchmark tests. However, benchmarks such as Massive Multitask Language Understanding (MMLU) evaluate knowledge across multiple subjects using multiple-choice questions. Many LLMs are trained and optimised for such tests, making them unreliable as indicators of real-world performance.

    An alternative methodology for the objective evaluation of LLMs uses a set of tests developed by researchers at Cardiff Metropolitan, Bristol and Cardiff universities – known collectively as the Knowledge Observation Group (KOG). These tests probe LLMs’ ability to mimic human language and knowledge through questions that require implicit human understanding to answer. The core tests are kept secret, to avoid LLM companies training their models for these tests.

    KOG deployed public tests inspired by work by Colin Fraser, a data scientist at Meta, to evaluate DeepSeek against other LLMs. The following results were observed:

    LLM Performance test.

    The tests used to produce this table are “adversarial” in nature. In other words, they are designed to be “hard” and to test LLMs in ways that are not sympathetic to how they are designed. This means the performance of these models in this test is likely to differ from their performance in mainstream benchmarking tests.

    DeepSeek scored 5.5 out of 6, outperforming OpenAI’s o1 – its advanced reasoning (known as “chain-of-thought”) model – as well as ChatGPT-4o, the free version of ChatGPT. But DeepSeek was marginally outperformed by Anthropic’s ClaudeAI and OpenAI’s o1 mini, both of which scored a perfect 6/6. It’s interesting that o1 underperformed against its “smaller” counterpart, o1 mini.

    DeepThink R1 – a chain-of-thought AI tool made by DeepSeek – underperformed in comparison to DeepSeek with a score of 3.5.
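
    For readers who want to compare the figures side by side, the scores quoted above can be tabulated with a few lines of Python. This is only an illustrative sketch, not part of the KOG methodology: the function name `rank_models` is hypothetical, and the exact scores for o1 and ChatGPT-4o are left out because the text does not state them.

```python
# Scores out of 6 as quoted in the article text. o1 and ChatGPT-4o are
# omitted because their exact scores are not given in the text.
MAX_SCORE = 6.0

reported_scores = {
    "DeepSeek": 5.5,
    "ClaudeAI": 6.0,
    "o1 mini": 6.0,
    "DeepThink R1": 3.5,
}


def rank_models(scores, max_score=MAX_SCORE):
    """Return (model, score, percent) tuples, best score first.

    Ties are broken alphabetically by model name so the order is stable.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [
        (name, score, round(100 * score / max_score, 1))
        for name, score in ranked
    ]


ranking = rank_models(reported_scores)
for name, score, percent in ranking:
    print(f"{name}: {score:g}/{MAX_SCORE:g} ({percent}%)")
```

    Expressed as percentages, DeepSeek's 5.5 corresponds to roughly 92% of the available marks, while DeepThink R1's 3.5 corresponds to roughly 58%.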

    This result shows how competitive DeepSeek’s chatbot already is, beating OpenAI’s flagship models. It is likely to spur further development for DeepSeek, which now has a strong foundation to build upon. However, the Chinese tech company does have one serious problem the other LLMs do not: censorship.

    Censorship challenges

    Despite its strong performance and popularity, DeepSeek has faced criticism over its responses to politically sensitive topics in China. For instance, prompts related to Tiananmen Square, Taiwan, Uyghur Muslims and democratic movements are met with the response: “Sorry, that is beyond my current scope.”

    But this issue is not necessarily unique to DeepSeek, and the potential for political influence and censorship in LLMs more generally is a growing concern. The announcement of Donald Trump’s US$500 billion Stargate LLM project, involving OpenAI, Nvidia, Oracle, Microsoft, and Arm, also raises fears of political influence.

    Additionally, Meta’s recent decision to abandon fact-checking on Facebook and Instagram suggests an increasing trend toward populism over truthfulness.

    DeepSeek’s arrival has caused serious disruption to the LLM market. US companies such as OpenAI and Anthropic will be forced to innovate their products to maintain relevance and match its performance and cost.

    DeepSeek’s success is already challenging the status quo, demonstrating that high-performance LLMs can be developed without billion-dollar budgets. It also highlights the risks of LLM censorship and the spread of misinformation, and shows why independent evaluations matter.

    As LLMs become more deeply embedded in global politics and business, transparency and accountability will be essential to ensure that the future of LLMs is safe, useful and trustworthy.

    Simon Thorne, Senior Lecturer in Computing and Information Systems, Cardiff Metropolitan University

    This article is republished from The Conversation under a Creative Commons license. Read the original article.
