
Traditional LLMs Are Dead… Long Live DeepSeek?

DeepSeek's logo symbolizing AI dominance in the LLM space

New Horizons for LLMs (Large Language Models)

When DeepSeek launched its R1 model on January 20th, it made a splash. But on January 27th, it unleashed a tsunami—triggering a trading frenzy that wiped $1 trillion off the stock market in a single day. The scale of that impact was unprecedented, sending shockwaves through the financial and tech industries alike. In the days that followed, one question emerged at the centre of every discussion: “What comes next?”

The release of R1 is arguably the most significant milestone in AI since OpenAI introduced ChatGPT in November 2022. DeepSeek has now become the focal point of discussion in AI circles—dominating news cycles, conference panels, and industry think pieces. Everyone wants to understand what this means for the broader AI ecosystem. Is this a turning point? Or just another passing trend?

One thing is clear: the race for AI dominance has never been more intense, both in the free market and geopolitically. Some organizations are already questioning whether they should rip out their current AI stacks—replacing OpenAI’s or Anthropic’s APIs with DeepSeek’s models.

This leads us to the bold claim in our title: Traditional LLMs (Large Language Models) are dead. Hyperbole? Maybe. The models we once considered state-of-the-art are being challenged in ways we haven’t seen before. At the same time, we pose the question: “Long live DeepSeek?” The market is moving at breakneck speed, and while DeepSeek has redefined expectations, don’t count on it replacing OpenAI or Anthropic overnight. What it has done is crack open a new horizon for LLM development, showing a path that others will now follow.

So, where does this leave us? What does DeepSeek’s rise mean for the future of LLMs? How has it disrupted the AI landscape, and what challenges lie ahead? In this article, we’ll break down how DeepSeek has shaken up the industry, the controversies surrounding it, and what the future holds for large language models.

To fully grasp the significance of DeepSeek’s disruption, it’s essential to understand how traditional LLMs work and the specific innovations that set DeepSeek apart. LLMs have evolved rapidly over the past few years, yet their core development process has remained largely unchanged—until now.

LLMs Over Time 

Visual representation of the shift in LLM technology

How Traditional LLMs Work 

Developing an LLM is a resource-intensive process that requires vast datasets, high-performance computing power, and sophisticated machine learning techniques. The standard pipeline consists of several key stages:

  1. Data Collection – The Foundation of Language Models

Every LLM is only as good as the data it learns from. Training data is gathered from diverse sources—books, academic papers, news articles, code repositories, and user-generated content from forums, blogs, and websites. Some of this data is openly available, while other portions come from licensed or proprietary datasets.

  2. Preprocessing – Cleaning and Structuring Data

Once raw data is collected, it must be cleaned and structured for training. This involves removing inconsistencies, formatting text properly, and handling duplicate or low-quality data. The phrase “garbage in, garbage out” applies here—poor-quality training data results in an unreliable model.

Data is then split into:

Training Data – Used to teach the model patterns in language.

Validation Data – Helps fine-tune hyperparameters and optimize performance.

Test Data – Evaluates how well the model generalizes to unseen examples.
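
To make the split concrete, here is a minimal sketch of dividing a text corpus into these three sets. The 90/5/5 ratio and the use of scikit-learn are illustrative assumptions rather than a description of any particular LLM’s pipeline.

```python
# A minimal sketch of splitting a text corpus into train/validation/test sets.
# The ratios and scikit-learn usage are illustrative, not a specific LLM's pipeline.
from sklearn.model_selection import train_test_split

documents = [f"document {i} text ..." for i in range(1000)]

# First carve off 10% for evaluation, then split that portion half-and-half.
train_docs, holdout_docs = train_test_split(documents, test_size=0.10, random_state=42)
val_docs, test_docs = train_test_split(holdout_docs, test_size=0.50, random_state=42)

print(len(train_docs), len(val_docs), len(test_docs))  # 900 50 50
```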

  3. Tokenization – Converting Text into a Machine-Readable Format

Since AI models cannot understand raw text, it must be converted into numerical representations. This is done through tokenization, where text is broken down into smaller units, or “tokens.”

For example, the sentence “The quick brown fox” might be tokenized into numbers:

[5026, 464, 995, 2456, 1121]

These tokens are then fed into a neural network, where the model learns relationships between words, phrases, and sentence structures.
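
As an illustration, here is a minimal sketch using OpenAI’s open-source tiktoken tokenizer. The token IDs above are purely illustrative; the actual IDs a tokenizer produces depend on its vocabulary.

```python
# A minimal sketch of tokenization using OpenAI's open-source tiktoken library.
# The resulting token IDs depend entirely on the tokenizer; they will not match
# the illustrative numbers quoted above.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models

tokens = encoding.encode("The quick brown fox")
print(tokens)                   # a short list of integer token IDs
print(encoding.decode(tokens))  # "The quick brown fox" -- decoding round-trips
```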

Traditional LLM pipeline visualisation

  4. Pretraining – Teaching the Model to Predict Language

At the heart of every LLM is a Transformer architecture, a deep learning framework that has revolutionized natural language processing (NLP). Instead of reading words sequentially like older models, Transformers use self-attention mechanisms, allowing the model to weigh the importance of different words in a sentence regardless of their position.

Example of Self-Attention:

In the sentence “The bank approved the loan”, the word “bank” refers to a financial institution.

In “The boat reached the riverbank”, “bank” has a completely different meaning.

Transformers analyse context, enabling them to determine the correct interpretation.
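
To make the mechanism concrete, below is a minimal NumPy sketch of scaled dot-product self-attention, the core calculation inside a Transformer layer. The dimensions and random projection weights are illustrative; in a real model they are learned parameters.

```python
# A minimal NumPy sketch of scaled dot-product self-attention.
# Dimensions and random projection weights are illustrative stand-ins for
# learned parameters.
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    """x has shape (sequence_length, d_model); returns the same shape."""
    d_model = x.shape[-1]
    rng = np.random.default_rng(0)
    # In a real model these projection matrices are learned during training.
    W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))

    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(d_model)              # how much each token attends to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V                                # context-aware token representations

tokens = np.random.default_rng(1).standard_normal((5, 16))  # 5 tokens, 16-dim embeddings
print(self_attention(tokens).shape)  # (5, 16)
```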

During pretraining, the model learns to predict missing words based on its context. Given the phrase:

“The cat sat on the [MASK].”

The model might predict “mat” based on prior knowledge of similar phrases in its training data.
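
As a hands-on illustration, the Hugging Face fill-mask pipeline reproduces exactly this kind of masked-word prediction. The choice of BERT here is an assumption for demonstration; modern chat-oriented LLMs are typically trained to predict the next token instead, but the principle is the same.

```python
# A minimal sketch of masked-word prediction using the Hugging Face "fill-mask"
# pipeline with BERT. The model choice is illustrative.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

for prediction in unmasker("The cat sat on the [MASK].")[:3]:
    print(prediction["token_str"], round(prediction["score"], 3))
# Expect high-probability completions such as "mat", "floor", or "bed".
```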

This phase requires massive computational power—typically thousands of GPUs or TPUs—making it one of the most expensive steps in LLM development.

  5. Fine-Tuning – Aligning the Model with Human Intentions

Once pretrained, the model understands language, but it still lacks task-specific abilities or human alignment. This is where fine-tuning comes in.

One of the most effective fine-tuning methods is Reinforcement Learning from Human Feedback (RLHF):

  • Human annotators rank multiple model-generated responses.
  • A reward model is trained to score responses based on quality and accuracy.
  • The main LLM is further optimized using reinforcement learning techniques like Proximal Policy Optimization (PPO) to improve future outputs.

This process helps models become more helpful, honest, and safe, but it still requires significant human oversight and labelled data, making it labour-intensive.
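
For illustration, the snippet below sketches the pairwise ranking loss commonly used to train the reward model in the second step above. The tiny linear reward head and random embeddings are hypothetical stand-ins for a full language-model backbone.

```python
# A minimal PyTorch sketch of the pairwise ranking loss often used to train an
# RLHF reward model: the "chosen" response should score higher than the
# "rejected" one. The reward head and random features are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

reward_head = nn.Linear(768, 1)  # maps a response embedding to a scalar reward

chosen_emb = torch.randn(8, 768)    # embeddings of human-preferred responses
rejected_emb = torch.randn(8, 768)  # embeddings of the dispreferred responses

chosen_reward = reward_head(chosen_emb)
rejected_reward = reward_head(rejected_emb)

# Bradley-Terry style loss: maximise the margin between chosen and rejected scores.
loss = -F.logsigmoid(chosen_reward - rejected_reward).mean()
loss.backward()
print(float(loss))
```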

  6. Optimization – Reducing Model Size and Speeding Up Performance

Pretrained and fine-tuned LLMs are often too large and inefficient for real-world applications. Optimization techniques help reduce computational costs while maintaining performance:

Pruning – Removes unnecessary neurons or parameters

Quantization – Converts high-precision floating-point calculations to lower-bit values, such as 8-bit integers, for faster inference

Distillation – Trains a smaller “student” model to replicate the behaviour of a larger “teacher” model

These methods allow companies to deploy models more efficiently, reducing server costs and improving response times.
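
As a concrete example of quantization, the sketch below applies PyTorch’s post-training dynamic quantization to a toy model, storing its linear-layer weights as 8-bit integers. The model itself is illustrative, not a real LLM.

```python
# A minimal sketch of post-training dynamic quantization in PyTorch: Linear-layer
# weights are stored as 8-bit integers, shrinking the model and speeding up CPU
# inference. The toy model is illustrative only.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 128))

quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized_model(x).shape)  # same interface, smaller and faster on CPU
```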

  7. Deployment – Making LLMs Accessible to Users

Once optimized, LLMs are deployed through:

Cloud-based APIs (e.g., OpenAI, Anthropic, Google Gemini)

On-premises enterprise solutions (for companies needing data security)

Edge computing (running models on consumer devices to reduce cloud dependency)

At this stage, the model becomes accessible to developers, businesses, and end-users, powering everything from chatbots to coding assistants and enterprise automation tools.
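
For example, consuming a cloud-hosted model typically looks like the hedged sketch below, here using OpenAI’s Python SDK. The model name is illustrative, and other providers expose similar chat-style endpoints.

```python
# A minimal sketch of consuming a deployed LLM through a cloud API via OpenAI's
# Python SDK. The model name is illustrative and an OPENAI_API_KEY is assumed to
# be set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarise what a Mixture of Experts model is."}],
)
print(response.choices[0].message.content)
```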

What DeepSeek Did Differently

DeepSeek didn’t just refine this pipeline—it reimagined it. Here’s how:

  1. Pure Reinforcement Learning Approach

Unlike traditional LLMs that heavily depend on supervised learning and human-labelled data, DeepSeek’s R1 model uses a reinforcement learning-first approach.

This means the model teaches itself, optimizing outputs through trial and error rather than relying heavily on human feedback. In other words, the model’s outputs are automatically “rewarded” or “penalised” during the RL process, rather than being ranked by human annotators.
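
For intuition, the sketch below shows the kind of simple rule-based reward DeepSeek has described for R1-style training: one signal for answer correctness and one for respecting the expected output format. The exact reward design and tag names here are simplified assumptions for illustration.

```python
# A hedged sketch of a rule-based reward for R1-style reinforcement learning:
# one component for answer correctness and one for respecting the expected
# output format. Tag names and weights are simplified assumptions.
import re

def compute_reward(model_output: str, reference_answer: str) -> float:
    reward = 0.0

    # Format reward: the model should wrap its reasoning in <think>...</think> tags.
    if re.search(r"<think>.*?</think>", model_output, flags=re.DOTALL):
        reward += 0.5

    # Accuracy reward: the final answer (after the reasoning block) must match.
    final_answer = re.sub(r"<think>.*?</think>", "", model_output, flags=re.DOTALL).strip()
    if final_answer == reference_answer.strip():
        reward += 1.0

    return reward

output = "<think>7 * 6 = 42</think>42"
print(compute_reward(output, "42"))  # 1.5
```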

  2. Mixture of Experts (MoE) Architecture

DeepSeek utilizes a Mixture of Experts (MoE) framework consisting of 671 billion parameters, of which only 37 billion are activated at a time. This reduces computation costs while maintaining state-of-the-art performance, and it allows the R1 model to specialize in different tasks dynamically rather than treating all inputs equally.

The MoE framework is like having a team of specialised geniuses, each tackling different parts of a problem. Instead of one massive model doing all the work, MoE smartly deploys the best experts for each task, slashing computational costs and supercharging efficiency. During training, these experts are exposed to diverse datasets, making them versatile problem-solvers. A gating mechanism decides which experts to activate, and when multiple experts are activated, their outputs are seamlessly combined, boosting the model’s overall performance.
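
The sketch below shows this routing idea in miniature: a gating network scores the experts, only the top-k are executed for each input, and their outputs are blended by the gate weights. The sizes and top-2 routing are illustrative, not DeepSeek’s actual configuration.

```python
# A minimal PyTorch sketch of Mixture-of-Experts routing: a gating network scores
# the experts, only the top-k run for each input, and their outputs are combined
# by the gate weights. Sizes and top-2 routing are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(num_experts)])
        self.gate = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x):                                    # x: (batch, d_model)
        scores = F.softmax(self.gate(x), dim=-1)             # relevance of each expert
        weights, indices = scores.topk(self.top_k, dim=-1)   # keep only the top-k experts
        weights = weights / weights.sum(dim=-1, keepdim=True)

        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = (indices == i).any(dim=-1)                # inputs routed to expert i
            if mask.any():
                w = weights[mask][indices[mask] == i].unsqueeze(-1)
                out[mask] += w * expert(x[mask])
        return out

print(TinyMoE()(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```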

  3. Cost-Effective Training with Minimal Compute

But what truly sets DeepSeek apart from giants like Google and Mistral (who already use MoE architecture in some of their models) is their ingenious quantization technique. By using 8-bit numbers for most model weights and switching to 32-bit only for complex calculations, DeepSeek cracked the code for training on less powerful NVIDIA hardware.
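
The sketch below illustrates the underlying trade-off with simple 8-bit integer quantization in NumPy: weights are stored compactly and dequantized to 32-bit floats only where precision matters. DeepSeek’s actual mixed-precision training scheme is more sophisticated; this is just the general idea.

```python
# A hedged NumPy sketch of the basic idea behind 8-bit weight storage: keep
# weights as int8 plus a scale factor, and dequantize to 32-bit floats only when
# a calculation needs the extra precision. DeepSeek's real scheme differs.
import numpy as np

weights_fp32 = np.random.randn(4096, 4096).astype(np.float32)

# Quantize: map floats onto the int8 range [-127, 127] with a single scale factor.
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)

# Dequantize back to float32 for a calculation that needs higher precision.
weights_restored = weights_int8.astype(np.float32) * scale

print(f"fp32 size: {weights_fp32.nbytes / 1e6:.0f} MB, int8 size: {weights_int8.nbytes / 1e6:.0f} MB")
print(f"max round-trip error: {np.abs(weights_fp32 - weights_restored).max():.4f}")
```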

Consequently, DeepSeek signalled a shift in AI economics, proving that cutting-edge LLMs can be developed without requiring astronomical compute budgets. The future of LLMs isn’t about building bigger models—it’s about making them smarter and more efficient.

  4. Open-Source Accessibility

Unlike OpenAI and Anthropic, DeepSeek released its model under an MIT open-source license, allowing anyone to use and modify it. This accelerates AI research and development and opens the door for smaller companies and researchers to build on cutting-edge AI without needing massive funding.

Why DeepSeek Caused a Disruption 

As mentioned earlier, the disruption caused by the R1 model wiped almost $1 trillion off the stock market in a single day. This seismic reaction wasn’t just about DeepSeek itself—it was about what it represented. The AI industry has long operated under the assumption that state-of-the-art models require billion-dollar budgets, massive datasets, and the backing of Silicon Valley’s most powerful firms. DeepSeek shattered that illusion. It proved that a cutting-edge LLM could be developed with fewer resources, in the face of economic sanctions, and with a fraction of the compute traditionally thought necessary.

Stock market impact of DeepSeek's R1 release

This revelation sent shockwaves through AI labs, venture capital firms, and governments alike. It has significantly lowered the barrier to entry for new competitors. It also challenges OpenAI and Anthropic’s business models, which have been built around the immense cost and complexity of training LLMs. Companies that had bet heavily on Western AI providers now have to reconsider their strategies, and ask what this means for the long-term sustainability of closed-source AI development.

Yet, DeepSeek R1 has not come without controversy. One of the biggest concerns surrounding DeepSeek is its origins in the People’s Republic of China (PRC). While DeepSeek itself operates as a private company, China’s government maintains strict oversight on AI development, and international policymakers are already questioning whether a PRC-based LLM can ever be fully independent from state influence. This raises security concerns, particularly for enterprises handling sensitive data. Unlike OpenAI or Anthropic, where regulatory scrutiny ensures some level of accountability, DeepSeek lacks clear transparency on how it handles user interactions, data retention, and security vulnerabilities.

Compounding this, there are also questions about the model’s resilience to security threats. Early security assessments suggest that DeepSeek R1 may have weaker safeguards against adversarial attacks, prompt injections, and data manipulation. While Western AI firms have invested heavily in reinforcement learning for safety alignment, DeepSeek’s RL-first approach, built on its V3 base model, appears to have left some vulnerabilities open despite its innovation. If these issues are not addressed, they could limit adoption among enterprises that require high-security standards for AI deployment.

Another area of scepticism is DeepSeek’s claim that it trained R1 with only around 2,000 Nvidia GPUs for a total cost of $5.6 million. Mixture of Experts (MoE) architectures require complex coordination across multiple compute nodes, making training harder to orchestrate than for standard dense models. While DeepSeek has undoubtedly optimized efficiency, the lack of detailed cost breakdowns has raised doubts about whether these numbers are fully accurate or simply PR-friendly estimates.

Despite that, the impact of DeepSeek R1 cannot be ignored. It has fundamentally redefined expectations for cost, efficiency, and the feasibility of non-Western AI dominance. For AI labs, it is a wake-up call. For enterprises, it is a new opportunity and a new risk. And for governments, it is a challenge that will force new discussions about AI security, international regulation, and the future of global AI leadership.

The Future of LLMs 

DeepSeek has provided a Sputnik moment for AI—a shocking disruption that forces incumbents to react, adapt, or be left behind. But just as the Soviet Union ultimately lost the space race, DeepSeek’s R1 may be a fleeting champion rather than a lasting king. While it will hold a pivotal place in AI history, its long-term dominance is far from guaranteed. Already, Western governments are exploring ways to ban or restrict PRC-origin LLMs, citing concerns over security, transparency, and influence. Regulatory walls are going up fast, and DeepSeek may soon find itself locked out of critical global markets.

What will be indisputable, however, is the impact of its techniques. The myth that state-of-the-art AI requires billion-dollar budgets has been shattered. DeepSeek has reset expectations, proving that with the right architecture, LLMs can be trained and deployed for a fraction of the cost. The implications are profound: where once the market seemed destined to be dominated by a few monolithic models, we will now see an explosion of LLMs, each catering to different industries, applications, and regulatory environments.

With the cost of development plummeting, enterprises will no longer need to rely solely on OpenAI, Anthropic, or Google for their AI needs. Instead, we are entering an era where AI is fully commoditized—where businesses will build and tailor their own specialized language models (SLMs) optimized for niche domains like law, finance, cybersecurity, and medicine. Rather than a single dominant model shaping the AI landscape, we will see a fragmented, highly specialized market—a shift as significant as the move from mainframe computing to cloud-based microservices. Soon we will enter the wild west of AI.

This shift will also force a brutal cost-cutting war among AI providers. While training expenses are dropping, the real battle will be in reducing inference and deployment costs. Companies will race to build hardware-efficient models, leveraging techniques like edge AI, quantization, and multimodal architectures to drive down compute, storage, and energy requirements. The days of relying solely on massive cloud-based LLMs are numbered; soon, AI models will be cheap enough to run locally on consumer hardware without compromising performance.

DeepSeek may not be the company to define this future, but it has accelerated its arrival. The AI industry now faces a new reality—one where cost, accessibility, and specialization matter more than raw scale. The race has begun, and the next winners will be those who can build faster, cheaper, and more adaptable AI models for an increasingly fragmented world.

Sputnik satellite representing DeepSeek's AI industry disruption

How We Can Help 

At SCSK {digital}, we specialize in delivering tailored AI strategies that align with your business goals, ensuring that you stay ahead in an increasingly AI-driven world. Whether you’re navigating the rise of specialized AI models, optimizing infrastructure costs, or exploring multimodal AI applications, our team of experts is here to help.

We offer end-to-end AI services, including:

  • AI Strategy & Impact Assessments – Helping you understand where AI can drive the most value in your organization.
  • Custom AI Integration – Deploying the right AI models for your specific workflows, whether that’s through leading LLM providers or bespoke, domain-specific solutions.
  • AI Governance & Compliance – Ensuring responsible AI adoption with robust risk management, security protocols, and ethical AI frameworks tailored to regulatory environments.
  • Cost-Effective AI Scaling – Advising on how to reduce infrastructure costs, improve efficiency, and future-proof AI investments without sacrificing performance.

With AI evolving at an unprecedented pace, businesses need agile, informed strategies to remain competitive. We help you navigate the complexities of AI adoption while ensuring you remain in control.

Contact us today to explore how AI can drive efficiency, innovation, and growth in your industry.

 

Authors:

Sim Riyat, AI Specialist. sim.riyat@scskeu.com

Chidi Akurunwa, AI Specialist. chidi.akurunwa@scskeu.com