Last week, DeepSeek sent shockwaves through the AI community, the US stock market, and policymakers in Washington. The company released a new flagship AI model, DeepSeek-R1, which rivaled the performance of OpenAI's flagship o1 reasoning model despite a significantly smaller model and far lower training costs. It was an industry-defining moment, demonstrating that building cutting-edge AI systems is no longer exclusive to companies with multi-billion-dollar budgets. Speculation continues around the actual development cost (DeepSeek claims it spent only $5.6M, while industry analysts argue the true cost could have been several times greater), but the fact remains that DeepSeek achieved remarkable performance with lower costs and a smaller model, signaling a disruptive shift in AI development.
DeepSeek’s decision to open-source the model and publish detailed technical reports made this even more remarkable, pulling back the curtain on its training techniques. Unlike OpenAI, which has kept its most advanced models proprietary, DeepSeek released technology that could empower smaller companies, researchers, and startups to build high-performance AI solutions without the massive infrastructure costs traditionally required. DeepSeek leveraged multiple efficiency techniques to achieve this breakthrough, including mixture-of-experts training, efficient cross-node communication, a mixed-precision framework, and low-rank joint compression for key-value caching. These optimizations enabled DeepSeek to create a model that punches far above its weight, offering performance on par with OpenAI’s best without the same hardware constraints.
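To give a sense of where the efficiency comes from, the core idea behind mixture-of-experts can be shown in a minimal sketch. This is an illustrative toy, not DeepSeek's actual implementation: a gating network scores a set of expert networks, and only the top-k experts run for a given input, so most parameters stay idle on each forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, experts, gate_w, top_k=2):
    """Route input x to the top_k highest-scoring experts and combine
    their outputs, weighted by a softmax over the selected gate scores."""
    scores = x @ gate_w                       # one score per expert
    top = np.argsort(scores)[-top_k:]         # indices of selected experts
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                  # softmax over selected experts
    # Only the selected experts execute; that sparsity is the cost savings.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

dim, num_experts = 8, 4
# Each "expert" here is just a random linear map for illustration.
expert_mats = [rng.normal(size=(dim, dim)) for _ in range(num_experts)]
experts = [lambda x, M=M: x @ M for M in expert_mats]
gate_w = rng.normal(size=(dim, num_experts))

x = rng.normal(size=dim)
y = moe_forward(x, experts, gate_w)
print(y.shape)  # (8,)
```

In a real MoE transformer the experts are feed-forward sublayers, the gate is learned jointly with them, and routing happens per token, but the compute argument is the same: with top-2 routing over 4 experts, only half the expert parameters are active per input.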
As the Alaffia AI team closely tracks emerging AI research, we conducted our own experiments to assess how DeepSeek's advancements could be applied to health plan claims operations, specifically with the goal of improving clinical claim reviews.
DeepSeek and OpenAI for Clinical Claim Review Reasoning
To assess how DeepSeek's emerging methodologies would affect our domain, we ran a series of tests evaluating DeepSeek's efficacy in clinical claim review reasoning. Both DeepSeek-V3 (the base model on which R1 was built) and R1 leveraged reinforcement learning during training, while the R1-Zero variant was trained with reinforcement learning alone, without any supervised fine-tuning. This training approach proved effective at eliciting strong reasoning behaviors, especially in math and coding tasks, where data is relatively easy to verify and curate.
While we are impressed by the reasoning abilities of models like DeepSeek-R1 and OpenAI's o1, our findings indicate that models trained on general-domain knowledge often struggle with clinical claim reasoning and medical necessity reviews. These are highly specialized fields that require deep domain expertise and adherence to clinical and policy guidelines, factors that general-purpose training fails to capture effectively. Consequently, reasoning models trained on general knowledge have inherent limitations in this context, particularly because they lack the ability to discern the critical nuances essential for accurate medical necessity assessments.
To address these inherent limitations of large foundation models, we at Alaffia have developed an agentic AI system built on substantial labeled training data, spanning payer claims data, provider reimbursement data, and human clinical reasoning data. Similar in spirit to DeepSeek's mixture-of-experts (MoE) architecture, our system is designed as an ensemble of specialized models that execute complex tasks in a well-organized way, rather than relying on a single large model that is more challenging to manage and optimize. This multifaceted AI system was built specifically to power Ask Autodor, our AI co-pilot for clinical teams.
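The ensemble-of-specialists idea can be sketched as a simple dispatcher. The names and task types below are hypothetical stand-ins, not Alaffia's production components: each specialist model registers the stage of review it handles, and tasks are routed to it instead of to one monolithic model.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Specialist:
    """A specialized model wrapped with routing metadata (illustrative)."""
    name: str
    handles: str                  # the task type this model covers
    run: Callable[[dict], dict]   # model inference, stubbed out below

def route(task: dict, specialists: list[Specialist]) -> dict:
    """Dispatch a task to the specialist registered for its type."""
    for s in specialists:
        if s.handles == task["type"]:
            return s.run(task)
    raise ValueError(f"no specialist for task type {task['type']!r}")

# Stub models standing in for separately trained components.
specialists = [
    Specialist("ocr", "document_ocr", lambda t: {"text": "extracted"}),
    Specialist("validator", "clinical_validation", lambda t: {"valid": True}),
]

result = route({"type": "document_ocr", "doc": "claim.pdf"}, specialists)
print(result)  # {'text': 'extracted'}
```

The design benefit mirrors MoE: each component can be trained, evaluated, and swapped independently, and only the relevant model runs for a given task.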
What Really Matters For Health Plan Claims Operations
The impressive results and computational efficiency of DeepSeek reaffirm Alaffia's strategy of developing a bespoke, modular agentic AI system. In our view, the capabilities of foundation models will increasingly become commoditized, so it is imperative that we can dynamically evaluate and switch between them to best serve our clients across scenarios. DeepSeek's release is a significant development for the AI industry, driving down costs, increasing competition, and enabling self-hosting and proprietary model deployment.
Alaffia is steadily accumulating its own clinical expert-labeled datasets, covering everything from document OCR and fact extraction to clinical validation and audit-level decision-making. With these datasets and expert labels, we can rigorously evaluate our internal models against state-of-the-art (SOTA) models. OpenAI's GPT, Anthropic's Claude, and Meta's Llama models are already part of our evaluation process. We are also excited to now assess Qwen2.5-VL, the open-source vision-language model Alibaba released last week.
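The shape of such an evaluation is straightforward to sketch. This is a hypothetical harness with stubbed models, not our internal tooling: each candidate model is scored against the same expert-labeled examples, yielding a per-model accuracy that makes swapping decisions concrete.

```python
def evaluate(models: dict, examples: list) -> dict:
    """Score each candidate model against expert-labeled examples.

    models maps a model name to a callable prompt -> answer;
    examples is a list of (prompt, expert_label) pairs.
    Returns per-model accuracy on the labeled set.
    """
    scores = {}
    for name, model in models.items():
        correct = sum(model(prompt) == label for prompt, label in examples)
        scores[name] = correct / len(examples)
    return scores

# Trivial stub "models" standing in for API or self-hosted inference calls.
models = {
    "always_deny": lambda p: "deny",
    "always_approve": lambda p: "approve",
}
examples = [
    ("claim A", "approve"),
    ("claim B", "deny"),
    ("claim C", "approve"),
]
print(evaluate(models, examples))
```

In practice the labels are the expert decisions described above and the comparison metric is richer than exact match, but the principle is the same: a fixed, expert-labeled benchmark lets new foundation models be slotted in and scored the day they are released.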
Looking forward, as AI development shifts toward AI "adaptors" (companies that apply foundation models to solve business-specific challenges), the bespoke clinical claims data we collect will empower us to build the most effective solutions, whether through model evaluation or fine-tuning of open-source models. As AI continues to evolve, the companies that succeed will go beyond merely adopting models; they will continuously tailor, fine-tune, and adapt them to real-world use cases. That is our approach at Alaffia. Stay tuned for more updates on our work in AI for health plans.