Life Sciences

The Next-Gen Research Partner: Agentic AI for Accelerated Pharmaceutical Discovery

Overview

Computational biologists today work in a data-saturated environment, marked by fragmented sources and an ever-growing volume of multimodal datasets. For computational biologists at one of the world's largest pharmaceutical companies, breakthroughs were getting lost in endless manual searches. Researchers spent much of their time piecing together information, jumping between disparate internal repositories and massive public databases like PubMed. Answering a single complex question meant running multiple queries across different systems and manually triaging the literature. The rise of generative AI opened a new avenue, but off-the-shelf large language models were not a workable option. They tend to "hallucinate" and cannot access real-time public data or the company's own proprietary research, so a trustworthy answer could not be guaranteed.

Kanda partnered with the client to bridge the gap. Together, we created an advanced research assistant built on an agentic AI framework. Rather than just pulling from pre-trained knowledge, this assistant works like an intelligent research partner. It automatically plans and carries out multi-step research tasks, choosing from a toolkit of specialized data sources, from live API queries to internal vector databases, to find and synthesize information into a single accurate answer that researchers can verify.

This work is highly relevant to the field because it addresses the core bottleneck computational biologists face: the lack of a unified, trustworthy system that can synthesize structured and unstructured biological knowledge at scale, maintain traceability, and support reproducible scientific reasoning. By combining agentic AI capable of dynamic tool selection, vectorized semantic retrieval, and explainable research chains, the solution aligns directly with how modern life sciences organizations are trying to accelerate early discovery while maintaining scientific rigor, auditability, and regulatory readiness.

The Challenge: Data Overload and Research Bottlenecks

The company's R&D teams were struggling with a disconnected data environment that created significant roadblocks to innovation.

Fragmented Knowledge and Manual Toil: To answer a biological question, scientists had to move between many sources: public databases like PubMed and UniProt, along with several internal research and clinical repositories. Each system had its own interface and search process, forcing skilled researchers to spend days, and at times months, gathering information instead of analyzing data or developing hypotheses.
The Generative AI Conundrum: The emergence of powerful LLMs presented a clear opportunity. However, standard, off-the-shelf models were unsuitable for serious scientific R&D. Their knowledge is static, limited to their last training date, and they cannot access private internal data or real-time external databases. The result was outdated or "hallucinated" answers that were too unreliable for high-stakes drug discovery.
The Need for a Specialized, Trusted Solution: The client needed a system that combined the language understanding and reasoning of an LLM with the accuracy of their own trusted data sources. It had to be reliable, transparent, and able to handle the complexity of biological research.

The Solution: An Agentic AI Research Agent

Kanda built a robust application that integrates advanced AI techniques to provide a seamless, reliable research experience. The system's architecture is built on three key pillars:

An Intelligent Agentic RAG Core: The system is built using an advanced Retrieval-Augmented Generation (RAG) approach. Unlike simple RAG, which just fetches documents, this is a true agentic AI system built on LangGraph, with the LLM serving as its cognitive engine. When a user asks a question, the agent autonomously creates a plan, decides which specialized "tools" (data sources) to query, and continues searching until it has enough information to provide a complete answer.
A Scalable Toolkit for Any Data Source: The Kanda team built a flexible framework that treats every data source, internal or external, as a "tool" the agent can tap into. Because all tools share a common interface, the system scales easily. The primary tool categories are:
- Vectorized Internal Databases: For large, document-heavy sources like PubMed, the team engineered automated pipelines to download, process, and vectorize millions of publications into a Weaviate vector database. The tool then performs high-speed semantic searches to find the most relevant information.
- Live API Adapters: For dynamic external sources like UniProt, the tool acts as an intelligent adapter. It uses an LLM to translate the user's natural language question into a precise, syntactically correct API request, allowing it to query live external data on the fly.
- Query Support: Alongside the core data-fetching tools, we use auxiliary technologies such as a Named Entity Recognition (NER) tool that extracts entities and their synonyms from user queries, so that ingested data can be queried accurately and effectively.
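The idea of a common tool interface can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the client system: the `Evidence` and `Tool` types, the class names, the toy documents, and the bag-of-words `toy_embed` function are all hypothetical. An in-memory cosine-similarity search stands in for the Weaviate vector database, a hard-coded function stands in for the LLM that translates a question into a UniProt query, and the adapter only builds a request URL for the public UniProt REST search endpoint without sending it.

```python
import math
from dataclasses import dataclass
from typing import Callable, Protocol
from urllib.parse import urlencode


@dataclass
class Evidence:
    source: str     # name of the tool that produced this item
    reference: str  # identifier a reader can follow up on
    text: str       # retrieved passage or request description


class Tool(Protocol):
    """Hypothetical common interface: every data source answers run(query)."""
    name: str

    def run(self, query: str) -> list[Evidence]: ...


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VectorSearchTool:
    """Toy in-memory stand-in for a vector database lookup."""
    name = "literature_search"

    def __init__(self, docs: list[tuple[str, str]],
                 embed: Callable[[str], list[float]], top_k: int = 2):
        self.docs = docs
        self.embed = embed
        self.top_k = top_k
        self.vectors = [embed(text) for _, text in docs]  # precomputed once

    def run(self, query: str) -> list[Evidence]:
        q = self.embed(query)
        ranked = sorted(zip(self.docs, self.vectors),
                        key=lambda pair: cosine(q, pair[1]), reverse=True)
        return [Evidence(self.name, ref, text)
                for (ref, text), _ in ranked[:self.top_k]]


class UniProtAdapterTool:
    """Builds a UniProt REST search URL; the sketch never sends the request."""
    name = "uniprot_api"
    BASE_URL = "https://rest.uniprot.org/uniprotkb/search"

    def __init__(self, translate: Callable[[str], str]):
        self.translate = translate  # stand-in for the LLM translation step

    def build_url(self, question: str) -> str:
        params = {"query": self.translate(question), "format": "json", "size": 5}
        return f"{self.BASE_URL}?{urlencode(params)}"

    def run(self, question: str) -> list[Evidence]:
        return [Evidence(self.name, self.build_url(question),
                         "request prepared (not sent)")]


# Hypothetical toy data: a five-term vocabulary and three placeholder documents.
VOCAB = ["kinase", "inhibitor", "tumor", "insulin", "receptor"]
DOCS = [("DOC-1", "kinase inhibitor reduces tumor growth"),
        ("DOC-2", "insulin receptor signaling in liver"),
        ("DOC-3", "tumor kinase activity")]


def toy_embed(text: str) -> list[float]:
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


tools: list[Tool] = [
    VectorSearchTool(DOCS, toy_embed),
    UniProtAdapterTool(lambda q: "gene:BRCA1 AND organism_id:9606"),
]
for tool in tools:
    for item in tool.run("kinase inhibitor tumor"):
        print(f"{item.source}: {item.reference}")
```

Because both classes expose the same `run` method and return the same `Evidence` type, the calling code does not need to know whether an item came from a stored embedding or a live API, which is the property that lets new sources be added without changing the agent.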
Transparency and Trust through Explainable AI (XAI): A black box is unacceptable in science. The system was designed for full transparency and auditability.
- Stream of Thoughts: The user interface features a transparent stream of thoughts, allowing the researcher to follow the agent's reasoning step-by-step as it selects tools and analyzes data.
- Verifiable Citations: The system shows users exactly what tools and data sources the agent pulled from. Every answer links directly to all referenced sources, down to particular PubMed articles, so researchers can trace and verify everything.
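The plan, query, and check loop described above, together with the thought stream and citations, can be illustrated with a short plain-Python sketch. It does not use LangGraph and is not the production agent: `run_agent`, `choose_unused`, `enough_sources`, the tool names, and the placeholder references are all hypothetical, and simple rule-based functions stand in for the decisions an LLM would make.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Finding:
    tool: str       # which tool produced the item
    reference: str  # placeholder for a citation such as an article identifier
    text: str


@dataclass
class AgentRun:
    question: str
    thoughts: list[str] = field(default_factory=list)      # visible reasoning trace
    findings: list[Finding] = field(default_factory=list)  # cited evidence


# Hypothetical tools: each maps a question to a list of findings.
TOOLS: dict[str, Callable[[str], list[Finding]]] = {
    "literature_search": lambda q: [
        Finding("literature_search", "DOC-1", "placeholder passage")],
    "protein_api": lambda q: [
        Finding("protein_api", "ENTRY-1", "placeholder record")],
}


def choose_unused(question: str, findings: list[Finding]) -> Optional[str]:
    """Stand-in for LLM tool selection: pick the first tool not yet used."""
    used = {f.tool for f in findings}
    return next((name for name in TOOLS if name not in used), None)


def enough_sources(findings: list[Finding]) -> bool:
    """Stand-in for the LLM's sufficiency check: two distinct sources."""
    return len({f.tool for f in findings}) >= 2


def run_agent(question: str, max_steps: int = 5) -> AgentRun:
    run = AgentRun(question)
    for step in range(1, max_steps + 1):  # bounded so the loop always ends
        tool_name = choose_unused(question, run.findings)
        if tool_name is None:
            run.thoughts.append(f"Step {step}: no further tool applies; stopping.")
            break
        run.thoughts.append(f"Step {step}: querying '{tool_name}'.")
        new_items = TOOLS[tool_name](question)
        run.findings.extend(new_items)
        run.thoughts.append(
            f"Step {step}: '{tool_name}' returned {len(new_items)} item(s).")
        if enough_sources(run.findings):
            run.thoughts.append(f"Step {step}: evidence is sufficient; answering.")
            break
    return run


def format_citations(run: AgentRun) -> str:
    """List every finding with its tool and reference so it can be traced."""
    lines = [f"[{i}] {f.text} (tool: {f.tool}, ref: {f.reference})"
             for i, f in enumerate(run.findings, 1)]
    return "\n".join(lines)


result = run_agent("Which proteins interact with the target?")
print("\n".join(result.thoughts))
print(format_citations(result))
```

Recording each decision in `thoughts` as it happens, rather than reconstructing it afterward, is what makes a trace like this usable as an audit record: the stream a researcher reads is the same sequence the loop actually followed.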

Results & Impact: From Months of Research to Minutes

The AI assistant has fundamentally transformed the company's research operations, saving thousands of hours and accelerating scientific insight.

A Paradigm Shift in Research Efficiency: Impact was both quick and quantifiable. Since the second quarter of 2024, the system has delivered about 1,500 unique responses, saving researchers more than 40 days of manual literature searching. To put it in perspective, a complex hypothesis-generation task that used to require two months of manual effort can now be finished by the platform in roughly one hour.
Accelerating Critical R&D Milestones: Researchers quickly adopted the tool for high-value work. One scientist supporting a pivotal Phase 3 study reported that the new platform was instrumental in helping them structure their thinking and locate the critical research needed to answer questions related to an upcoming Biologics License Application (BLA) filing.
Enhancing Scientific Rigor and Quality: Speed is only part of the benefit; the real gain is a boost in research quality. A user described the tool as a paradigm shift in their ability to comb through the biological research to generate hypotheses with explainable documentation, high scientific quality and speed.
A Scalable Foundation for Future Innovation: The platform is engineered to evolve. Its agentic framework lets new tools and data sources be plugged in with minimal friction. Automated data pipelines keep the underlying information current, and a role-based access-control (RBAC) layer is under development to safeguard sensitive content. Development is ongoing, and the platform evolves every day as new research is integrated. Both business and development teams keep the tools up to date with emerging research and cutting-edge technologies.
Research Orchestrator: Another key advantage of our system is its role as an orchestrator of agents, tools, and personas, enabling researchers not only to leverage its capabilities but also to extend them by creating custom tools tailored to their specific domains of interest.

Conclusion

With this project, Kanda turned a messy, scattered data landscape into a smart, unified research platform. The system goes beyond basic search: it works as a real AI partner that understands questions, plans the research approach, and executes complex tasks. It saves researchers months of manual work, helps them think through problems more clearly, and delivers results they can verify and reproduce. The partnership delivered a solid, scalable solution that accelerates drug-discovery work and helps the client maintain its leadership role in life science innovation.
