What Powers Sleuth
A look under the hood at the infrastructure powering Sleuth: a continuously maintained knowledge graph of the biopharma landscape, structured through extraction and validation, designed to produce traceable, decision-grade analysis.
This is part three in a series. Read part one here and part two here.
Earlier in this series, we laid out the structural problem behind competitive and strategic intelligence in biopharma and introduced Sleuth as our response. With the launch of Studio, the same infrastructure that has powered our engagements for nearly two years is now available directly. Whether you're using Studio yourself or working with our Concierge team, everything runs on the same foundation.
Here's what that foundation is, how we built it, and why it produces answers you can trust.
Sleuth's intelligence starts well before anyone asks a question. By the time you open a landscape or run an analysis, the platform has already acquired millions of data points from across the biopharma landscape, resolved them into canonical entities, connected those entities into a knowledge graph with hundreds of millions of nodes and billions of weighted relationships, and applied analytical lenses to surface what matters. This is a continuous process, and it's the core of what we've built. Every engagement builds on it, and every engagement makes it richer.
Acquiring and structuring the landscape
Sleuth continuously discovers and ingests from hundreds of structured and unstructured sources across public and private domains, maintaining a near-real-time pulse on emerging evidence, events, and market signals. The scope spans tens of millions of publications, over a million clinical trial records, hundreds of thousands of patents and sell-side analyst notes, drug pipelines from tens of thousands of global companies, regulatory filings, and corporate and financial data. (For a full breakdown, see the Platform page.)
These sources differ in structure, update cadence, and naming conventions, and are often published in different languages. Before any of this data becomes useful, it has to be curated and resolved. The same drug appears as pembrolizumab in a clinical trial registry, Keytruda in a sell-side note, and MK-3475 in a patent filing. The same company appears as a parent in one source and a subsidiary in another. Sleuth resolves these into single canonical entities with all associated evidence linked, identifies semantic conflicts and inconsistencies across sources, and maps entities to both standardized biomedical ontologies and each organization's proprietary terminology.
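To make the resolution step concrete, here is a simplified sketch of alias-to-canonical-entity mapping. Everything in it, from the `ALIAS_INDEX` structure to the normalization rule and document IDs, is an illustrative assumption rather than Sleuth's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class CanonicalEntity:
    """One resolved entity with its known aliases and linked evidence."""
    canonical_id: str
    preferred_name: str
    aliases: set[str] = field(default_factory=set)
    evidence: list[str] = field(default_factory=list)  # source document IDs

def normalize(name: str) -> str:
    # Illustrative normalization: case-fold and drop punctuation/whitespace.
    return "".join(ch for ch in name.lower() if ch.isalnum())

# Hypothetical alias index mapping normalized surface forms to one entity.
PEMBRO = CanonicalEntity(
    canonical_id="DRUG:0001",
    preferred_name="pembrolizumab",
    aliases={"pembrolizumab", "Keytruda", "MK-3475"},
)
ALIAS_INDEX = {normalize(a): PEMBRO for a in PEMBRO.aliases}

def resolve(mention: str, source_doc: str) -> CanonicalEntity | None:
    """Resolve a raw mention to its canonical entity, attaching the evidence."""
    entity = ALIAS_INDEX.get(normalize(mention))
    if entity is not None:
        entity.evidence.append(source_doc)
    return entity

# The same drug under three names resolves to a single entity:
for mention, doc in [("Keytruda", "sell-side-note"),
                     ("MK-3475", "patent-filing"),
                     ("pembrolizumab", "trial-registry-record")]:
    assert resolve(mention, doc) is PEMBRO
```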
The same pipeline extends to an organization's own data. When internal assets enter the system, a company in a deal memo resolves to the same canonical entity as that company in the public graph, and a target in a scientific evaluation links to the same clinical programs and patent landscape. Your confidential data is processed in isolated environments with dedicated storage and is never used to enrich the graph for any other organization.
AI models work alongside deterministic pipelines at this layer, not just at the user-facing reasoning layer. When a preclinical asset enters the system with only a program name and a company, AI models equipped with dedicated tools query biomedical databases, patent records, and published literature to identify the asset's target, mechanism of action, and modality, then validate those assignments against standardized ontologies. Classification outputs are cross-checked across independent AI models to catch errors that any single model would miss, and flagged for human review when confidence is low. By the time a fact enters the knowledge graph, it has passed through a structured pipeline of extraction, cross-validation, and review.
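A minimal sketch of that cross-model check, assuming stand-in classifier callables and a simple majority rule; the production models, thresholds, and review queue are not specified in this post:

```python
from collections import Counter
from typing import Callable

# Stand-in: each classifier maps an asset description to a label,
# e.g. a target or mechanism-of-action assignment.
Classifier = Callable[[str], str]

def classify_with_consensus(
    asset_description: str,
    classifiers: list[Classifier],
) -> tuple[str | None, bool]:
    """Run independent models and return (label, needs_human_review).

    A label is accepted only on strict majority agreement; anything
    weaker is treated as low confidence and flagged for review.
    """
    votes = Counter(clf(asset_description) for clf in classifiers)
    label, count = votes.most_common(1)[0]
    if count > len(classifiers) / 2:
        return label, False   # confident consensus
    return None, True         # disagreement: route to a human

# Illustrative use with trivial stand-in "models":
label, review = classify_with_consensus(
    "anti-PD-1 monoclonal antibody",
    classifiers=[lambda s: "PD-1", lambda s: "PD-1", lambda s: "PD-L1"],
)
assert label == "PD-1" and review is False  # 2 of 3 models agree
```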
This produces a clean, resolved, and continuously updated corpus of the biopharma landscape. What comes next is what makes it powerful.
Connected knowledge
A clean corpus is a starting point. The value of what we've built comes from constructing explicit and implicit relationships across that corpus and maintaining them continuously.
Consider what the full picture involves for a single drug: the mechanism, the clinical programs across every phase and geography, the cohorts and endpoints within each program, the companies involved through ownership and licensing, and the patents, publications, filings, and regulatory signals that give all of it meaning. We build and maintain those connections continuously, across the entire landscape, rather than inferring them at query time.
Every fact and relationship in the graph is accompanied by continuously updated confidence signals derived from the available evidence. These are not binary right-or-wrong flags. They are weighted assessments based on source quality, recency, and cross-source corroboration. The result is a knowledge graph with hundreds of millions of nodes and billions of edges, each tagged with relevant weights and attributes.
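As a toy illustration of how such a weight might combine the three ingredients named above, here is a scoring function with assumed coefficients and an assumed decay constant; the actual weighting is not disclosed in this post:

```python
from datetime import date

def edge_confidence(
    source_quality: float,          # 0..1, e.g. a registry scores above a blog
    observed_on: date,
    corroborating_sources: int,
    half_life_days: float = 365.0,  # assumed recency decay
) -> float:
    """Blend quality, recency, and corroboration into a weight in [0, 1]."""
    age_days = (date.today() - observed_on).days
    recency = 0.5 ** (age_days / half_life_days)         # exponential decay
    corroboration = 1.0 - 0.5 ** corroborating_sources   # saturating boost
    # Illustrative blend; the real coefficients are not public.
    return 0.5 * source_quality + 0.3 * recency + 0.2 * corroboration
```

Under these assumptions, a fresh, well-corroborated fact from a strong source scores near 1.0, while a stale, uncorroborated claim decays toward its source-quality term alone.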
This is what makes the platform categorically different from a database you query or an AI tool that reconstructs context from scratch each time. A pipeline database gives you structured fields for a drug: its phase, modality, and indication. A document search tool gives you filings and publications that mention it. Neither knows how a manufacturing capacity announcement, a sell-side downgrade, and a trial protocol amendment relate to each other for the same asset. In Sleuth, those connections already exist before you ask your first question.
The analytical layer then applies opinionated lenses to this connected knowledge to filter noise, surface relevance, and prepare high-confidence signals in the form of competitive landscapes, positioning models, and evidence hierarchies. The intelligence is in the connections and the lenses applied to them, not in the raw data.
Curated, high-fidelity artifacts
Over nearly two years of engagements across deal evaluation, competitive landscaping, indication prioritization, and mechanism deep dives, we found that every high-quality engagement shared a structural pattern. The quality of the final insight depended on the quality of a purpose-built dataset underneath it: deeply structured and heavily tailored to the specific question and the context of who was asking it.
This is a core design principle in the platform. When you run an analysis in Sleuth, whether through Studio or with our Concierge team, the system does not retrieve a generic set of results and summarize them. It constructs a curated, high-resolution dataset specific to your question, drawing from the relevant subgraph of connected entities, relationships, and source evidence, shaped to the particular context of what you're trying to decide.
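One way to picture the subgraph step is a bounded traversal outward from the entities a question mentions, keeping only high-confidence edges. The graph representation, hop limit, and weight threshold below are hypothetical:

```python
from collections import deque

# Hypothetical adjacency view: node_id -> list of (neighbor_id, edge_weight).
Graph = dict[str, list[tuple[str, float]]]

def question_subgraph(
    graph: Graph,
    seed_entities: list[str],
    max_hops: int = 2,
    min_weight: float = 0.6,
) -> set[str]:
    """Collect nodes within max_hops of the question's seed entities,
    following only edges whose confidence weight clears the threshold."""
    keep = set(seed_entities)
    frontier = deque((entity, 0) for entity in seed_entities)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for neighbor, weight in graph.get(node, []):
            if weight >= min_weight and neighbor not in keep:
                keep.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return keep
```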
The result is artifacts like competitive landscapes, asset positioning models, and evidence hierarchies that are built on a foundation you can interrogate. Every conclusion traces back through the graph to specific sources: a publication, a trial record, a patent filing, a sell-side note, a regulatory submission. You can inspect the path from any claim to the evidence behind it. When evidence is thin or conflicting across sources, the system surfaces that directly rather than generating a confident answer and leaving you to discover the gap.
Provenance tracking, traceability, versioning, and auditability are built into the platform at the architecture level. Every fact in the graph carries its source, its derivation logic, and its version history.
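As a data-shape illustration only, here is what a fact record carrying those three things, source, derivation logic, and version history, could look like; the field names are assumptions, not the platform's schema:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass(frozen=True)
class FactVersion:
    value: str                    # e.g. "clinical_phase = Phase 3"
    derived_by: str               # the extraction or inference step used
    source_ids: tuple[str, ...]   # documents the value traces back to
    recorded_at: datetime

@dataclass
class Fact:
    """A graph fact whose provenance and history stay inspectable."""
    subject_id: str               # canonical entity, e.g. "DRUG:0001"
    predicate: str                # e.g. "clinical_phase"
    versions: list[FactVersion] = field(default_factory=list)

    @property
    def current(self) -> FactVersion:
        return self.versions[-1]

    def trace(self) -> list[tuple[str, tuple[str, ...]]]:
        """Walk every version of the claim back to its evidence."""
        return [(v.value, v.source_ids) for v in self.versions]
```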
Intelligence that compounds
A consulting engagement produces a deliverable that reflects the world at the time it was written. A database export captures a snapshot of structured fields at the moment of the query. Both begin to decay immediately.
Sleuth is built to compound. Every engagement, whether through Studio or Concierge, builds on the same continuously maintained knowledge graph. When you reopen a landscape you built two months ago, every trial readout, deal, regulatory action, and publication that has occurred since is already reflected. The dataset you built for one question becomes the foundation for the next. The competitive landscape you assembled for a licensing evaluation carries forward into the diligence that follows.
The knowledge graph itself grows richer with every entity resolved, every relationship validated, every new source ingested. The analytical artifacts produced for one engagement become reusable starting points for future work. And we built continuous monitoring into the platform: Sleuth agents track the entities and relationships that matter to you and detect signals that materially change the landscape, so your baseline is always current.
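In miniature, and with a hypothetical event shape and an assumed materiality threshold, the monitoring filter could look like this:

```python
from typing import Iterable

# Hypothetical event shape: (entity_id, description, impact_score in 0..1).
Event = tuple[str, str, float]

def material_changes(
    events: Iterable[Event],
    watched_entities: set[str],
    materiality_threshold: float = 0.7,  # assumed cutoff
) -> list[Event]:
    """Keep events on watched entities whose estimated impact is large
    enough to materially change the landscape."""
    return [
        (entity, desc, impact)
        for entity, desc, impact in events
        if entity in watched_entities and impact >= materiality_threshold
    ]

# Illustrative use: a high-impact readout on a tracked asset surfaces,
# a minor poster on an untracked one does not.
alerts = material_changes(
    events=[("DRUG:0001", "Phase 3 readout misses primary endpoint", 0.9),
            ("DRUG:0002", "Poster presentation at regional meeting", 0.2)],
    watched_entities={"DRUG:0001"},
)
assert len(alerts) == 1
```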
There is a second axis of compounding beyond source ingestion. As teams use the platform, those engagement patterns generate signal about what matters within an organization. Sleuth uses this to surface prior work when a new question overlaps with something a colleague has already explored, inform relevance weighting, and flag when a landscape the team cares about has materially changed. The deep dive one person runs today becomes a foundation the next person builds on without knowing to look for it. These usage-derived signals are tenant-isolated and never shared across organizations.
The intelligence flywheel we described in the first post in this series, where a standing worldview makes deep dives faster and deep dives enrich the worldview, is the engineering outcome we designed for.
See the foundation in action
This is the platform we've been building for the last two years. A continuously maintained knowledge graph of the biopharma landscape, with hundreds of millions of nodes and billions of weighted relationships, fed by hundreds of sources, structured through a rigorous pipeline of extraction, cross-validation, and review, and designed to produce high-fidelity artifacts tailored to the question being asked. Every output is traceable to its source. Every engagement compounds on the last.
For a detailed breakdown of our data coverage and platform capabilities, visit the Platform page. To see it in practice, book a demo.
