FORMA:
Formal Reasoning and Modelling Agent

An Application to Clinical Trials

Cyrus Z Zhou¹, Yufei Jin², Yilin Xu³, Chieh-Ju Chao^1,4, Monica S Lam¹

¹ Stanford University ² UCLA ³ JHU ⁴ Mayo Clinic

Stanford-OVAL

📄 Paper (arXiv) 🌐 Project Page 📚 BibTeX

Project Website · Last updated: Nov 2025

News & Updates

December 10, 2025 Poster Presentation at Stanford AI+HEALTH 2025 Present TrialFORMA at Stanford AI+HEALTH 2025 poster session. Poster Link
December 1, 2025 Public Website Launch TrialFORMA project website is now live at trialforma.com.
November, 2025 Evaluating the system Preliminary benchmarks show high recall and transparent eligibility reasoning.

Abstract

Effective patient-trial matching lies at the heart of clinical research. Despite spending approximately $1.9 billion annually on recruitment, nearly 80% of clinical trials still fail to meet their enrollment targets. Errors in patient-trial matching are high-stakes on both recall and precision: false negatives deny patients from potentially life-saving options, while false positives steer patients toward infeasible or unsafe trials. Yet current pipelines leveraging embedding-based retrieval are optimized to minimize aggregate retrieval loss rather than eliminate mismatches for each patient-trial pair, making them structurally ill-suited to this safety-critical task.

We present TrialFORMA (FOmal Reasoning and Modeling Agent), a system for patient–trial matching that grounds patient records and eligibility criteria in a shared formal SMT (Satisfiability Modulo Theories) representation. Unlike recent approaches that merely minimize error rates, TrialFORMA is designed around an exactness objective, aiming—within its formal representation—to eliminate both false negatives and false positives.

Attempting to represent both patient records and trial eligibility criteria in SMT confronts us with several challenges: (1) imperfect semantic parsing from natural language into SMT predicates; (2) substantial variation in concept names and surface expressions across trials and patient records; and (3) incomplete clinical ontologies that omit relevant concepts and relations.

To address these challenges, we develop a family of agentic workflows that (1) semantically parse natural language into SMT; (2) perform concept canonicalization and expression schematization; and (3) bridge missing links in the underlying clinical ontologies.

Preliminary experiments on standard patient–trial matching benchmarks show that TrialFORMA achieves near-perfect recall at high precision, while providing clause-level auditability of eligibility decisions. These results establish symbolic, logic-based matching as an accurate and transparent foundation for high-stake clinical eligibility workflows.

Key capabilities

Compiler from trials → SMT: Turns free-form eligibility criteria into typed logical constraints (CNF / SMT-LIB v2).
Ontology-aware entailment: Uses SNOMED / UMLS hierarchies to support principled subsumption (e.g., Type 2 diabetes ⊑ Diabetes mellitus).
Patient schematization: From raw EHR/notes to normalized patient facts; supports optional LLM‑assisted diagnoses with guardrails.
Relation curation: Curating missing relations in SNOMED/UMLS ontology, including "may entail".
Designs for scalability: SMT2PL projection, Database Optimization (Constraints as Database Tables), Key-Value Cache Optimization for Ontology Curation and Entailmen
Deterministic matching: SMT solver yields exact satisfiability with proof‑carrying explanations (unsat cores/assumptions optional).

Results

We evaluate TRIAL-FORMA on a held-out set of patient–trial pairs with expert labels. Our system improves strict eligibility classification compared to a strong embedding-based baseline while providing interpretable reasoning traces.

Recall@K	SIGIR	TREC2022
TrialGPT (embedding-based)	##	##
TRIAL-FORMA (ours)	##	##

TrialFORMA can deliver real-time matching with almost no LLM inference costs

Resources

Paper: arXiv:xxxx.xxxxx

Dataset: SIGIR 2016 corpus , TREC Clinical Trials 2021 corpus, TREC Clinical Trials 2022 corpus

Contact: zikai@stanford.edu

BibTeX

@software{forma_clinical_trial_matching,
        title = {FORMA: Formal Reasoning and Modelling Agent},
        subtitle = {An Application to Clinical Trials},
        author = {Cyrus Zhou, Yufei Jin, Yilin Xu, Chieh-Ju Chao, Monica S Lam},
        year = {2025},
        url = {https://github.com/stanford-oval/trial-forma}
      }