A surgeon watched AI hallucinate in a clinical setting. So he built a governance framework.
The patient was 54. Screening mammogram. BI-RADS 4. The AI system said one thing. The clinical evidence said another. And between those two answers was a woman waiting for a phone call that would change her afternoon, or her life.
Nobody could trace why the AI gave the answer it gave. Not the radiologist. Not the data science team. Not the vendor who sold the system with a slide deck and a handshake.
I’m Dexter Hadley. MD/PhD. I’ve spent 23 years in academia, 37 years writing code, and the last decade building AI systems for healthcare. 65 peer-reviewed publications. Six patent families. Four clinical trials as principal investigator. $38M+ in funded research across NIH, NCI, and state grants (VITAE).
None of that prepared me for that moment. Because all of that work — the models, the papers, the grants — had produced systems that could predict. But not systems that could prove.
The Problem
I started building clinical AI in 2018 — federated learning for breast cancer detection at UCSF, working alongside the founders of OpenMined. The technology was brilliant. The governance was nonexistent.
We could train models across institutions without sharing patient data. We could classify lesions with accuracy that rivaled radiologists. We could deploy chatbots that explained mammography results in plain language.
But we couldn’t prove any of it. Not in the way that matters — not in the way that holds up when a patient’s attorney asks, “On what basis did your AI make this recommendation?”
When a hospital administrator asked “how do you know this recommendation is correct?”, the honest answer was: “the model has 94% accuracy on our test set.” That’s a statistic, not evidence. It doesn’t tell you which training data informed this specific recommendation. It doesn’t tell you who validated the knowledge base. It doesn’t tell you whether the information was current when the patient received it.
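To make that distinction concrete, here is a minimal sketch (Python, purely illustrative; every field name is my assumption, not MammoChat’s schema or any vendor’s API) of what an evidence record for one specific recommendation would have to capture, versus what an aggregate accuracy number actually tells you:

```python
# Illustrative only: every name here is an assumption, not a real schema.
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class EvidenceRecord:
    """What 'proof' for a single recommendation would need to capture."""
    recommendation: str
    sources: List[str]            # which knowledge or training data informed it
    knowledge_base_version: str   # what the system knew at the time
    validated_by: str             # who validated that knowledge base
    current_as_of: date           # whether it was current when the patient received it


# A test-set statistic answers none of those questions:
accuracy = 0.94                   # true of the model in aggregate, silent about this patient

record = EvidenceRecord(
    recommendation="Diagnostic workup recommended for a BI-RADS 4 finding",
    sources=["ACR BI-RADS Atlas, 5th edition"],
    knowledge_base_version="kb-2025-06",
    validated_by="radiology governance committee",
    current_as_of=date(2025, 6, 1),
)
```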
In healthcare, “probably right” kills people. Not often. Not dramatically. But quietly, in the spaces between confidence and proof, where decisions are made on trust and liability accrues in silence.
The Journey
1989 — Trinidad. Fatima College. A ten-year-old taught himself BASIC on a TANDY TRS-80. The first code. The first lesson: systems respond to logic, and logic rewards precision. By 1994, PowerStat — a GUI for tracking tropical disease incidence — was selected by the Caribbean Examinations Council as best in the country.
1999 — Penn. The University of Pennsylvania. A Quaker’s university, founded by Benjamin Franklin, home of the first medical school in America and the first general-purpose computer. I walked those halls for a decade. MSE in Systems Engineering — the discipline of making systems governed before they’re built. PhD in Genomics and Computational Biology under Junhyong Kim, who taught me that a genome is a program and evolution is its compiler. MD in Precision Medicine. Post-doc in Molecular Ophthalmology with Dwight Stambolian. Warren Ewens — father of mathematical population genetics — on my thesis committee. PennCNV in 2007, my first viral publication, before anybody knew that word outside the infectious-disease context that coined it. Ten years of first principals teaching first principles. I didn’t know what I was learning. I didn’t know for another fifteen years.
2013 — Stanford. A lecture hall in Gates Building. Balaji Srinivasan teaching Startup Engineering to a class that didn’t know yet what it was learning. I made my first commit on GitHub — GenomicPython, my PhD thesis code ported to the ledger. Something clicked that I couldn’t name for another decade: code is a ledger. Work is evidence. The things you build are the things you can prove.
2018 — HadleyLab. Federated learning with OpenMined. We built BreastWeCan, a breast cancer AI system that could train across institutional boundaries. I was proud of the technology. I spoke about it around the world, on stages as prestigious as Bloomberg, and was interviewed in Nature. I was terrified of the governance gap. Distributed training without distributed governance is just distributed risk — invisible, compounding, everywhere at once.
2020 — COVID. The world stopped. We didn’t. Built CovidImaging.com in weeks and deployed it as a clinical trial (NCT05384912). I watched speed outrun accountability in real time. Technical debt doesn’t send you a bill — it shows up as a misdiagnosis three years later, when nobody remembers the shortcut that caused it.
2022 — MedBrain. Clinical decision support at UCF. FHIR-integrated. LLM-powered. The most sophisticated system I’d ever built. And still, when a clinician asked “why did it say that?”, the best answer I had was a confidence interval and a shrug.
2025 — MammoChat. A conversational AI for mammography screening, funded by the Florida Department of Health. Enrolled as a clinical trial (NCT06604078). We ran 80+ customer discovery interviews through NSF I-Corps. Every stakeholder — patients, clinicians, administrators, insurers, regulators — asked the same question:
“Can you prove it works?”
Not “does it work.” Everyone assumed it worked. They wanted proof.
2026 — CANONIC. I built the proof.
The Insight
The answer wasn’t a better model. It wasn’t more training data. It wasn’t a fancier architecture. Those are the answers Silicon Valley keeps selling, and they keep not being enough.
The answer was governance.
Not governance as bureaucracy — forms and committees and sign-offs that pile up in drawers nobody opens. Governance as mathematics. A framework where every claim traces to evidence, every action is ledgered, every deployment validates against a fixed standard, and every gap is logged until it’s closed.
Three primitives: INTEL (what you know), CHAT (what you say), COIN (what you do). One framework: MAGIC. One standard: 255 bits. One rule: if you can’t prove it, don’t ship it.
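As a thought experiment, here is a minimal sketch of how those three primitives and that one rule could hang together in code. Everything below is my own illustration under stated assumptions: the class names, the hash-chained ledger, and the ship() gate are hypothetical, not CANONIC’s actual implementation, and the 255-bit standard and MAGIC framework are not represented at all.

```python
# Illustrative sketch only -- all names and structures are assumptions,
# not CANONIC's implementation.
from dataclasses import dataclass
from hashlib import sha256
from typing import List, Union


@dataclass
class Intel:
    """What you know: a claim tied to its evidence."""
    claim: str
    evidence: List[str]                 # citations, datasets, validation records


@dataclass
class Chat:
    """What you say: a statement that must rest on recorded INTEL."""
    statement: str
    based_on: List[int]                 # indices into the INTEL ledger


@dataclass
class Coin:
    """What you do: an action that must rest on recorded INTEL."""
    action: str
    based_on: List[int]


class Ledger:
    """Append-only record: each entry is hashed against the one before it."""

    def __init__(self) -> None:
        self.intel: List[Intel] = []
        self.chain: List[str] = []

    def _append(self, payload: str) -> None:
        prev = self.chain[-1] if self.chain else ""
        self.chain.append(sha256((prev + payload).encode()).hexdigest())

    def record_intel(self, item: Intel) -> int:
        # A claim without evidence is a gap: refuse it until the gap is closed.
        if not item.evidence:
            raise ValueError(f"Unproven claim: {item.claim!r}")
        self.intel.append(item)
        self._append(item.claim)
        return len(self.intel) - 1

    def ship(self, entry: Union[Chat, Coin]) -> None:
        """One rule: if it can't trace back to evidence, it doesn't ship."""
        if not entry.based_on or any(i >= len(self.intel) for i in entry.based_on):
            raise ValueError("No evidence chain: refusing to ship")
        self._append(entry.statement if isinstance(entry, Chat) else entry.action)


ledger = Ledger()
src = ledger.record_intel(Intel("BI-RADS 4 warrants tissue diagnosis",
                                ["ACR BI-RADS Atlas, 5th edition"]))
ledger.ship(Chat("Your result needs a biopsy to rule out malignancy.", based_on=[src]))
```

The point of the sketch is only the shape: knowledge carries evidence, speech and action carry pointers back to knowledge, and the gate refuses anything that cannot trace.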
In 30 days, CANONIC went from concept to 19 governed organizations, 185+ repositories, and 65 patent disclosures. Not because we moved fast and broke things. Because we moved precisely and governed everything. The speed came from the governance, not in spite of it — like a river that runs faster because the banks are well-defined.
Why This Matters To You
If you’re deploying AI in healthcare, you need evidence chains — not for marketing, for litigation. Every recommendation your AI makes is a potential liability. Every liability without evidence is a bet you’re making with someone else’s body. Governance turns that bet into a proof.
If you’re deploying AI in finance, you need audit trails — not for compliance theater, but for the regulator who will sit across the table and ask to see the work. Governance makes the work visible.
If you’re deploying AI anywhere that matters — anywhere a decision affects a life, a livelihood, or a right — you need to answer one question:
“Can you prove it?”
I spent 37 years learning that the question matters more than the model. CANONIC is the framework that finally answers yes.
Figure (audit trail): BI-RADS 4 → Hallucination → No Evidence → CANONIC
Dexter Hadley, MD/PhD — Founder, CANONIC