2026-03-15-ALPHAGO-AT-10

AlphaGo at 10

Ten years ago, a machine proved intuition was computable. It forgot to prove it was trustworthy.


Three machines define the arc of artificial intelligence, and each one answered a different question about what computers can do. Deep Blue answered in 1997: a computer can beat the best human at chess. AlphaGo answered in 2016: a computer can teach itself to surpass human intuition. GPT-3 answered in 2020: a computer can generate language indistinguishable from a human’s. Each answer was a technical triumph. None of them answered the question that matters most: can you prove it works?

Today, March 15, 2026, is the tenth anniversary of AlphaGo’s victory over Lee Sedol in Seoul. It is a good day to ask why that question still has no answer, and what it would take to build one.

Three Machines

Deep Blue was an expert system in the classical sense. IBM’s engineers and a team of grandmasters, including Joel Benjamin, encoded human chess knowledge as evaluation functions: material advantage, king safety, pawn structure, piece mobility. The machine searched 200 million positions per second with brute-force alpha-beta search, then scored each position against rules that humans had written by hand. When Deep Blue defeated Garry Kasparov in May 1997, every move it played could be traced to a rule that a person had authored. The machine was powerful, but it was also transparent. If you asked why Deep Blue played a particular move, the answer was in the code: because the evaluation function scored that position higher than the alternatives.
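
That legibility is easy to show in miniature. The sketch below is not IBM’s code, just an illustrative Python rendering of the architecture described above: a hand-written evaluation function whose every term a person can read, driving an alpha-beta search. The weights and the `position` interface are invented here for illustration.

```python
# Illustrative sketch of a Deep Blue-style engine: hand-coded evaluation
# plus alpha-beta search. Weights and the `position` interface are
# placeholders, not IBM's actual values or code.

PIECE_VALUES = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900}

def evaluate(position):
    """Score a position using rules a human wrote: every term is legible."""
    score = sum(PIECE_VALUES.get(p, 0) for p in position.white_pieces)
    score -= sum(PIECE_VALUES.get(p, 0) for p in position.black_pieces)
    score += 10 * (position.white_mobility - position.black_mobility)
    score += 50 * (position.white_king_safety - position.black_king_safety)
    return score

def alphabeta(position, depth, alpha=-10**9, beta=10**9, maximizing=True):
    """Brute-force minimax with alpha-beta pruning: powerful, and traceable."""
    if depth == 0 or position.is_terminal():
        return evaluate(position)
    if maximizing:
        for move in position.legal_moves():
            alpha = max(alpha, alphabeta(position.apply(move), depth - 1,
                                         alpha, beta, False))
            if alpha >= beta:
                break  # opponent will never permit this line: prune it
        return alpha
    for move in position.legal_moves():
        beta = min(beta, alphabeta(position.apply(move), depth - 1,
                                   alpha, beta, True))
        if alpha >= beta:
            break
    return beta
```

Asked why such an engine preferred a move, you can point to the exact term in `evaluate` that tipped the score.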

Deep Blue was governed by construction. Its intelligence was borrowed.

AlphaGo was something entirely new. DeepMind’s team, led by David Silver, trained a deep convolutional neural network on 30 million positions from expert games, then let the system play millions of games against itself through reinforcement learning. The machine did not search every possibility the way Deep Blue searched chess. It learned to evaluate positions the way a human master evaluates positions, by developing what players call intuition: the ability to look at a board and know, without calculating, that one region matters more than another.
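
The core of that intuition fits in a few lines. AlphaGo’s tree search chose moves by the PUCT rule published in Silver et al. (2016): the learned value of a move plus an exploration bonus weighted by the policy network’s prior. A minimal Python sketch, with the tree bookkeeping (the `node` structure) simplified to plain dictionaries:

```python
import math

def select_move(node, c_puct=1.0):
    """PUCT selection from the AlphaGo paper, simplified.

    node.Q[a] -- mean value of move a over simulations so far
    node.P[a] -- the policy network's prior probability for move a
    node.N[a] -- visit count for move a
    """
    total_visits = sum(node.N.values())

    def score(a):
        # High prior and low visit count yield a large exploration bonus,
        # so the search spends its simulations where intuition points.
        u = c_puct * node.P[a] * math.sqrt(total_visits) / (1 + node.N[a])
        return node.Q[a] + u

    return max(node.N, key=score)
```

Notice where the knowledge now lives: not in readable terms like king safety, but in `P` and `Q`, the outputs of a trained network.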

On March 10, 2016, in Game Two of the five-game match, AlphaGo played Move 37. The move was a shoulder hit on the fifth line, a play so unusual that the commentators assumed it was an error. Fan Hui, the European Go champion who had lost to AlphaGo five months earlier, said he was stunned. Lee Sedol left the table for fifteen minutes. Over the next hundred moves, AlphaGo proved that Move 37 was not only correct but brilliant, a play that no human master had considered in the game’s 2,500-year history. AlphaGo won the match four games to one.

The single loss, Game Four, produced its own moment of brilliance. Lee Sedol played Move 78, a wedge that AlphaGo’s neural network had evaluated as having a one-in-ten-thousand probability of being played. The human surprised the machine. The crowd in Seoul erupted. Fellow champion Gu Li called it “God’s move.”

But the story everyone told afterward was not about Move 78. It was about Move 37. And the lesson the industry extracted from Move 37 was the most consequential mistake in the history of artificial intelligence.

The Wrong Lesson

The lesson was this: AI does not need to explain itself.

AlphaGo made a move no human understood, and it turned out to be right. The neural network that produced Move 37 was a 13-layer deep convolutional architecture with millions of learned parameters. No one at DeepMind could explain why Move 37 was correct. They could describe the architecture, name the layers, point to the training data. They could not say what the network saw in that position that twenty-five centuries of human masters had missed. The machine had developed a form of pattern recognition that exceeded human understanding, and the output was real even if the reasoning was opaque.

This was a genuine scientific achievement. It was also, in retrospect, the moment the AI industry decided that opacity was a feature rather than a flaw. If the best move in the history of Go came from a system that could not explain itself, then perhaps explainability was overrated. Perhaps the machines that would change the world were precisely the machines that worked in ways we could not follow.

Deep Blue had been transparent because it was built from rules. AlphaGo was opaque because it was built from data. The industry chose data. It chose scale. It chose capability over accountability, and it built the next decade on that choice.

In 2017, DeepMind published AlphaGo Zero: a system that learned to play Go from scratch, with no human games at all, and surpassed the original AlphaGo within three days of training. The machines did not even need our data anymore. They needed only the rules and a scoring function.

In 2020, OpenAI released GPT-3: 175 billion parameters, trained on a substantial fraction of the internet. Language itself became the game board, but unlike Go, language has no rules, no scoring function, and no specification. Just prediction. The most capable and least governed AI architecture in history.

By February 2023, ChatGPT had reached an estimated 100 million monthly active users, the fastest adoption of any consumer application measured to that point. Lawyers began using it to draft briefs. Clinicians began using it to interpret imaging. Financial analysts began using it to summarize regulatory filings. None of these deployments had evidence chains. None had audit trails. None could answer the question: where did that answer come from?

The decade between AlphaGo and today was the decade of ungoverned intelligence.

The Cost

The data on what ungoverned intelligence costs is no longer anecdotal. It is systematic.

The AI Hallucination Tracker maintained by Damien Charlotin documents hundreds of cases in which AI systems generated fabricated legal citations that reached judges in active proceedings. The Mata v. Avianca case in 2023, in which a New York attorney submitted a ChatGPT-generated brief containing six invented case citations, was only the first. By 2025, federal courts were issuing sanctions routinely, and the fines were climbing.

In clinical medicine, the evidence is equally direct. A 2018 article in JAMA Dermatology warned that machine learning models for skin disease had been trained predominantly on light-skinned populations and could fail on darker skin tones, a gap that became clinically dangerous when such models were deployed without population-specific validation. Stanford’s Human-Centered AI Institute found that legal AI models hallucinate on at least one out of every six benchmark queries, a rate that would be intolerable in any other professional instrument.

The pattern is consistent across every regulated industry. AI systems trained to optimize performance were deployed without governance, and the cost was borne by the people who trusted the output: patients, clients, defendants, citizens. The machines were capable. The machines were ungoverned. Capability without governance is a liability, not an asset.

The Game Theory

My mentor Atul Butte once observed that medicine is practiced synchronously, the way games like chess and Go are played: we write orders, we wait to see what happens, we write more orders, and computers are really good at that.

He was right about the structure. But the comparison reveals something deeper when you follow it to its conclusion.

Chess is a game of perfect information: both players see the entire board, every piece, every position. Deep Blue won at chess because chess is a search problem, and search was something expert systems could do by brute force. The evaluation function was hand-coded. The governance was built into the machine.

Go is also a game of perfect information, but the search space is so vast that brute force fails. AlphaGo won at Go because it replaced search with learned intuition. The governance was no longer in the code. It was dissolved into millions of parameters. The machine played beautifully, but nobody could say why.

Medicine is a game of imperfect information. The clinician cannot see the entire board. The patient’s genome, environment, history, and trajectory are only partially observable. The stakes are not points or pride but life and death. And unlike chess or Go, medicine has no terminal state, no moment when the game ends and someone counts the score. The patient keeps living. The consequences keep compounding.

This is where the game theory breaks down, and where it becomes most instructive. Deep Blue could be governed because chess was simple enough for rules. AlphaGo could not be governed because Go was complex enough to require learning, and learning dissolved the rules. But medicine, law, and finance are more complex still, and they require both the capability that learning provides and the accountability that rules enforce.

The question is not whether AI can play. AlphaGo settled that a decade ago. The question is whether anyone has built the game worth playing: a governed space with an evidence chain for every move and an audit trail for every outcome. Players come and go. The game persists. That is what CANONIC is.

What CANONIC Is

Deep Blue was a player that obeyed rules someone else wrote. AlphaGo was a player that wrote its own rules, then forgot them. Large language models are players that never had rules to begin with. CANONIC is not a player. CANONIC is the game.

CANONIC is the game that AI systems play on: the governance framework that defines the conditions under which they are permitted to operate. Eight dimensions of compliance, each one binary: satisfied or not. Eight questions that any governed system must answer before it speaks, acts, or deploys. One score, 255, that proves all eight are satisfied. Go has four rules and a scoring function, and those rules have outlasted every player for 2,500 years. CANONIC has eight dimensions and a scoring function, and those dimensions define the game that AI must play in domains where the stakes are not trophies but survival.
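
The arithmetic behind 255 is one byte: eight binary dimensions pack into eight bits, and 255 is the only value with every bit set. A minimal sketch of that scoring, with the dimension names below invented as placeholders rather than CANONIC’s actual labels:

```python
# Eight pass/fail compliance dimensions packed into a single byte.
# Dimension names here are placeholders, not CANONIC's actual labels.
DIMENSIONS = ["evidence", "provenance", "audit", "identity",
              "consent", "integrity", "retention", "review"]

def compliance_score(checks):
    """Return the packed score; 255 means all eight dimensions satisfied."""
    score = 0
    for bit, name in enumerate(DIMENSIONS):
        if checks.get(name, False):
            score |= 1 << bit
    return score

assert compliance_score({name: True for name in DIMENSIONS}) == 255
assert compliance_score({name: True for name in DIMENSIONS[:-1]}) != 255
```

The score is binary all the way down: there is no partial credit, and any single failed dimension makes 255 unreachable.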

The framework composes three primitives. INTEL is what the system knows: the evidence layer that ensures every claim traces to a source, timestamped and auditable. CHAT is what the system says: the interface that speaks the language of its domain, backed by INTEL, never asserting without evidence. COIN is what the system does: every action minted as work, every work on the LEDGER, no untracked output and no ghost labor.
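
One way to read that composition, sketched below with all names and types invented here as placeholders: CHAT cannot speak without INTEL behind it, and nothing the system does escapes COIN’s LEDGER.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Evidence:                     # INTEL: every claim traces to a source
    claim: str
    source: str
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class Utterance:                    # CHAT: never asserts without evidence
    text: str
    evidence: list

    def __post_init__(self):
        if not self.evidence:
            raise ValueError("CHAT may not assert without INTEL backing")

LEDGER = []                         # COIN: every action minted, none untracked

def mint(action, utterance):
    """Record the work on the LEDGER; no ghost labor."""
    LEDGER.append({"action": action,
                   "text": utterance.text,
                   "sources": [e.source for e in utterance.evidence]})
```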

Deep Blue’s evaluation function was its governance: hand-coded, transparent, limited. AlphaGo had no evaluation function a human could read; its governance was dissolved into weights. CANONIC’s governance is architectural. It does not live in the parameters of a neural network. It lives in the specification that the neural network must satisfy before it is permitted to operate. The players can be as opaque as AlphaGo. The game cannot.

Move 37 was brilliant and inexplicable. In a governed system, it would also be auditable: every piece of evidence the network relied on would be in the INTEL layer, every interaction would be on the LEDGER, and every output would carry a compliance certificate that proves the system met all eight dimensions before it spoke. The brilliance would remain. The opacity would not.
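
In code, that gate is a single conditional. Reusing the placeholder pieces sketched above: no 255, no output, and the certificate travels with everything the system says.

```python
@dataclass
class Certificate:
    score: int          # the compliance byte; must equal 255
    ledger_entry: int   # where this interaction lives on the LEDGER

def speak(utterance, checks):
    """Emit an output only after all eight dimensions check out."""
    score = compliance_score(checks)
    if score != 255:
        raise PermissionError(f"blocked: compliance score {score}, not 255")
    mint("speak", utterance)
    return utterance.text, Certificate(score, len(LEDGER) - 1)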

The Arc

1997 — New York. Deep Blue defeats Kasparov. Expert system. Hand-coded rules. Transparent and limited. Can a machine beat the best human at a game of perfect information? Yes, if you give it enough rules and enough search.

2011 — Yorktown Heights. Watson wins Jeopardy. Natural language processing meets knowledge retrieval. Can a machine understand human language well enough to compete? Yes, within a bounded domain.

2016 — Seoul. AlphaGo defeats Lee Sedol. Deep reinforcement learning. Learned intuition. Move 37. Can a machine develop intuition that surpasses human expertise? Yes, if you give it the rules and a scoring function.

2020 — San Francisco. GPT-3. 175 billion parameters. No rules. No scoring function. Just prediction. Can a machine generate language that is indistinguishable from human prose? Yes. And it will hallucinate while doing it.

2026 — Orlando. CANONIC. Eight dimensions, three primitives, one score: 255. Can you prove your AI works? Yes. Can you audit the evidence chain? Yes. Can you show the LEDGER? Yes.

Five points. Three decades. Four players and a game. The thread runs from a chess match in Manhattan to a governance kernel in Orlando, through every advance in capability and every failure of accountability, through every hallucination and every sanction, through a Go board in Seoul where a machine played the most beautiful move in the history of a game and nobody could say why.

What Lee Sedol Knew

Lee Sedol retired from professional Go in 2019. When asked why, he said: “With the debut of AI in Go games, I’ve realized that I’m not at the top even if I become the number one through frantic efforts. Even if I become the number one, there is an entity that cannot be defeated.”

He was right, but he was describing a game. In a game, the entity that cannot be defeated is the one with the best moves. In a governed system, the entity that cannot be defeated is the one with the best proof.

Deep Blue’s proof was the search tree. AlphaGo’s proof was the scoreboard. The LEDGER is ours.

Happy anniversary, old algorithm. You proved AI could think. We are proving it can be trusted.

Citations

  1. Silver, D. et al. “Mastering the game of Go with deep neural networks and tree search.” Nature 529, 484–489 (2016).
  2. Silver, D. et al. “Mastering the game of Go without human knowledge.” Nature 550, 354–359 (2017).
  3. Brown, T. et al. “Language Models are Few-Shot Learners.” NeurIPS (2020).
  4. IBM. “Deep Blue.” IBM Archives.
  5. IBM. “Watson on Jeopardy!” IBM Archives.
  6. Milmo, D. “ChatGPT reaches 100 million users two months after launch.” The Guardian, February 2023.
  7. Charlotin, D. AI Hallucination Tracker. Ongoing database of AI-fabricated legal citations.
  8. Weiser, B. “Here’s What Happens When Your Lawyer Uses ChatGPT.” The New York Times (2023), covering Mata v. Avianca, S.D.N.Y.
  9. NPR. “Courts sanction lawyers for AI-generated fake citations.” July 2025.
  10. Adamson, A.S. & Smith, A. “Machine Learning and Health Care Disparities in Dermatology.” JAMA Dermatology 154(11), 1247–1248 (2018).
  11. Stanford HAI. “AI on Trial: Legal Models Hallucinate in 1 out of 6 or More Benchmarking Queries.” 2024.
  12. Lee Sedol retirement. “Go master quits, says AI cannot be defeated.” CNN, November 2019.
  13. AlphaGo vs Lee Sedol. Wikipedia match record.

Figures

Timeline (audit trail): Deep Blue 1997 → Watson 2011 → AlphaGo 2016 → AlphaGo Zero 2017 → GPT-3 2020 → ChatGPT 2023 → CANONIC 2026

CANONIC — Governed since Move 37.