Hey

The journey from the first simple bots to superhuman AI like Libratus and Pluribus is a fascinating 27-year story of academic rivalries, commercial scandals, and massive algorithmic breakthroughs. It's a perfect case study in AI for imperfect information games.

I've written up a little history, from the first academic attempts in 1998 to the modern "GTO Solvers." Grab a coffee, this is a long one.


The History of Online Poker Bots: From Loki's Rules to CFR's Divinity

How we went from hand-crafted heuristics to mathematical perfection in 27 years.


Intro: Why is Poker the "Holy Grail" of AI?

For decades, poker was one of the most difficult problems for artificial intelligence. Why? Because unlike chess or Go, where everything is visible on the board, poker is a game of imperfect information. You don't know what your opponent has. You have to bluff. You have to read patterns. You have to manage risk under conditions of fundamental uncertainty.

For a programmer, it's a fascinating challenge: how do you write an algorithm that "guesses" what the opponent has without seeing their cards? How do you teach a machine to bluff? And how do you make it do that better than a human?

This story is a journey through 27 years (1998-2025) – from simple, manually written rules, through loud scandals with commercial bots, to algorithms achieving mathematical perfection. A history in which academia, criminals, and corporations fought a battle for the future of a game worth billions of dollars.


Chapter 1: The Academic Pioneers (1998-2002) – Loki and Poki

Computer Poker Research Group (University of Alberta)

It all began in 1996 at the University of Alberta in Canada. A research group, the Computer Poker Research Group (CPRG), decided to systematically tackle poker as an AI problem. Their goal: create a program that could play Limit Texas Hold'em at the level of human professionals.

Loki (1998) – The First Real Bot

In 1998, Denis Papp defended his master's thesis describing Loki – the first serious poker bot in history. What was so groundbreaking about it?

Loki's Key Innovations:

  1. Hand Strength (HS) – a mathematical calculation of the hand's current strength.
  2. Hand Potential (PPOT/NPOT) – the potential to win (Positive) or lose (Negative) on future streets.
  3. Effective Hand Strength (EHS) – combining the above into a single metric: EHS = HS + (1 - HS) * PPOT (see the sketch after this list)
  4. Opponent Modeling – the real revolution: Loki maintained "weight tables" for each opponent and updated them dynamically.
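
To make point 3 concrete, here is a tiny Python illustration of the EHS formula (my own sketch; Loki's actual implementation differed):

# Hypothetical illustration of the EHS formula
def effective_hand_strength(hs, ppot):
    # EHS = HS + (1 - HS) * PPOT: current strength, plus the chance
    # that a currently-losing hand improves enough to win
    return hs + (1 - hs) * ppot

# e.g. a flush draw: weak right now (HS = 0.35), strong potential (PPOT = 0.30)
print(effective_hand_strength(0.35, 0.30))  # 0.545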

How did Opponent Modeling work?

// Pseudocode: a simplified opponent model in Loki
enum Action { FOLD, CALL, RAISE };

struct OpponentModel {
    float handStrengthWeights[169]; // one weight per starting-hand class

    void update(Action action) {
        if (action == RAISE) {
            // A raise makes strong hands more likely
            for (int i = 0; i < 20; i++) { // top 20 hand classes
                handStrengthWeights[i] *= 1.05f;
            }
        } else if (action == FOLD) {
            // A fold makes strong hands less likely
            for (int i = 0; i < 50; i++) {
                handStrengthWeights[i] *= 0.95f;
            }
        }
        normalize(); // rescale so the weights sum to 1 again
    }

    void normalize() {
        float sum = 0;
        for (float w : handStrengthWeights) sum += w;
        if (sum > 0)
            for (float& w : handStrengthWeights) w /= sum;
    }
};

Poki (2002) – Loki's Successor

Poki refined its predecessor's methods:

  • It used selective sampling – on-the-fly, model-weighted simulations to make decisions (sketched after this list).
  • It was one of the early attempts to apply simple neural networks to poker, using them to predict opponent actions based on betting patterns.
  • It was already good enough to consistently win at low stakes online.
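
A minimal sketch of the selective-sampling idea (my reconstruction, not Poki's code – simulate_to_showdown and the model's hands/weights fields are hypothetical stand-ins): bias the rollouts toward the hands the opponent model considers likely.

# Sketch of selective sampling (hypothetical helper names)
import random

def selective_sampling_ev(action, state, opponent_model, trials=1000):
    total = 0.0
    for _ in range(trials):
        # Draw opponent hands in proportion to the model's weights,
        # so likely holdings dominate the sample
        opp_hand = random.choices(opponent_model.hands,
                                  weights=opponent_model.weights)[0]
        total += simulate_to_showdown(state, action, opp_hand)
    return total / trials

# Choose the action with the best sampled expected value:
# best = max(['fold', 'call', 'raise'],
#            key=lambda a: selective_sampling_ev(a, state, model))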

Fun fact: Poki was later licensed for the video game "Stacked" (2005), featuring Daniel Negreanu.


Chapter 2: The "Wild West" (2005) – The WinHoldEm Scandal

The Online Poker Boom and its Dark Side

In the mid-2000s, online poker exploded. Millions of players online. Billions of dollars at stake. And where there's money, there's crime.

Ray E. Bornert II and WinHoldEm – The "Democratization" of Cheating

In 2005, Jeff Atwood (of Coding Horror and later Stack Overflow fame) wrote a famous article, "The Rise of the PokerBots," exposing the activities of Ray E. Bornert II – the creator of WinHoldEm.

WinHoldEm's Offer:

  • $25: Basic version – hand analysis.
  • $200: "Team Edition" – FULL BOT + COLLUSION MODULE.

What does "collusion" mean?

It allowed multiple bots at the same table to exchange information about their cards. The result? A single, multi-headed player against individual humans. A crushing advantage.

Bornert's Justification (quote from an interview):

"Online poker is full of cheaters anyway. I'm just 'democratizing' access to the weapons. It's deliberate civil disobedience. I'd rather be unethical than be a victim."

Programming Trivia

1. Screen-Scraping (The amateur's method)

WinHoldEm used literal screen-scraping – the bot "looked" at the screen, reading pixels:

// Pseudocode: how WinHoldEm read cards
HDC hdc = GetDC(NULL); // device context for the whole screen
COLORREF pixelColor = GetPixel(hdc, cardX, cardY);
if (pixelColor == RGB(255, 0, 0)) {
    card = ACE_OF_HEARTS;
} else if (pixelColor == RGB(0, 0, 0)) {
    card = ACE_OF_SPADES;
}
// ... and so on for every card

Requirement: The user had to lock their Windows graphics settings to specific values. Changing DPI or a theme = the bot stopped working.

2. Better Methods (For serious bots)

In the comments of Atwood's article, it was revealed that advanced bots DID NOT use screen-scraping:

A) Reading Log Files:

// PartyPoker wrote hand history to a file in real time
#include <fstream>
#include <string>

std::ifstream logFile("C:\\PartyPoker\\handhistory.txt");
std::string line;
while (std::getline(logFile, line)) {
    if (line.find("dealt") != std::string::npos) {
        // Parse the hand
    }
}

B) Windows Messages:

// Empire Poker used standard Windows controls, so a bot could drive the
// client directly (window and control names here are illustrative)
HWND table = FindWindow(NULL, "Empire Poker");
HWND buttonHandle = FindWindowEx(table, NULL, "Button", "Call");
SendMessage(buttonHandle, BM_CLICK, 0, 0); // "Click" the button

// Read the pot size from a static label control
HWND potSizeLabel = FindWindowEx(table, NULL, "Static", NULL);
char buffer[256];
GetWindowText(potSizeLabel, buffer, 256);
int potSize = atoi(buffer);

3. Ultrafast Equity Libraries (2005)

Around the same time, the first open-source C++ libraries for fast equity calculations appeared, like poker-eval:

// Sketch of using poker-eval's types (calculate_equity here is a
// hypothetical convenience wrapper, not part of the library's API)
#include "poker_defs.h"

StdDeck_CardMask hand, board;
// ... initialize cards
double equity = calculate_equity(hand, board, 1000000); // 1M simulations

Calculating millions of hands per second became accessible to everyone.
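
For intuition, here is a from-scratch Monte Carlo equity estimator in plain Python – orders of magnitude slower than poker-eval's bitmask evaluators, but the same idea:

# Toy Monte Carlo equity estimator (from scratch; no poker-eval)
import random
from itertools import combinations
from collections import Counter

RANKS = '23456789TJQKA'

def rank5(cards):
    """Score a 5-card hand as a tuple; higher tuples beat lower ones."""
    ranks = sorted((RANKS.index(r) for r, s in cards), reverse=True)
    suits = [s for r, s in cards]
    counts = sorted(Counter(ranks).items(), key=lambda rc: (-rc[1], -rc[0]))
    ordered = [r for r, c in counts]  # ranks by count, then by value
    flush = len(set(suits)) == 1
    straight = len(set(ranks)) == 5 and ranks[0] - ranks[4] == 4
    if set(ranks) == {12, 3, 2, 1, 0}:  # the wheel: A-2-3-4-5
        straight, ranks = True, [3, 2, 1, 0, -1]
    if straight and flush:                      return (8, ranks[0])
    if counts[0][1] == 4:                       return (7, *ordered)
    if counts[0][1] == 3 and counts[1][1] == 2: return (6, *ordered)
    if flush:                                   return (5, *ranks)
    if straight:                                return (4, ranks[0])
    if counts[0][1] == 3:                       return (3, *ordered)
    if counts[0][1] == 2 and counts[1][1] == 2: return (2, *ordered)
    if counts[0][1] == 2:                       return (1, *ordered)
    return (0, *ranks)

def best7(cards):
    return max(rank5(combo) for combo in combinations(cards, 5))

def equity(hero, villain, trials=10000):
    deck = [r + s for r in RANKS for s in 'shdc']
    for c in hero + villain:
        deck.remove(c)
    wins = ties = 0
    for _ in range(trials):
        board = random.sample(deck, 5)
        h, v = best7(hero + board), best7(villain + board)
        wins += h > v
        ties += h == v
    return (wins + ties / 2) / trials

print(equity(['As', 'Ks'], ['Qh', 'Qd']))  # ~0.46: AKs is a slight underdog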

Consequences of the Scandal

Atwood's article caused a panic. Poker rooms invested millions in security systems. WinHoldEm was banned, but the cat was already out of the bag.


Chapter 3: The Academic Breakthrough (2007-2015) – The CFR Era

Counterfactual Regret Minimization – The Most Important Algorithm in Poker AI History

An arms race between bot authors and poker rooms ensued for the next decade. The real research breakthrough came in 2007, when Martin Zinkevich (now at Google) and colleagues at the University of Alberta published the paper introducing Counterfactual Regret Minimization (CFR).

What is CFR? (An explanation for programmers)

CFR is a self-play algorithm built on regret minimization – related in spirit to Reinforcement Learning, but completely different from Deep Q-Learning or Policy Gradients.

The Key Idea:

  1. The bot plays an enormous number of hands against itself.
  2. In each iteration, it computes its "regret" – a mathematical measure of how much it regrets not having played each alternative action.
  3. It updates its strategy to play the actions it "regretted" not playing more often.
  4. As iterations accumulate, the average regret approaches zero – and the average strategy converges to a Nash Equilibrium.

Nash Equilibrium = a combination of strategies from which no player can profitably deviate. Play it in a two-player zero-sum game and you cannot be exploited – even a perfect opponent can at best break even against you.

Pseudocode for CFR (simplified):

def cfr(gameState, player, myReach, oppReach):
    if gameState.isTerminal():
        return gameState.utility(player)
    
    infoSet = gameState.getInfoSet(player)
    strategy = regretMatching(infoSet.regrets)
    
    utilities = {}
    nodeUtil = 0
    
    for action in gameState.legalActions():
        # Recursively calculate the value of each action
        nextState = gameState.apply(action)
        utilities[action] = cfr(nextState, player,
                                myReach * strategy[action], oppReach)
        nodeUtil += strategy[action] * utilities[action]
    
    # Update regrets, weighted by the OPPONENT's reach probability –
    # that is the "counterfactual" part of the name. (The full algorithm
    # also uses myReach to accumulate the long-run average strategy.)
    for action in gameState.legalActions():
        regret = utilities[action] - nodeUtil
        infoSet.regrets[action] += oppReach * regret
    
    return nodeUtil

def regretMatching(regrets):
    # Strategy is proportional to positive regrets
    posRegrets = {a: max(0, r) for a, r in regrets.items()}
    sumRegrets = sum(posRegrets.values())
    if sumRegrets > 0:
        return {a: r/sumRegrets for a, r in posRegrets.items()}
    else:
        # No positive regret yet: play uniformly
        return {a: 1.0/len(regrets) for a in regrets}

Why is CFR brilliant?

  • Tractable memory – it stores regrets per information set, not per full game history.
  • Mathematical guarantee of convergence to a Nash Equilibrium in two-player zero-sum games (see the demo after this list).
  • It can be sampled (Monte Carlo CFR) – each iteration can visit only part of the tree.
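
To see that convergence actually happen in a game small enough to inspect, here is a runnable toy of my own (not from the CFR paper): regret matching in rock-paper-scissors self-play, whose average strategy approaches the uniform Nash Equilibrium.

# Regret matching in rock-paper-scissors self-play (toy demo)
import random

ACTIONS = [0, 1, 2]  # rock, paper, scissors

def payoff(a, b):
    # +1 if a beats b, -1 if a loses, 0 on a tie
    return [0, 1, -1][(a - b) % 3]

def strategy_from(regrets):
    pos = [max(0.0, r) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1/3, 1/3, 1/3]

regrets = [[0.0] * 3, [0.0] * 3]
strategy_sum = [[0.0] * 3, [0.0] * 3]

for _ in range(100000):
    strats = [strategy_from(regrets[p]) for p in (0, 1)]
    acts = [random.choices(ACTIONS, weights=strats[p])[0] for p in (0, 1)]
    for p in (0, 1):
        got = payoff(acts[p], acts[1 - p])
        for a in ACTIONS:
            # Regret = what action a would have earned, minus what we got
            regrets[p][a] += payoff(a, acts[1 - p]) - got
            strategy_sum[p][a] += strats[p][a]

total = sum(strategy_sum[0])
print([s / total for s in strategy_sum[0]])  # -> roughly [0.33, 0.33, 0.33]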

Cepheus (2015) – Limit Hold'em is Solved

Michael Bowling and the team from the University of Alberta presented Cepheus – a bot that essentially solved heads-up (1-on-1) Limit Hold'em.

The Numbers:

  • 70 billion CFR iterations
  • Strategy occupies 11 TB of data
  • Plays so close to perfection that a human would have to play for 70 years, 12h a day, without errors, to statistically prove they were better.

Exploitability: 0.000986 big blinds/game

What does "essentially solved" mean?

A theoretically perfect strategy (Nash Equilibrium) has an exploitability of 0. Cepheus has 0.000986, which in practice means it is impossible to beat in any reasonable timeframe.
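
A back-of-envelope check of that claim, under my own assumptions (a per-game standard deviation of about 5 big blinds and a 2-sigma confidence bar):

edge = 0.000986              # Cepheus's exploitability, bb per game
sigma = 5.0                  # assumed per-game standard deviation, in bb
games = (2 * sigma / edge) ** 2      # games needed for a 2-sigma result
games_per_year = 200 * 12 * 365      # 200 games/h, 12 h/day, every day
print(games / games_per_year)        # ~117 years – the same ballpark as above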


Chapter 4: Conquering No-Limit (2017) – DeepStack and Libratus

DeepStack (2017) – Continual Re-solving + Deep Learning

Limit Hold'em was solved, but No-Limit is a completely different league – unlimited bets = unbelievable complexity.

DeepStack's Innovations:

  • Continual Re-solving – instead of storing a full precomputed strategy, the bot re-solves only the current portion of the game "on the fly."
  • Deep Neural Networks as "intuition" – estimating the value of situations beyond the lookahead depth without simulating them in full.

Architecture:

# Pseudocode: how continual re-solving works
def deepstack_play(gameState):
    # 1. Preflop: a cache of pre-solved situations keeps response times low
    if gameState.street == 0:
        return cached_strategy(gameState)
    
    # 2. Otherwise: build a depth-limited subgame around the current state
    subgame = extract_subgame(gameState)
    
    # 3. A neural network estimates counterfactual values at the depth
    #    limit – this is DeepStack's "intuition"
    leaf_values = neural_net.predict(subgame.leaf_states)
    
    # 4. Solve the subgame with CFR, using leaf_values in place of a
    #    full traversal below the depth limit
    strategy = cfr_solve(subgame, leaf_values, iterations=1000)
    
    return strategy.get_action()

Results:

  • November-December 2016: DeepStack played 44,852 hands against 33 professional players from 17 countries.
  • Of the 11 players who completed the requested 3,000 hands, it beat 10 by a statistically significant margin.

Libratus (2017) – Crushing the Professionals

Tuomas Sandholm and Noam Brown from Carnegie Mellon created Libratus – the most advanced bot in history (at the time).

"Brains vs. AI" – The Tournament of the Century:

  • Jan 11-30, 2017, Rivers Casino, Pittsburgh
  • 120,000 hands against 4 top professionals:
    • Jason Les
    • Dong Kim
    • Daniel McAulay
    • Jimmy Chou

The Result:

  • Libratus won $1,766,250 in chips.
  • 14.72 big blinds / 100 hands – an astronomically high win rate.
  • 99.98% statistical significance.

Libratus's Three Modules:

1. Blueprint Strategy (calculated offline):

// Simplified game abstraction
// Instead of 10^161 states -> 10^12 states
AbstractGame abstraction = create_abstraction(TexasHoldem);
Strategy blueprint = mccfr(abstraction, iterations=1e12);

2. Nested Subgame Solving (during play):

// The bot calculates the strategy for the current situation on the fly
Subgame current = extract_subgame(gameState);
Strategy refined = solve_subgame(current, blueprint, real_time=true);

3. Self-Improvement (nightly analysis):
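
A conceptual sketch of that loop, paraphrased from the published descriptions (every name below is hypothetical – this is not Libratus source code):

# Hypothetical sketch of the nightly self-improver
from collections import Counter

def nightly_self_improvement(blueprint, todays_hands):
    # 1. Find the bet sizes humans used that fall outside the abstraction
    misses = Counter(h.bet_size for h in todays_hands
                     if h.bet_size not in blueprint.abstraction)
    # 2. Add the most frequent ones and solve those branches overnight,
    #    so yesterday's "holes" are patched by morning
    for size, _ in misses.most_common(3):
        branch = blueprint.abstraction.add_bet_size(size)
        blueprint.patch(branch, solve_branch(branch))
    return blueprint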

What REALLY terrified the pros:

"Every night after play, Libratus analyzed our unusual plays. The next day, the 'holes' in its strategy were gone. It was like fighting an opponent that adapted." – Jimmy Chou

Technical Details:

  • Bridges supercomputer (Pittsburgh Supercomputing Center)
  • ~600 of 846 compute nodes
  • 1.35 petaflops of computing power
  • ~25 million CPU hours


Chapter 5: Multiplayer (2019) – Pluribus Changes Everything

The Multiplayer Problem

1-on-1 was "cracked." But 6-player poker is an exponentially harder problem:

  • It's no longer a two-player zero-sum game, so CFR's convergence guarantees no longer apply.
  • A Nash Equilibrium for >2 players is astronomically difficult to compute – and even playing one no longer guarantees you won't lose.
  • You have to model multi-way dynamics: coalitions, dynamic alliances, and multi-way bluffs.

Pluribus (2019) – The First 6-Max Bot

Noam Brown and Tuomas Sandholm (the same duo behind Libratus), together with Facebook AI Research, presented Pluribus.

The Breakthrough:

  • June-July 2019: Pluribus defeated 15 top professionals in 6-player No-Limit Hold'em.
  • 10,000+ hands
  • Played against pros like Darren Elias and Chris Ferguson.

But how?

1. Blueprint Strategy (training):

  • 8 days of training on a 64-core server.
  • <512GB RAM, ZERO GPUs.
  • $150 total cost.

This is insane – AlphaGo needed 1920 CPUs + 280 GPUs. Pluribus: a regular server.

2. Real-time Search:

# Pluribus doesn't solve the whole game – just a "lookahead" of a few moves
def pluribus_search(gameState):
    # 1. Use the blueprint for the first betting round
    if gameState.street == 0:
        return blueprint_strategy(gameState)
    
    # 2. For later rounds: limited-lookahead search
    # Key innovation: this can be done for >2 player games
    subgame = limited_lookahead(gameState)
    
    # 3. At the subgame's "leaves", assume each player picks one of 4
    #    continuation strategies: the blueprint itself, or the blueprint
    #    biased toward folding, calling, or raising
    for leaf in subgame.leaves:
        continuation_strategies = [
            blueprint, fold_biased, call_biased, raise_biased
        ]
        # Leaf values come from playing these strategies out
    
    return solve_subgame(subgame)

Pluribus's Unusual Strategies:

The pros noticed that Pluribus did things humans just don't do:

  • Donk betting (leading into the previous street's aggressor) with unusual sizings.
  • Overbetting (betting more than the pot) in spots where humans rarely dare.
  • Limping (just calling the big blind preflop) from positions where pros almost always raise or fold.

"There were plays that people just don't make, especially regarding bet sizing." – Michael Gagliano ($2M tournament earnings)


Chapter 6: Democratization of Knowledge (2019-2025)

The Problem is Still Open

Despite Pluribus's success, 6-player poker is NOT "solved" in the mathematical sense:

  • Pluribus beat humans, but its strategy is not a perfect Nash Equilibrium.
  • There is still room for improvement.
  • The problem remains an open challenge for AI.

The New Generation of Creators

1. Alexandre Marangoni Costa (Brazil, 2019)

Thesis: "A Study on Neural Networks for Poker Playing Agents"

Created Pucker – a framework for building poker bots in Python. Fun fact: to get the neural networks to work, they had to add... the exact same concepts as Loki from 1998:

  • Hand Strength
  • Hand Potential
  • Opponent Modeling

21 years later, '90s-era "feature engineering" is still necessary.

2. Kylie Ying (USA, 2021)

A popular AI YouTuber. In 2021, she published the series "How to Build a Superhuman Poker AI using CFR" – a step-by-step implementation of CFR in Python.

Sample code from her tutorials:

class KuhnPoker:
    def __init__(self):
        self.nodeMap = {}  # Map information sets -> nodes
    
    def cfr(self, cards, history, p0, p1):
        plays = len(history)
        player = plays % 2
        opponent = 1 - player
        
        # Terminal states
        if plays > 1:
            terminalPass = history[-1] == 'p'
            doubleBet = history[-2:] == 'bb'
            
            if terminalPass:
                if history == 'pp':  # both checked: showdown for 1 chip
                    return 1 if cards[player] > cards[opponent] else -1
                return 1  # a bet was folded to: the bettor wins 1
            elif doubleBet:  # bet and call: showdown for 2 chips
                return 2 if cards[player] > cards[opponent] else -2
        
        infoSet = str(cards[player]) + history
        
        # Get or create node
        if infoSet not in self.nodeMap:
            self.nodeMap[infoSet] = Node(['p', 'b'])
        
        node = self.nodeMap[infoSet]
        strategy = node.getStrategy(p0 if player == 0 else p1)
        
        # Recursive CFR
        util = {}
        nodeUtil = 0
        
        for action in ['p', 'b']:
            nextHistory = history + action
            if player == 0:
                util[action] = -self.cfr(
                    cards, nextHistory, p0 * strategy[action], p1
                )
            else:
                util[action] = -self.cfr(
                    cards, nextHistory, p0, p1 * strategy[action]
                )
            nodeUtil += strategy[action] * util[action]
        
        # Update regrets, weighted by the opponent's reach probability
        for action in ['p', 'b']:
            regret = util[action] - nodeUtil
            node.regretSum[action] += (
                p1 if player == 0 else p0
            ) * regret
        
        return nodeUtil
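
The snippet assumes a small Node helper that the excerpt doesn't show; here is a minimal version consistent with how it's used, following the standard Kuhn-poker CFR recipe, plus a training loop:

class Node:
    def __init__(self, actions):
        self.actions = actions
        self.regretSum = {a: 0.0 for a in actions}
        self.strategySum = {a: 0.0 for a in actions}

    def getStrategy(self, realizationWeight):
        # Regret matching: play in proportion to positive regrets
        pos = {a: max(0.0, r) for a, r in self.regretSum.items()}
        total = sum(pos.values())
        if total > 0:
            strategy = {a: r / total for a, r in pos.items()}
        else:
            strategy = {a: 1.0 / len(self.actions) for a in self.actions}
        # Accumulate the average strategy, weighted by reach probability
        for a in self.actions:
            self.strategySum[a] += realizationWeight * strategy[a]
        return strategy

    def getAverageStrategy(self):
        # The AVERAGE strategy is what converges to equilibrium
        total = sum(self.strategySum.values())
        if total > 0:
            return {a: s / total for a, s in self.strategySum.items()}
        return {a: 1.0 / len(self.actions) for a in self.actions}

# Training loop: shuffle the 3-card deck {1, 2, 3} (players get the
# first two cards) and run CFR from the empty history
import random
trainer = KuhnPoker()
cards = [1, 2, 3]
for _ in range(100000):
    random.shuffle(cards)
    trainer.cfr(cards, '', 1.0, 1.0)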

Chapter 7: Implications & A Question for r/programming

The history shows a clear trajectory: from simple academic bots (Loki), to superhuman GTO solvers (Libratus), and now, reportedly, sophisticated commercial bots dominating online games.

The "MCP Moment" – When Building Bots Became Accessible From a programmer's perspective, the poker bot story is fascinating because it represents a massive multi-component project (MCP?) that suddenly became achievable for individual developers around 2005. Take this programmer's story from 2006 as an example. He documented building a working poker bot – a titanic undertaking that required:

  • Computer vision (card recognition)
  • Game state parsing
  • Opponent modeling
  • Decision algorithms
  • Real-time execution

As he notes in his article:

"First of all, there's a very easy way to detect hole cards via a screen-scraping 'poor-man's OCR' approach" or DLL injection.

That sentence captures a pivotal moment: the technical barriers had fallen. What was once impossible became a weekend project. If you browse the Internet Archive from 2005-2006, you'll see an explosion of forums, tutorials, and communities dedicated to building poker bots (the "how-i-built-a-working-poker-bot" write-up is just one example). By 2006, these bots had evolved rapidly – from simple rule-based systems to sophisticated probability calculators.

Why This History Matters for Modern Developers

I'll be honest: I've never built a bot based on a trained model, and I don't intend to. But as someone learning to train specialized models, poker card recognition is an ideal learning example. It's a constrained problem (see the sketch after this list):

  • Limited set of classes (52 cards + a few UI elements)
  • Clear success metrics (accuracy, inference speed)
  • Real-world challenges (lighting, angles, different skins)
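
As promised above, a minimal sketch of what such a training run can look like (my illustration – the cards/ folder layout and the 53-class count, 52 cards plus a card back, are assumptions):

# Sketch: fine-tune a small pretrained backbone on labeled card crops
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
# Hypothetical layout: cards/As/*.jpg, cards/Kd/*.jpg, ..., cards/back/*.jpg
data = datasets.ImageFolder('cards/', transform=tfm)
loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

model = models.mobilenet_v3_small(weights='DEFAULT')  # phone-sized backbone
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, 53)

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(5):
    for x, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()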

The 2005 "MCP moment" teaches us something important: when the right tools become accessible (fast equity calculators, OCR libraries, cheap compute), individuals can suddenly tackle problems that once required research labs. We're at a similar inflection point now with on-device AI, mobile ML frameworks, and pre-trained models. The question isn't whether these tools will be used – it's how we as a community shape their development and use. A Question for the Community Can you share examples of projects that use trained AI models in mobile phone apps to assist with video games? I'm curious about the technical approaches – not necessarily for poker, but for any competitive game:

  • Real-time computer vision pipelines on mobile devices
  • On-device inference for game state recognition
  • Strategies for handling different screen resolutions, lighting conditions, and UI variations

I'm interested in both the ethical implications and the technical challenges of building "AI co-pilots" for games. As bots become ubiquitous, is there a place for defensive AI assistants, or does this just escalate the arms race? Would love to hear your thoughts on the whole history and where we go from here.
