@DSamuelHodge
DSamuelHodge / topology_filtration_semantic_ring.py
Created July 21, 2025 16:42
Code for topology filtration of attention using GUDHI.
# === Imports ===
import torch
import numpy as np
import matplotlib.pyplot as plt
from transformers import AutoModel, AutoTokenizer
import gudhi as gd
from sklearn.manifold import MDS
import warnings
warnings.filterwarnings("ignore")
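
The preview stops at the imports. Purely as a hedged sketch of the general idea (not the gist's actual pipeline), the snippet below turns a toy attention matrix into a distance matrix and feeds it to GUDHI's Rips filtration, reusing the numpy and gudhi imports above; the similarity-to-distance conversion and the toy softmax attention are assumptions made for illustration.

def attention_persistence(attn):
    # attn: (seq_len, seq_len) row-stochastic attention matrix as a numpy array.
    sym = 0.5 * (attn + attn.T)          # symmetrize so it behaves like a similarity
    dist = 1.0 - sym                     # crude similarity-to-distance conversion (assumption)
    np.fill_diagonal(dist, 0.0)
    rips = gd.RipsComplex(distance_matrix=dist, max_edge_length=1.0)
    tree = rips.create_simplex_tree(max_dimension=2)
    return tree.persistence()            # list of (dimension, (birth, death)) pairs

# Toy usage with a random attention-like matrix
scores = np.random.randn(16, 16)
toy_attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
diagram = attention_persistence(toy_attn)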
@DSamuelHodge
DSamuelHodge / pairwise-to-many-body-physics.md
Created July 17, 2025 19:58
We present evidence for a universal phase transition in transformer attention mechanisms that fundamentally alters their computational strategy based on input sequence length.

From Pairwise Attention to Many-Body Physics: The Hidden Phase Shift in Transformers

Derrick Hodge
Independent Researcher
derrick@hodgedomain.com

Abstract

We present evidence for a universal phase transition in transformer attention mechanisms that fundamentally alters their computational strategy based on input sequence length. Using Random Matrix Theory (RMT) baselines, spectral analysis, and information-theoretic measures, we demonstrate that transformers operate in two distinct regimes: a Local Interaction Regime for short sequences characterized by distributed pairwise processing, and a Collective Correlation Regime beyond a critical length $L_c$ featuring concentrated, non-local computation.
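
The abstract summarizes the methodology at a high level. As a hedged illustration of the kind of RMT comparison it describes (not the paper's actual analysis; the baseline construction and variance plug-in are assumptions), one can check whether the spectrum of an attention-derived correlation matrix has outliers above the Marchenko-Pastur bulk edge expected for a purely random matrix of the same shape:

import numpy as np

def mp_bulk_edges(n_rows, n_cols, sigma2=1.0):
    # Marchenko-Pastur support for eigenvalues of (1/n_cols) X X^T,
    # with X an (n_rows x n_cols) matrix of iid zero-mean entries of variance sigma2.
    q = n_rows / n_cols
    return sigma2 * (1 - np.sqrt(q)) ** 2, sigma2 * (1 + np.sqrt(q)) ** 2

def attention_spectrum_vs_mp(attn):
    # attn: (seq_len, seq_len) attention matrix (numpy array).
    n, m = attn.shape
    centered = attn - attn.mean(axis=1, keepdims=True)
    corr = centered @ centered.T / m
    eigvals = np.linalg.eigvalsh(corr)
    lam_minus, lam_plus = mp_bulk_edges(n, m, sigma2=centered.var())
    outliers = eigvals[eigvals > lam_plus]   # structure beyond the random baseline
    return eigvals, (lam_minus, lam_plus), outliers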

# Topological Reasoning in Transformers: Semantic Loop Analysis
# Implementation of "Beyond Reinforcement Learning" - Geometric Theory of Transformer Reasoning
"""
ABSTRACT:
We introduce a geometric theory of reasoning in Transformer models based on attention-induced
topological structures. This notebook demonstrates that reasoning emerges from closed, high-energy
attention loops (semantic circuits) measurable through loop energy, holonomy, and attention geometry.
This topological reasoning model enables prompt design and evaluation without external reward policies.

Topological Reasoning in Transformers: Beyond Reinforcement Learning

Abstract

We introduce a geometric theory of reasoning in Transformer models based on attention-induced topological structures. Contrary to reinforcement learning-based paradigms that impose reasoning via reward optimization, we demonstrate that reasoning naturally emerges from closed, high-energy attention loops—semantic circuits measurable through loop energy, holonomy, and Ricci curvature. This topological reasoning model enables prompt design, evaluation, and model alignment without external reward policies.
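
The abstract names loop energy, holonomy, and Ricci curvature, but their definitions are not shown in this preview. Purely to illustrate the kind of quantity involved, the sketch below uses an assumed proxy for loop energy, the negative log attention accumulated around a closed token cycle, and should not be read as the author's formula.

import numpy as np

def loop_energy(attn, cycle, eps=1e-9):
    # attn: (seq_len, seq_len) attention matrix; cycle: closed token index path,
    # e.g. [3, 7, 12, 3]. Strong attention all the way around gives low energy;
    # a single weak link gives high energy. Proxy definition, assumed for illustration.
    energy = 0.0
    for src, dst in zip(cycle[:-1], cycle[1:]):
        energy += -np.log(attn[src, dst] + eps)
    return energy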


1. Introduction

@DSamuelHodge
DSamuelHodge / thermodynamic_analyzer.py
Created May 18, 2025 13:49
Thermodynamics Analyzer: Analyzes language models through the lens of statistical thermodynamics. Calculates temperature (weight/gradient norm ratio), entropy (from singular values), energy (curvature), and derived metrics across layers. Identifies potential phase transitions using susceptibility (dG/dT), compressibility (dS/dF), and inter-laye…
import torch
import numpy as np
import pandas as pd
from scipy.linalg import svdvals
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import logging as transformers_logging
import logging
# ──────────────────── CONFIGURATION PARAMETERS ────────────────────────────────
MODEL_NAME = "Qwen/Qwen2.5-0.5B"
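
The description says entropy is computed from singular values. One common construction, assumed here because the preview cuts off before the gist's own formula, normalizes the squared singular values of a layer's weight matrix into a probability distribution and takes its Shannon entropy, reusing the svdvals and numpy imports above:

def spectral_entropy(weight):
    # weight: 2-D torch parameter tensor for a single layer.
    s = svdvals(weight.detach().float().cpu().numpy())
    p = s ** 2 / np.sum(s ** 2)            # normalized spectral "probabilities"
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))   # Shannon entropy of the spectrum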

Core AEG Components in the Code

  1. Complex Normalization Function
def normalize(self, z):
    rho = th.abs(z)
    theta = th.atan2(th.imag(z), th.real(z))
    return th.tanh(rho) * (th.cos(theta) + 1.0j * th.sin(theta))

This function maps complex values onto a curved manifold by:
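
The preview cuts off here. As an illustrative aside only (not part of the gist), a standalone copy of the function shows the two effects at work: tanh squashes each magnitude into (0, 1) while the phase angle is left unchanged.

import torch as th

def normalize(z):                          # standalone copy of the method above
    rho = th.abs(z)
    theta = th.atan2(th.imag(z), th.real(z))
    return th.tanh(rho) * (th.cos(theta) + 1.0j * th.sin(theta))

z = th.tensor([3.0 + 4.0j, 0.1 - 0.2j])
w = normalize(z)
# th.abs(w) equals tanh(th.abs(z)): magnitudes squashed into (0, 1)
# th.angle(w) equals th.angle(z) up to numerical error: phase preserved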

@DSamuelHodge
DSamuelHodge / HTSR_Theory_Notion_Tables.md
Last active February 23, 2025 03:19
Understanding deep learning through Heavy-Tailed Self-Regularization (HTSR) theory by Charles Martin, PhD.

Acronyms and Notation Tables

Table 1: Definitions of acronyms used in HTSR

| Acronym | Description |
| --- | --- |
| DNN | Deep Neural Network |
| ML | Machine Learning |
| SGD | Stochastic Gradient Descent |
| RMT | Random Matrix Theory |
@DSamuelHodge
DSamuelHodge / ww_advanced_usage.md
Last active February 17, 2025 21:37
WeightWatcher advanced features, organized into logical categories to make it easier to find specific functionality when you need it.

WeightWatcher Advanced Usage Cheatsheet

🔍 Basic Analysis

| Feature | Command | Description |
| --- | --- | --- |
| Analyze Model Layers | watcher.analyze() | Analyze model layers for generalization, spectral properties, and overtraining. |
| Describe Model | watcher.describe(model=model) | Get model details without analyzing it. |
| Plot and Fit ESD | watcher.analyze(plot=True) | Plot the Empirical Spectral Density (ESD) of model layers and apply fits. |
| Generate Summary Statistics | summary = watcher.get_summary() | Generate summary statistics from analysis results to compare models. |
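
A minimal end-to-end sketch of the basic workflow, assuming weightwatcher and transformers are installed and using gpt2 only as an arbitrary example model; keyword arguments may vary across weightwatcher versions:

import weightwatcher as ww
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
watcher = ww.WeightWatcher(model=model)

details = watcher.analyze(plot=False)   # per-layer spectral metrics as a DataFrame
summary = watcher.get_summary()         # aggregate statistics for comparing models
print(summary)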
@DSamuelHodge
DSamuelHodge / deepseek-grpo-explainer.py
Created February 14, 2025 17:35
GRPO equation explained in Manim
from manim import *
class GRPOExplanation(MovingCameraScene):
    def construct(self):
        # Title
        title = Text("DeepSeek-R1 Reinforcement Learning", font_size=36)
        subtitle = Text("Group Relative Policy Optimization (GRPO)", font_size=24)
        title_group = VGroup(title, subtitle).arrange(DOWN, buff=0.5)
        title_group.to_edge(UP, buff=1)
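
The animation walks through the GRPO objective. As a hedged companion sketch rather than the gist's own code, the group-relative advantage at the core of GRPO standardizes each sampled completion's reward against the mean and standard deviation of its group:

import torch

def group_relative_advantages(rewards, eps=1e-8):
    # rewards: 1-D tensor of scalar rewards for the G completions sampled
    # from one prompt (a single GRPO "group").
    return (rewards - rewards.mean()) / (rewards.std() + eps)

advantages = group_relative_advantages(torch.tensor([0.2, 0.9, 0.4, 0.7]))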
@DSamuelHodge
DSamuelHodge / replace_attention_layers.py
Created November 6, 2024 03:52
Replaces standard LlamaAttention layers with Differential SPDA Attention layers in a Llama model.
import os
import math
from typing import List, Optional, Tuple, Union
import torch
import torch.nn.functional as F
from torch import nn
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig
from transformers.models.llama.modeling_llama import (
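
The preview is cut off inside the import list. As a rough sketch of the general replacement pattern only, the helper below walks model.model.layers and swaps each self_attn module; attention_cls stands in for the gist's actual Differential SPDA attention class, whose definition is not shown, and is assumed to take (config, layer_idx) like LlamaAttention does.

def replace_attention_layers(model, attention_cls):
    # attention_cls: stand-in for the gist's Differential SPDA attention class (assumption).
    for layer_idx, layer in enumerate(model.model.layers):
        new_attn = attention_cls(model.config, layer_idx=layer_idx)
        # Reuse the pretrained projection weights where shapes match.
        new_attn.load_state_dict(layer.self_attn.state_dict(), strict=False)
        layer.self_attn = new_attn
    return model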