Gerenuk

## mnist_pairs.R
library(tidyverse)

# Data is downloaded from here:
# https://www.kaggle.com/c/digit-recognizer
kaggle_data <- read_csv("~/Downloads/train.csv")

pixels_gathered <- kaggle_data %>%
  mutate(instance = row_number()) %>%
  gather(pixel, value, -label, -instance) %>%
  extract(pixel, "pixel", "(\\d+)", convert = TRUE)
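For comparison, the reshaping above can be sketched in plain Python without tidyverse. This is a hypothetical mirror of the R pipeline, not part of the original gist. `gather_pixels` and `pixel_to_xy` are illustrative names. The sketch assumes the Kaggle layout: a `label` column plus `pixel0` through `pixel783`, holding 28×28 images stored row by row.

```python
import re


# Hypothetical helper mirroring the R pipeline: mutate(instance) + gather + extract.
def gather_pixels(rows):
    """Turn wide rows ({'label': .., 'pixel0': .., ...}) into long records."""
    records = []
    # enumerate(start=1) mimics R's 1-based row_number()
    for instance, row in enumerate(rows, start=1):
        for column, value in row.items():
            if column == "label":
                continue
            # Same regex as extract(pixel, "pixel", "(\\d+)", convert = TRUE)
            pixel = int(re.search(r"(\d+)", column).group(1))
            records.append(
                {"label": row["label"], "instance": instance,
                 "pixel": pixel, "value": value}
            )
    return records


def pixel_to_xy(pixel, width=28):
    """Assumes row-major 28x28 images; returns (column, row)."""
    return pixel % width, pixel // width
```

Each input row expands into one record per pixel column, which matches what `gather` produces in the R version.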

## human-readable-hash-comparisons.md

      
raineorshine / human-readable-hash-comparisons.md
Last active July 21, 2024 20:31
An aesthetic comparison of a few human-readable hashing functions.

An Aesthetic Comparison of Human-Readable Hashing Functions

The following compares the output of several creative hash functions designed for human readability.
SHA-1 digests are used merely as arbitrary, long, well-distributed input values.
zacharyvoase/humanhash


| input | 1-word output | 2-word output | 3-word output |
|-------|---------------|---------------|---------------|
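Schemes like humanhash roughly work as follows: XOR-fold a digest down to a few bytes, then map each byte to one word from a 256-entry list. The minimal sketch below assumes that design. `WORDLIST` is a hypothetical placeholder (`word000` to `word255`), not humanhash's real list. `compress` and `humanize` are illustrative names, not a documented API.

```python
import hashlib
from functools import reduce
from operator import xor

# Hypothetical placeholder wordlist; humanhash ships its own 256 words.
WORDLIST = [f"word{i:03d}" for i in range(256)]


def compress(digest: bytes, target: int) -> list[int]:
    """XOR-fold `digest` into `target` bytes; the last segment absorbs leftovers."""
    if target > len(digest):
        raise ValueError("fewer input bytes than requested words")
    size = len(digest) // target
    segments = [digest[i * size:(i + 1) * size] for i in range(target)]
    segments[-1] += digest[target * size:]
    return [reduce(xor, segment, 0) for segment in segments]


def humanize(hexdigest: str, words: int = 4, wordlist=WORDLIST) -> str:
    """Map each folded byte (0..255) to a word and join them with hyphens."""
    folded = compress(bytes.fromhex(hexdigest), words)
    return "-".join(wordlist[b] for b in folded)


sha1 = hashlib.sha1(b"hello world").hexdigest()
print(humanize(sha1, words=3))
```

Because each output word encodes only 8 bits, a 1-, 2- or 3-word output is meant for quick visual comparison. It is not a collision-resistant identifier.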


## TDA_resources.md

      
              1 file
            
          
              41 forks
            
          
                1 comment
              
            
              183 stars
            
          
calstad / TDA_resources.md
Last active September 29, 2025 12:57
List of resources for TDA

Quick List of Resources for Topological Data Analysis with Emphasis on Machine Learning

This is just a quick list of resources on TDA that I put together for @rickasaurus after he asked on Twitter for links to papers, books, etc. It is by no means an exhaustive list.
Survey Papers

Both Carlsson's and Ghrist's survey papers offer a very good introduction to the subject.

Topology and Data by Gunnar Carlsson
Barcodes: The Persistent Topology of Data by Robert Ghrist
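A barcode becomes concrete in the simplest case: 0-dimensional persistence (connected components) of a point cloud, as the distance threshold grows. This toy sketch is not code from either paper. `h0_barcode` is a hypothetical name. The sketch assumes three things: every point is born at scale 0, an edge appears once the scale reaches its length, and a bar ends where its component merges into another. These merge heights are the same as in single-linkage clustering.

```python
import itertools
import math


def h0_barcode(points):
    """0-dimensional persistence barcode of a point cloud (toy sketch)."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Edges enter the filtration in order of length (Kruskal-style sweep).
    edges = sorted(
        (math.dist(p, q), i, j)
        for (i, p), (j, q) in itertools.combinations(enumerate(points), 2)
    )
    bars = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            bars.append((0.0, length))   # one component dies at this scale
    bars.append((0.0, math.inf))         # the last component never dies
    return bars


print(h0_barcode([(0, 0), (1, 0), (10, 0)]))
# -> [(0.0, 1.0), (0.0, 9.0), (0.0, inf)]
```

In the output, the long bar ending at 9.0 is the one that signals two well-separated clusters.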

Other Papers and Web Resources


Extracting insights from the shape of complex data using topology: a good introductory paper in Nature on the Mapper algorithm.


## multiclass_svm.py
"""
Multiclass SVMs (Crammer-Singer formulation).

A pure Python re-implementation of:

Large-scale Multiclass Support Vector Machine Training via Euclidean Projection onto the Simplex.
Mathieu Blondel, Akinori Fujino, and Naonori Ueda.
ICPR 2014.
http://www.mblondel.org/publications/mblondel-icpr2014.pdf
"""
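The paper's title names a core subroutine: Euclidean projection onto the simplex. It can be sketched with the standard sort-based method. This is a generic sketch, not the gist's implementation. `project_simplex` and the radius parameter `z` are illustrative names. The function projects a vector onto {w : w ≥ 0, Σ w = z}.

```python
import numpy as np


def project_simplex(v, z=1.0):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = z} (sort-based)."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                      # sort in decreasing order
    cssv = np.cumsum(u) - z                   # cumulative sums minus the radius
    ind = np.arange(1, len(v) + 1)
    cond = u - cssv / ind > 0                 # coordinates that stay positive
    rho = ind[cond][-1]                       # number of positive coordinates
    theta = cssv[cond][-1] / rho              # shift that makes the sum equal z
    return np.maximum(v - theta, 0.0)


print(project_simplex([1.0, 2.0, 3.0]))  # -> [0. 0. 1.]
```

The cost is dominated by the sort, so one projection takes O(n log n) time.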

## wordlist.txt
   Wordlist ver 0.732 - EXPECT INCOMPATIBLE CHANGES;
  acrobat  africa   alaska   albert   albino   album
  alcohol  alex     alpha    amadeus  amanda   amazon
  america  analog   animal   antenna  antonio  apollo
  april    aroma    artist   aspirin  athlete  atlas
  banana   bandit   banjo    bikini   bingo    bonus
  camera   canada   carbon   casino   catalog  cinema
  citizen  cobra    comet    compact  complex  context
  credit   critic   crystal  culture  david    delta
  dialog   diploma  doctor   domino   dragon   drama
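A wordlist like this turns numbers into pronounceable tokens. The sketch below is a plain illustration, not the encoding scheme the list's author specifies. It treats the 54 words excerpted here as the digits of a base-54 number. `to_words` and `from_words` are hypothetical helpers, and the full wordlist is not reproduced.

```python
# Only the 54 words excerpted above; the full wordlist is not reproduced here.
EXCERPT = """
acrobat africa alaska albert albino album
alcohol alex alpha amadeus amanda amazon
america analog animal antenna antonio apollo
april aroma artist aspirin athlete atlas
banana bandit banjo bikini bingo bonus
camera canada carbon casino catalog cinema
citizen cobra comet compact complex context
credit critic crystal culture david delta
dialog diploma doctor domino dragon drama
""".split()


def to_words(n: int, words=EXCERPT) -> list[str]:
    """Write a non-negative integer in base len(words), one word per digit."""
    base = len(words)
    digits = [words[0]] if n == 0 else []
    while n > 0:
        n, r = divmod(n, base)
        digits.append(words[r])
    return digits[::-1]  # most significant digit first


def from_words(ws: list[str], words=EXCERPT) -> int:
    """Inverse of to_words: read the words back as base-len(words) digits."""
    index = {w: i for i, w in enumerate(words)}
    n = 0
    for w in ws:
        n = n * len(words) + index[w]
    return n
```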

## nmf_kl.py
""" Non-negative matrix factorization for I divergence

    This code implements Lee and Seung's multiplicative updates algorithm
    for NMF with the I-divergence cost.

    Lee D. D., Seung H. S., Learning the parts of objects by non-negative
      matrix factorization. Nature, 1999
"""
# Author: Olivier Mangin <olivier.mangin@inria.fr>
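The docstring's multiplicative updates can be sketched in NumPy as follows. This is a minimal sketch, not the gist's code. `nmf_kl`, `i_divergence`, the random initialization and the small `EPS` guard are assumptions; `EPS` is there only for numerical safety.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def i_divergence(V, WH):
    """Generalized KL (I-)divergence D(V || WH)."""
    return np.sum(V * np.log((V + EPS) / (WH + EPS)) - V + WH)


def nmf_kl(V, rank, n_iter=200, seed=0):
    """Lee-Seung multiplicative updates for NMF under the I-divergence."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + EPS
    H = rng.random((rank, m)) + EPS
    for _ in range(n_iter):
        # H <- H * (W^T (V / WH)) / (column sums of W)
        H *= (W.T @ (V / (W @ H + EPS))) / (W.sum(axis=0)[:, None] + EPS)
        # W <- W * ((V / WH) H^T) / (row sums of H)
        W *= ((V / (W @ H + EPS)) @ H.T) / (H.sum(axis=1)[None, :] + EPS)
    return W, H
```

The updates only multiply by non-negative ratios. As a result, W and H stay non-negative without any explicit projection.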

## git_bible.md

      
              1 file
            
          
              14 forks
            
          
                4 comments
              
            
              99 stars
            
          
dmglab / git_bible.md
Last active March 9, 2024 02:59
how to git

Note: this is a summary of different git workflows, put together into a small git bible.
References are given inline in the text.

How to Branch

Try to keep your hacking out of master and create feature branches.
The [feature-branch workflow][4] is a good middle ground between noobs (I have no idea how to branch) and
git veterans (let's do some rocket science with git branches!). Everybody gets the idea!
Basic usage examples


## gist:8172796

      
              1 file
            
          
              405 forks
            
          
                23 comments
              
            
              1660 stars
            
          
debasishg / gist:8172796
Last active February 24, 2026 02:03
A collection of links for streaming algorithms and data structures

General Background and Overview


Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used to implement approximation algorithms.
Models and Issues in Data Stream Systems
Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that revisits some of the historical perspectives and how it all began with Flajolet
Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
[Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
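Several of these papers study the frequent-items problem, which the Misra-Gries summary illustrates. This is a swapped-in classic algorithm chosen for brevity. It is not the lossy counting algorithm of Manku & Motwani, and `misra_gries` is an illustrative name.

```python
def misra_gries(stream, k):
    """Misra-Gries frequent-items summary with at most k - 1 counters.

    Any item occurring more than len(stream) / k times survives, and each
    stored count underestimates the true count by at most len(stream) / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter, dropping those that hit 0.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


print(misra_gries("aaaaaabbbc", 3))  # -> {'a': 5, 'b': 2}
```

Memory is bounded by k - 1 counters regardless of stream length, which is what makes summaries like this usable on unbounded streams.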


## latency.txt
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      ~14x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
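The ratios in the right-hand column are simple divisions of the nanosecond figures, so they are easy to recompute. The small sketch below does that; the dictionary and helper names are illustrative. It also computes the time to serialize 1 KB (taken as 1,024 bytes) onto a 1 Gbps link, ignoring all protocol overhead. That comes to about 8 µs, close to the table's 10 µs figure.

```python
# Figures copied from the table above (nanoseconds, ~2012).
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 25,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 1K bytes over 1 Gbps network": 10_000,
    "Read 4K randomly from SSD": 150_000,
}


def relative_to(baseline: str) -> dict[str, float]:
    """Express every latency as a multiple of the chosen baseline operation."""
    base = LATENCY_NS[baseline]
    return {op: ns / base for op, ns in LATENCY_NS.items()}


def serialization_us(n_bytes: int, gbps: float) -> float:
    """Time in microseconds to put n_bytes on a link, ignoring all overhead."""
    return n_bytes * 8 / (gbps * 1e9) * 1e6


for op, x in relative_to("L1 cache reference").items():
    print(f"{op:<36} {x:>10,.0f}x L1")
```

Recomputed this way, a main memory reference is 200x an L1 reference and about 14.3x an L2 reference.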

## nngarotte.py
"""
Non-negative garrote implementation with scikit-learn
"""

# Author: Alexandre Gramfort <alexandre.gramfort@inria.fr>
#         Jaques Grobler (__main__ script) <jaques.grobler@inria.fr>
#
# License: BSD Style.

import numpy as np
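The non-negative garrote first fits ordinary least squares. It then shrinks each coefficient by a factor c_j ≥ 0, chosen to minimize a penalized least-squares criterion. The sketch below uses plain coordinate descent in NumPy instead of scikit-learn, so it is a stand-in rather than the gist's implementation. `non_negative_garrote` is a hypothetical name, and the data are assumed centered (no intercept). The objective is assumed to be (1/2n)‖y − Zc‖² + α Σ c_j with Z = X·diag(β_OLS).

```python
import numpy as np


def non_negative_garrote(X, y, alpha, n_iter=1000):
    """Non-negative garrote via coordinate descent (sketch).

    Assumes centered X and y (no intercept). Returns (coef, shrink, ols).
    """
    ols, *_ = np.linalg.lstsq(X, y, rcond=None)   # step 1: OLS estimate
    Z = X * ols                                   # scale column j by ols[j]
    n, p = Z.shape
    c = np.zeros(p)                               # shrinkage factors, c >= 0
    col_sq = (Z ** 2).sum(axis=0) / n
    r = y - Z @ c                                 # current residual
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += Z[:, j] * c[j]                   # residual without feature j
            rho = Z[:, j] @ r / n
            c[j] = max(0.0, (rho - alpha) / col_sq[j])  # one-sided soft threshold
            r -= Z[:, j] * c[j]
    return ols * c, c, ols
```

With `alpha=0` the shrinkage factors converge to 1 and the OLS fit is recovered. As `alpha` grows, factors hit zero and the corresponding features are dropped. A coefficient can be shrunk or removed, but its sign never flips relative to OLS.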