WIP: Long Context Transformers notes

# Subquadratic Seqlen Scaling Notes

Repeated theme: use a lighter "augmentation" of attention (e.g., clustering, low-dimensional dot-product scoring, or LSH lookup) to vet only the most relevant tokens before running the expensive full attention over them.
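
As a concrete illustration of this vet-then-attend pattern, here is a minimal PyTorch sketch (my own, not taken from any particular paper): a cheap low-dimensional dot-product pass nominates the top-k candidate keys per query, and exact attention runs only over those candidates. The function name, `proj_dim`, and `top_k` are illustrative assumptions; real methods (LSH, clustering) additionally avoid materializing the full candidate-score matrix.

```python
import torch
import torch.nn.functional as F

def candidate_then_attend(q, k, v, top_k=64, proj_dim=16):
    """q, k, v: (batch, seq, dim). Cheap low-dim scoring vets top_k keys
    per query; exact attention is computed only over those candidates.
    Hypothetical helper for illustration, not an API from any library."""
    b, s, d = q.shape
    top_k = min(top_k, s)

    # Cheap vetting pass: score queries against keys in a small random subspace.
    # (Still O(s^2) memory here; LSH/clustering variants avoid even this.)
    proj = torch.randn(d, proj_dim, device=q.device, dtype=q.dtype) / proj_dim ** 0.5
    cheap_scores = (q @ proj) @ (k @ proj).transpose(-2, -1)       # (b, s, s)
    top_idx = cheap_scores.topk(top_k, dim=-1).indices             # (b, s, top_k)

    # Gather the vetted keys/values: (b, s, top_k, d).
    idx = top_idx.unsqueeze(-1).expand(-1, -1, -1, d)
    k_sel = torch.gather(k.unsqueeze(1).expand(-1, s, -1, -1), 2, idx)
    v_sel = torch.gather(v.unsqueeze(1).expand(-1, s, -1, -1), 2, idx)

    # Expensive attention restricted to the vetted candidates.
    scores = (q.unsqueeze(-2) @ k_sel.transpose(-2, -1)).squeeze(-2) / d ** 0.5
    attn = F.softmax(scores, dim=-1)                               # (b, s, top_k)
    return (attn.unsqueeze(-2) @ v_sel).squeeze(-2)                # (b, s, d)
```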

## Transformer with Recurrence

### Block-Recurrent Transformers

https://arxiv.org/abs/2203.07852