Mechanistic Interpretability Blog Post Template

A reusable structure for presenting mech interp experiments consistently.


1. Front Matter

  • Title: Descriptive, specific to the contribution
  • Authors & Affiliations
  • Date
  • Links: Code repo, demo, paper (if applicable)
  • TL;DR: 3-5 bullet points covering:
    • What problem you're addressing
    • Your approach (1-2 sentences)
    • Key findings
    • Limitations/caveats
    • Call to action or future directions

2. Introduction

2.1 Motivation

  • What phenomenon or capability are you trying to understand?
  • Why does this matter for AI safety/alignment/interpretability?

2.2 Gap in Existing Work

  • What has prior work done?
  • What's missing or insufficient?

2.3 Your Contribution

  • High-level description of your approach
  • What makes it novel or useful?

2.4 Scope & Setup

  • What models/tasks/datasets are you studying?
  • What are the boundaries of your investigation?

3. Method

3.1 Overview

  • Diagram or figure showing the full pipeline
  • Intuitive explanation before technical details

3.2 Step-by-Step Procedure

For each major component:

  • What: What does this step do?
  • Why: Why is this step necessary?
  • How: Technical details (can reference appendix for full rigor)

3.3 Key Design Choices

  • What alternatives did you consider?
  • Why did you choose this approach?

3.4 Failure Modes & Mitigations

  • What went wrong in early versions?
  • How did you address it?

4. Validation

4.1 Sanity Checks

  • Qualitative inspection: Does the output look reasonable?
  • Known-answer tests: Does it recover structure you intentionally created? (see the sketch below)
  • Edge cases: Does it fail gracefully?
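
A known-answer test can be as simple as planting a synthetic structure and checking that your analysis recovers it. A minimal sketch, assuming a difference-of-means probe as the analysis; all names and numbers are illustrative, not from any real experiment:

```python
# Known-answer sanity check: plant a known direction in synthetic
# "activations" and verify the analysis recovers it. Purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 5000

planted = rng.normal(size=d)
planted /= np.linalg.norm(planted)           # unit-norm planted direction

labels = rng.integers(0, 2, size=n)          # binary "concept" labels
acts = rng.normal(size=(n, d)) + 3.0 * labels[:, None] * planted

# A difference-of-means probe should recover the planted direction.
recovered = acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)
recovered /= np.linalg.norm(recovered)

cosine = abs(recovered @ planted)
assert cosine > 0.9, f"known-answer test failed: cosine = {cosine:.2f}"
print(f"recovered planted direction, cosine similarity = {cosine:.3f}")
```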

4.2 Quantitative Evaluation

  • Task definition: What objective metric are you measuring?
  • Baselines: What are you comparing against? (see the harness sketch after this list)
    • Naive baselines (random, constant prediction)
    • Ablations of your method
    • Existing methods (if applicable)
    • Strong alternatives (e.g., just asking an LLM)
  • Results: Tables/figures with clear takeaways
  • Analysis by condition: Break down results by relevant factors
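
A skeleton for the baseline comparison might look like the following. The predictions here are synthetic stand-ins; in practice you would plug in your method's real outputs and task labels:

```python
# Evaluation harness comparing a method against naive baselines.
# All predictions below are synthetic placeholders for illustration.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)

def accuracy(preds: np.ndarray) -> float:
    return float((preds == y_true).mean())

# Stand-in for your method: correct ~85% of the time.
method_preds = np.where(rng.random(y_true.size) < 0.85, y_true, 1 - y_true)
random_preds = rng.integers(0, 2, size=y_true.size)               # random baseline
constant_preds = np.full(y_true.size, int(round(y_true.mean())))  # majority class

for name, preds in [("method", method_preds),
                    ("random baseline", random_preds),
                    ("constant baseline", constant_preds)]:
    print(f"{name:17s} accuracy = {accuracy(preds):.3f}")
```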

4.3 Interpretation of Results

  • What do the results tell us?
  • What are the limitations of the evaluation?

5. Qualitative Insights

5.1 Case Studies

  • Walk through specific examples in detail
  • Include visualizations where helpful

5.2 Patterns & Observations

  • What recurring themes emerged?
  • What surprised you?

5.3 Comparison Across Conditions

  • Different models
  • Different prompts/tasks
  • Different hyperparameters

6. Discussion

6.1 Summary of Findings

  • Restate key results in plain language

6.2 Limitations

  • What doesn't this method capture?
  • Where does it fail?
  • What assumptions does it make?

6.3 Implications

  • What does this tell us about how models work?
  • How might this inform safety/alignment work?

6.4 Future Directions

  • Concrete next steps
  • Open questions for the community
  • What would make this more useful?

7. Conclusion

  • 2-3 paragraph wrap-up
  • Final call to action

8. Acknowledgements

  • Funding, mentorship, feedback

9. Contribution Statement

  • Who did what (for multi-author posts)

10. Appendix

A. Technical Details

  • Full algorithms/pseudocode
  • Hyperparameters and tuning
  • Prompts used (verbatim)

B. Extended Results

  • Ablation studies
  • Additional baselines
  • Per-task/per-model breakdowns

C. Reproducibility

  • Compute requirements
  • Data/model access
  • Known issues

D. Supplementary Figures

  • Additional visualizations
  • Extended examples

11. References/Footnotes


Style Guidelines

| Element | Guideline |
| --- | --- |
| Figures | Every method section should have at least one diagram. Label clearly. |
| Code snippets | Use sparingly in main text; link to repo for full code. |
| Math | Define notation on first use. Keep inline math simple. |
| Length | Target 2,000-4,000 words for main text; appendix can be longer. |
| Tone | Honest about limitations. Avoid overclaiming. |
| Audience | Assume familiarity with ML but not your specific subfield. |

Checklist Before Publishing

  • TL;DR captures the essence in <1 minute of reading
  • At least one sanity check shows the method isn't broken
  • At least one quantitative baseline shows it's doing something non-trivial
  • Limitations section is honest and specific
  • Code/demo links work
  • Figures render correctly
  • A non-expert colleague can follow the main argument
How to Write ML Papers: A Distilled Guide

Based on Neel Nanda's "Highly Opinionated Advice on How to Write ML Papers"


The Core Idea

A paper is a narrative: 1–3 specific, novel claims supported by rigorous evidence, with a takeaway readers care about.

Your job: ensure readers understand, remember, and believe your narrative.


The Three Pillars

1. What? (Your Claims)

  • 1–3 concrete, novel claims under a cohesive theme
  • Adjust confidence to match evidence strength:
    • Existence proof: "We found at least one case where X happens"
    • Systematic: "X generally happens across contexts"
    • Hedged: "There is suggestive evidence that X"
    • Narrow: "X works for situations V & W with objective Y"

2. Why Should I Believe You? (Evidence)

  • Quality > quantity—one compelling experiment beats many mediocre ones
  • Red-team aggressively: assume you made a mistake, find it
  • Diverse evidence types are more robust than many similar experiments
  • Strong baselines matter—invest real effort making them good
  • Track which analyses were planned in advance vs. post-hoc; be honest about cherry-picking
  • Statistical rigor: p < .05 is too weak; aim for p < .001 for exploratory work (see the toy simulation below)
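
To see why p < .05 is weak for exploratory work, here is a toy simulation of the multiple-comparisons problem (assuming numpy and scipy are available): run 100 tests where the null is true everywhere and count how many come out "significant" at each threshold.

```python
# Toy multiple-comparisons simulation: 100 null experiments, no real
# effect anywhere, yet p < .05 still flags several "findings".
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pvals = np.array([
    stats.ttest_ind(rng.normal(size=50), rng.normal(size=50)).pvalue
    for _ in range(100)   # 100 exploratory hypotheses, all null
])

print(f"false positives at p < .05 : {np.sum(pvals < 0.05)} / 100")   # expect ~5
print(f"false positives at p < .001: {np.sum(pvals < 0.001)} / 100")  # expect ~0
```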

3. So What? (Motivation & Impact)

  • What problem are you solving?
  • How does it fit the bigger picture?
  • Why does your takeaway matter?

Paper Structure

| Section | Purpose | Time Investment |
| --- | --- | --- |
| Abstract | Cold-start orientation, key claims, why it matters | ~25% |
| Introduction | Extended abstract with context, citations, contribution summary | ~25% |
| Figures | Communicate results visually; worth significant polish | ~25% |
| Main Body | Full technical detail for skeptical readers | ~25% |
| Related Work | Differentiate from prior work (often at end) | Lower priority |
| Discussion | Limitations, implications, future work | Important for integrity |
| Appendices | Everything else; lower standards, rarely read | Low effort |

Red-Teaming Your Evidence

Ask yourself:

  • How surprised would I be if this result were wrong due to a bug?
  • If I ran this again, would the results be consistent? (see the seed-variation sketch below)
  • Could the evidence be true but the claim false?
  • What alternative explanations am I missing?
  • Did I invest as much effort in baselines as my main method?
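
One cheap way to answer the consistency question is to rerun across random seeds and report the spread; a single-seed result is fragile evidence. A sketch, where `run_experiment` is a hypothetical stand-in for your real pipeline:

```python
# Consistency check across random seeds. `run_experiment` is a
# hypothetical stand-in for a real pipeline returning one metric.
import numpy as np

def run_experiment(seed: int) -> float:
    rng = np.random.default_rng(seed)
    return 0.80 + rng.normal(scale=0.03)   # pretend metric with seed noise

scores = np.array([run_experiment(s) for s in range(5)])
print(f"metric = {scores.mean():.3f} +/- {scores.std(ddof=1):.3f} "
      f"over {len(scores)} seeds")
```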

Novelty

Novelty = expanding our knowledge, not just "shiny and new."

  • Be explicit about what is/isn't novel
  • Cite liberally and explain differences from prior work
  • Legitimate criticism of prior work is fine—stay professional

Final Checklist

  • 1–3 concrete claims with cohesive theme
  • Strong evidence for each claim
  • Clear motivation and impact
  • Abstract that stands alone
  • Introduction that's a complete summary
  • Polished figures with good captions
  • Limitations acknowledged
  • Key terms defined
  • Reproducibility info (code, hyperparameters)
  • Feedback from others at each stage

"Your research only matters if people read, understand, and build upon it."
