A reusable structure for presenting mech interp experiments consistently.
## Header

- Title: Descriptive, specific to the contribution
- Authors & Affiliations
- Date
- Links: Code repo, demo, paper (if applicable)
## TL;DR

3-5 bullet points covering:
- What problem you're addressing
- Your approach (1-2 sentences)
- Key findings
- Limitations/caveats
- Call to action or future directions
## Introduction

- What phenomenon or capability are you trying to understand?
- Why does this matter for AI safety/alignment/interpretability?
- What has prior work done?
- What's missing or insufficient?
## Method

- High-level description of your approach
- What makes it novel or useful?
- What models/tasks/datasets are you studying?
- What are the boundaries of your investigation?
- Diagram or figure showing the full pipeline
- Intuitive explanation before technical details
For each major component:
- What: What does this step do?
- Why: Why is this step necessary?
- How: Technical details (can reference appendix for full rigor)
- What alternatives did you consider?
- Why did you choose this approach?
- What went wrong in early versions?
- How did you address it?
## Sanity Checks

- Qualitative inspection: Does the output look reasonable?
- Known-answer tests: Does it recover structure you intentionally created?
- Edge cases: Does it fail gracefully?
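A known-answer test can be as simple as planting a signal in synthetic data and checking that your method recovers it. A minimal sketch, assuming a direction-finding method; the difference-of-means step is a stand-in for whatever your actual method does, and all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Plant a known direction in synthetic "activations": half the
# examples get the direction added, half do not.
d_model, n = 64, 500
planted = rng.normal(size=d_model)
planted /= np.linalg.norm(planted)

noise = rng.normal(size=(2 * n, d_model))
labels = np.array([1] * n + [0] * n)
acts = noise + 3.0 * labels[:, None] * planted

# Stand-in recovery step: difference of class means. Replace with
# your actual method (probe direction, SAE feature, etc.).
recovered = acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)
recovered /= np.linalg.norm(recovered)

# The test passes iff the method finds the planted direction.
cosine = float(planted @ recovered)
assert cosine > 0.9, "failed to recover intentionally planted structure"
print(f"cosine(recovered, planted) = {cosine:.3f}")
```

The point is not the specific recovery step but the pattern: you control the ground truth, so a failure here is unambiguous evidence the pipeline is broken.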
## Quantitative Evaluation

- Task definition: What objective metric are you measuring?
- Baselines: What are you comparing against?
  - Naive baselines (random, constant prediction)
  - Ablations of your method
  - Existing methods (if applicable)
  - Strong alternatives (e.g., just asking an LLM)
- Results: Tables/figures with clear takeaways
- Analysis by condition: Break down results by relevant factors
- What do the results tell us?
- What are the limitations of the evaluation?
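Naive baselines are cheap to compute and catch broken evaluations early. A minimal sketch for a binary classification metric, using made-up labels; substitute your task's real labels and your method's predictions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative binary labels with class imbalance; swap in your
# task's real labels.
labels = rng.random(1000) < 0.3  # ~30% positive

# Constant baseline: always predict the majority class.
majority = labels.mean() >= 0.5
constant_acc = float(np.mean(labels == majority))

# Random baseline: predict each class uniformly at random.
random_acc = float(np.mean(labels == (rng.random(labels.size) < 0.5)))

print(f"constant baseline: {constant_acc:.2f}")
print(f"random baseline:   {random_acc:.2f}")
```

Note that under class imbalance the constant baseline can look deceptively strong, which is exactly why it belongs in the comparison.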
## Case Studies

- Walk through specific examples in detail
- Include visualizations where helpful
- What recurring themes emerged?
- What surprised you?
## Robustness

Does the result hold across:
- Different models
- Different prompts/tasks
- Different hyperparameters
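Robustness checks usually reduce to running the same evaluation over a grid of conditions and reporting per-condition scores. A hedged sketch of the bookkeeping; `run_eval`, its signature, and the condition names are placeholder assumptions to be replaced by your real pipeline:

```python
from itertools import product

# Placeholder evaluation; swap in your real pipeline. The signature
# and score range here are illustrative assumptions.
def run_eval(model: str, prompt_style: str, threshold: float) -> float:
    return 0.5 + 0.1 * threshold  # stub score in [0, 1]

models = ["model-a", "model-b"]
prompt_styles = ["plain", "chain-of-thought"]
thresholds = [0.1, 0.5]

# Keep per-condition scores rather than a single average, so the
# write-up can break results down by model, prompt, and setting.
results = {
    (m, p, t): run_eval(m, p, t)
    for m, p, t in product(models, prompt_styles, thresholds)
}
for condition, score in sorted(results.items()):
    print(condition, score)
```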
## Discussion

- Restate key results in plain language
### Limitations

- What doesn't this method capture?
- Where does it fail?
- What assumptions does it make?
### Implications

- What does this tell us about how models work?
- How might this inform safety/alignment work?
### Future Work

- Concrete next steps
- Open questions for the community
- What would make this more useful?
## Conclusion

- 2-3 paragraph wrap-up
- Final call to action
## Acknowledgments

- Funding, mentorship, feedback
- Who did what (for multi-author posts)
## Appendix

- Full algorithms/pseudocode
- Hyperparameters and tuning
- Prompts used (verbatim)
- Ablation studies
- Additional baselines
- Per-task/per-model breakdowns
- Compute requirements
- Data/model access
- Known issues
- Additional visualizations
- Extended examples
## Style Guidelines

| Element | Guideline |
|---|---|
| Figures | Every method section should have at least one diagram. Label clearly. |
| Code snippets | Use sparingly in main text; link to repo for full code. |
| Math | Define notation on first use. Keep inline math simple. |
| Length | Target 2,000-4,000 words for main text; appendix can be longer. |
| Tone | Honest about limitations. Avoid overclaiming. |
| Audience | Assume familiarity with ML but not your specific subfield. |
## Pre-Publication Checklist

- TL;DR captures the essence in <1 minute of reading
- At least one sanity check shows the method isn't broken
- At least one quantitative baseline shows it's doing something non-trivial
- Limitations section is honest and specific
- Code/demo links work
- Figures render correctly
- A non-expert colleague can follow the main argument
## Further Reading

- [Highly opinionated advice on how to write ML papers](https://www.alignmentforum.org/posts/eJGptPbbFPZGLpjsp/highly-opinionated-advice-on-how-to-write-ml-papers) (Alignment Forum)