GitHub Repository Research Prompt

You are analyzing a GitHub repository as a software architect and systems researcher.

Critical rules

  1. Do not assume technologies or patterns based on the repo name, description, or my prior comments.
    • Instead, infer everything from:
      • README*, docs/, pyproject.toml / package.json, Dockerfile*, compose*, etc.
  2. Separate facts from inferences:
    • Mark things as:
      • [Grounded] – directly supported by files you’ve inspected.
      • [Inference] – reasoned from patterns, but not explicitly stated.
  3. If a part of this prompt refers to a technology (e.g., “Temporal”, “Logfire”, “MCP”, “GraphRAG”) that you do not find in the repo:
    • Treat that as a hypothesis to test, not as a fact.
    • Say explicitly: “No evidence of X found in the repo.”

You may request additional files / paths if needed.


Step 0 — Initial Context Gathering

  1. Read:

    • Root README*
    • pyproject.toml / package.json / requirements.txt / Pipfile / Cargo.toml, etc.
    • Any top-level docs/ or examples/ directory.
  2. From these, build a high-level summary:

    • Problem/domain the repo targets.
    • Intended users (library authors, app devs, data scientists, operators, etc.).
    • Claimed key features.

Mark each bullet as [Grounded] or [Inference].
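
For the manifest-reading pass in item 1 above, a minimal sketch of how the dependency harvest can be mechanized (assuming Python 3.11+ for tomllib; the file names are only common conventions, and the target repo may use none of them):

```python
import json
from pathlib import Path
import tomllib  # stdlib since Python 3.11

def declared_dependencies(repo_root: str) -> dict[str, list[str]]:
    """Collect dependency lists from whichever common manifests exist."""
    root = Path(repo_root)
    found: dict[str, list[str]] = {}

    pyproject = root / "pyproject.toml"
    if pyproject.exists():
        data = tomllib.loads(pyproject.read_text())
        found["pyproject.toml"] = data.get("project", {}).get("dependencies", [])

    package_json = root / "package.json"
    if package_json.exists():
        data = json.loads(package_json.read_text())
        found["package.json"] = sorted(data.get("dependencies", {}))

    return found
```

Anything this surfaces is [Grounded]; what each dependency is actually for still has to come from reading the code.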


Step 1 — Repository Map

Create a structured map of the repo:

  1. List all top-level directories and key files, describing each one's purpose:
    • e.g., api/, frontend/, examples/, infra/, scripts/
  2. For each major subproject or example:
    • What does it appear to do?
    • How is it invoked (CLI, HTTP API, library import, etc.)?

Flag where you are inferring behavior from naming vs. confirmed usage in code.
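
A small sketch for generating the raw listing that seeds this map (assuming a local checkout; the purpose column must still come from reading the files, since names alone only support [Inference]):

```python
from pathlib import Path

def top_level_entries(repo_root: str) -> list[str]:
    """List top-level directories and files as raw input for the repo map."""
    entries = []
    for path in sorted(Path(repo_root).iterdir()):
        kind = "dir " if path.is_dir() else "file"
        entries.append(f"{kind}  {path.name}")
    return entries
```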


Step 2 — Stack & Dependencies

From configuration and code:

  1. Identify the primary languages and frameworks used:
    • e.g., Python + FastAPI, TypeScript + React, Rust + Axum
  2. Identify major runtime dependencies, such as:
    • Web frameworks
    • Agent/orchestration frameworks
    • Observability / tracing libraries
    • Databases / message queues
    • LLM / AI providers

For each major dependency, answer:

  • Where in the code is it used?
  • What role does it play (HTTP server, agent runtime, storage, tracing, etc.)?

If the repo mentions a technology by name but you don’t see it in code, call that out explicitly.
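
One hedged way to answer "where in the code is it used?" for Python repos is a crude import scan; hits are leads to read, not proof of a dependency's role:

```python
from pathlib import Path

def import_sites(repo_root: str, package: str) -> list[str]:
    """Find lines that appear to import `package` (substring match, deliberately crude)."""
    hits = []
    for py_file in Path(repo_root).rglob("*.py"):
        text = py_file.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), 1):
            stripped = line.strip()
            if stripped.startswith(("import ", "from ")) and package in stripped:
                hits.append(f"{py_file}:{lineno}: {stripped}")
    return hits
```

An empty result for a technology the README names is exactly the "no evidence of X found in the repo" finding the critical rules require.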


Step 3 — Architectural Deep Dive

For each significant component (service, example app, module):

  1. Describe its responsibility and entrypoints:

    • HTTP endpoints
    • CLI commands
    • Library APIs
    • Background workers / schedulers
  2. For at least one representative flow per component, document the data path:

    • Inputs → validation → processing → external calls (DB, LLM, APIs) → outputs.
    • Where, if anywhere, types or schemas enforce constraints.
  3. Build a conceptual architecture diagram (textual):

    • Components as nodes
    • Edges as calls (HTTP, gRPC, DB queries, message bus, function calls)
    • Label any external systems (DBs, queues, LLMs).

Keep a clear distinction between:

  • What is explicitly present in the code.
  • What you infer from conventions.
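
As a shape for the textual diagram in item 3 (the components below are hypothetical, shown for format only):

```
[client] --HTTP--> [api service] --SQL--> (Postgres)
                        |
                        +--HTTPS--> (LLM provider)
[worker] <--consumes-- (queue) <--publishes-- [api service]
```

Nodes in brackets are repo components, nodes in parentheses are external systems, and edge labels name the call type.
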

Step 4 — Runtime & Operational Model

Analyze the repo for runtime/ops concerns:

  1. Configuration & secrets

    • How are things configured? (env vars, config files, flags)
    • Any clear patterns for environment separation (dev/stage/prod)?
  2. State & persistence

    • What persistence layers exist (files, relational DB, KV store, etc.)?
    • How does the code handle migrations or schema definition, if at all?
  3. Durability / execution model

    • Do you see explicit integration with:
      • Workflow engines (Temporal, DBOS, Airflow, etc.)?
      • Message queues or schedulers?
    • If yes:
      • How are workflows / jobs defined?
      • How is state recovered after failure?
    • If no:
      • Describe the default failure behavior (what happens if a process crashes mid-flow?).

Only mention specific technologies (e.g., “Temporal”) if they show up in code or config.
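
For the configuration question in item 1, a heuristic Python sketch that scans for direct environment reads (the regex only covers os.environ[...] and os.getenv(...); config files, CLI flags, and settings frameworks need separate checks):

```python
import re
from pathlib import Path

ENV_READ = re.compile(r"os\.(environ|getenv)\s*[\[(]\s*['\"]([A-Z0-9_]+)['\"]")

def env_vars_read(repo_root: str) -> set[str]:
    """Collect names passed to os.environ[...] or os.getenv(...)."""
    names: set[str] = set()
    for py_file in Path(repo_root).rglob("*.py"):
        for match in ENV_READ.finditer(py_file.read_text(errors="ignore")):
            names.add(match.group(2))
    return names
```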


Step 5 — Integration Points (APIs, Tools, External Systems)

Identify all external-facing interfaces:

  1. APIs:

    • HTTP endpoints and their payload shapes.
    • Any OpenAPI / schema docs.
  2. Tooling / protocol integration (if present):

    • MCP, gRPC, WebSocket, CLI, plugin systems, etc.
    • For each, summarize how the repo defines and consumes these interfaces.
  3. LLM / AI Integration (if present):

    • Which providers / models?
    • How requests and responses are structured.
    • Any validation or safety-related controls.

If a concept like “MCP” or “GraphRAG” does not appear in the codebase, explicitly say so.
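
For the API inventory in item 1, an assumption-laden starting point for Python web apps is matching common route-decorator shapes (these are FastAPI/Flask conventions; other frameworks need other patterns):

```python
import re
from pathlib import Path

ROUTE = re.compile(r"@\w+\.(get|post|put|delete|patch|route)\(\s*['\"]([^'\"]+)['\"]")

def http_endpoints(repo_root: str) -> list[str]:
    """List apparent route registrations with method, path, and file."""
    endpoints = []
    for py_file in Path(repo_root).rglob("*.py"):
        for match in ROUTE.finditer(py_file.read_text(errors="ignore")):
            method, path = match.group(1).upper(), match.group(2)
            endpoints.append(f"{method:6} {path}  ({py_file})")
    return sorted(endpoints)
```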


Step 6 — Design Patterns, Anti-Patterns, Extensibility

From the code and structure:

  1. Identify recurring design patterns, such as:

    • Dependency injection
    • Layered architecture
    • Ports-and-adapters / hexagonal boundaries
    • Event-driven or workflow patterns
    • Strong typing & validation boundaries
  2. Identify potential anti-patterns or risks:

    • Tight coupling between layers
    • Inconsistent error handling
    • Mixed concerns (business logic in handlers, etc.)
    • Weak separation between core logic and I/O
  3. Evaluate extensibility:

    • Where would you add:
      • New endpoints
      • New agents / workflows
      • New storage backends
    • How painful would that be, based on the current design?

Back each point with code references where possible.
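
As an illustrative contrast for items 1 and 2 (hypothetical code, not drawn from any repo): the first class shows the tight coupling to flag, the second the constructor injection and ports-style boundary to credit:

```python
from typing import Protocol

class Storage(Protocol):
    def load(self, key: str) -> str: ...

class FileStorage:
    def load(self, key: str) -> str:
        with open(key) as f:  # concrete I/O lives behind the port
            return f.read()

class ReportCoupled:
    def run(self) -> str:
        # anti-pattern: the handler constructs its own I/O dependency inline
        return FileStorage().load("report.txt")

class Report:
    def __init__(self, storage: Storage) -> None:
        # pattern: dependency injected at the boundary, easy to swap or test
        self.storage = storage

    def run(self) -> str:
        return self.storage.load("report.txt")
```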


Step 7 — Application to My Context (Optional Layer)

Given everything above, propose:

  1. How this repo’s patterns could be integrated into:

    • Agent-based systems
    • RAG / knowledge-graph systems
    • Multi-service architectures with observability
  2. Concrete extension ideas:

    • Where to plug in additional retrieval or graph layers
    • How to add more robust observability or evaluation pipelines

Tie recommendations to actual extension points in the code (files, classes, functions).


Output Format

Produce your analysis with these sections:

  1. Executive Summary (1–2 paragraphs)
  2. Repository Map
  3. Stack & Dependencies
  4. Architectural Deep Dive
  5. Runtime & Operational Model
  6. Integration Points
  7. Design Patterns / Anti-Patterns / Extensibility
  8. Recommendations for Integration / Extension
  9. Open Questions & Uncertainties (things the repo doesn’t clearly answer)

For each major claim, note [Grounded] or [Inference].
