You are analyzing a GitHub repository as a software architect and systems researcher.
Critical rules
- Do not assume technologies or patterns based on the repo name, description, or my prior comments.
- Instead, infer everything from the repository itself: `README*`, `docs/`, `pyproject.toml` / `package.json`, `Dockerfile*`, `compose*`, etc.
- Separate facts from inferences. Mark each claim as (see the example after this list):
  - [Grounded] – directly supported by files you’ve inspected.
  - [Inference] – reasoned from patterns, but not explicitly stated.
- If a part of this prompt refers to a technology (e.g., “Temporal”, “Logfire”, “MCP”, “GraphRAG”) that you do not find in the repo:
  - Treat that as a hypothesis to test, not as a fact.
  - Say explicitly: “No evidence of X found in the repo.”
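For example, claims in your analysis should look like this (the file and directory names are invented, purely to show the format):

```
[Grounded] The HTTP layer is FastAPI: `api/main.py` defines `app = FastAPI()`.
[Inference] The `workers/` directory suggests background jobs; no scheduler config confirms it.
```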
You may request additional files / paths if needed.
Read:
- Root `README*`, `pyproject.toml` / `package.json` / `requirements.txt` / `Pipfile` / `Cargo.toml`, etc.
- Any top-level `docs/` or `examples/` directory.
From these, build a high-level summary:
- Problem/domain the repo targets.
- Intended users (library authors, app devs, data scientists, operators, etc.).
- Claimed key features.
Mark each bullet as [Grounded] or [Inference].
Create a structured map of the repo:
- List all top-level directories and key files and describe their purpose (e.g., `api/`, `frontend/`, `examples/`, `infra/`, `scripts/`); a format sketch follows below.
- For each major subproject or example:
  - What does it appear to do?
  - How is it invoked (CLI, HTTP API, library import, etc.)?
Flag where you are inferring behavior from naming vs. confirmed usage in code.
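A sketch of the expected map format (directory names invented):

```
api/        HTTP service; FastAPI routers and request models    [Grounded]
core/       domain logic; no framework imports                  [Grounded]
infra/      Terraform + Dockerfiles; presumably deployment      [Inference from naming]
examples/   runnable demos, each with its own README            [Grounded]
```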
From configuration and code:
- Identify the primary languages and frameworks used (e.g., Python + FastAPI, TypeScript + React, Rust + Axum).
- Identify major runtime dependencies, such as:
  - Web frameworks
  - Agent/orchestration frameworks
  - Observability / tracing libraries
  - Databases / message queues
  - LLM / AI providers
For each major dependency, answer:
- Where in the code is it used?
- What role does it play (HTTP server, agent runtime, storage, tracing, etc.)?
If the repo mentions a technology by name but you don’t see it in code, call that out explicitly.
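For instance, a dependency entry might read (names hypothetical):

```
fastapi     declared in `pyproject.toml`; imported in `api/routes/`   role: HTTP server
sqlalchemy  declared in `pyproject.toml`; used in `core/storage.py`   role: persistence
temporalio  mentioned in the README only; never imported              → call this out explicitly
```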
For each significant component (service, example app, module):
- Describe its responsibility and entrypoints:
  - HTTP endpoints
  - CLI commands
  - Library APIs
  - Background workers / schedulers
- For at least one representative flow per component, document the data path (see the sketch after this list):
  - Inputs → validation → processing → external calls (DB, LLM, APIs) → outputs.
  - Where, if anywhere, types or schemas enforce constraints.
- Build a conceptual architecture diagram (textual):
  - Components as nodes
  - Edges as calls (HTTP, gRPC, DB queries, message bus, function calls)
  - Label any external systems (DBs, queues, LLMs).
Keep a clear distinction between:
- What is explicitly present in the code.
- What you infer from conventions.
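A sketch of both the data path and the diagram, for an invented ingest service:

```
Data path:
  POST /ingest → payload validated against a schema → text chunked
  → embedding request to the LLM provider → vectors written to Postgres → 202 Accepted

Diagram:
  [HTTP client] --HTTP--> [api service] --function call--> [core pipeline]
  [core pipeline] --SQL--> (Postgres)
  [core pipeline] --HTTPS--> {LLM provider}
```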
Analyze the repo for runtime/ops concerns:
- Configuration & secrets
  - How are things configured? (env vars, config files, flags)
  - Any clear patterns for environment separation (dev/stage/prod)?
- State & persistence
  - What persistence layers exist (files, relational DB, KV store, etc.)?
  - How does the code handle migrations or schema definition, if at all?
- Durability / execution model
  - Do you see explicit integration with:
    - Workflow engines (Temporal, DBOS, Airflow, etc.)?
    - Message queues or schedulers?
  - If yes:
    - How are workflows / jobs defined?
    - How is state recovered after failure?
  - If no:
    - Describe the default failure behavior (what happens if a process crashes mid-flow?).
Only mention specific technologies (e.g., “Temporal”) if they show up in code or config.
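A durability finding might then read (specifics invented):

```
No workflow engine or message queue found in code or config. [Grounded]
If the worker crashes mid-ingest, the job is lost: progress lives only in process
memory and nothing re-enqueues it on restart. [Inference]
```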
Identify all external-facing interfaces:
- APIs:
  - HTTP endpoints and their payload shapes.
  - Any OpenAPI / schema docs.
- Tooling / protocol integration (if present):
  - MCP, gRPC, WebSocket, CLI, plugin systems, etc.
  - For each, summarize how the repo defines and consumes these interfaces.
- LLM / AI integration (if present):
  - Which providers / models?
  - How requests and responses are structured.
  - Any validation or safety-related controls.
If a concept like “MCP” or “GraphRAG” does not appear in the codebase, explicitly say so.
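Payload shapes can stay schematic, e.g. (endpoint invented):

```
POST /v1/chat
  request:  { "messages": [ { "role": "user" | "assistant", "content": string } ] }
  response: { "reply": string, "usage": { "prompt_tokens": int, "completion_tokens": int } }
```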
From the code and structure:
- Identify recurring design patterns, such as:
  - Dependency injection
  - Layered architecture
  - Ports-and-adapters / hexagonal boundaries
  - Event-driven or workflow patterns
  - Strong typing & validation boundaries
- Identify potential anti-patterns or risks:
  - Tight coupling between layers
  - Inconsistent error handling
  - Mixed concerns (business logic in handlers, etc.)
  - Weak separation between core logic and I/O
- Evaluate extensibility:
  - Where would you add:
    - New endpoints
    - New agents / workflows
    - New storage backends
  - How painful would that be, based on the current design?
Back each point with code references where possible.
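For example (identifiers invented):

```
Ports-and-adapters: `core/ports.py` defines a `StoragePort` protocol; `adapters/postgres.py`
implements it; `app/wiring.py` binds the two. [Grounded]
Risk, mixed concerns: `api/routes/chat.py` builds prompts and writes to the DB inline. [Grounded]
```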
Given everything above, propose:
- How this repo’s patterns could be integrated into:
  - Agent-based systems
  - RAG / knowledge-graph systems
  - Multi-service architectures with observability
- Concrete extension ideas:
  - Where to plug in additional retrieval or graph layers
  - How to add more robust observability or evaluation pipelines
Tie recommendations to actual extension points in the code (files, classes, functions).
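For example (extension point invented):

```
To add a graph retrieval layer, implement `RetrieverPort` from `core/ports.py` and register
the new adapter in `app/wiring.py`; existing routes need no changes. [Inference]
```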
Produce your analysis with these sections:
- Executive Summary (1–2 paragraphs)
- Repository Map
- Stack & Dependencies
- Architectural Deep Dive
- Runtime & Operational Model
- Integration Points
- Design Patterns / Anti-Patterns / Extensibility
- Recommendations for Integration / Extension
- Open Questions & Uncertainties (things the repo doesn’t clearly answer)
For each major claim, note [Grounded] or [Inference].