scout.py

Python code scout: metrics + locally-generated summaries.

It’s useful for AI-agent priming because it distills a codebase into a dense signal - enough context for an agent to start acting like it “read the code.”

The summaries are actually quite lit; it runs locally on your CPU (the model is <200MB) and is good at explaining what the code does, not just what it’s called.

  • Scout on itself compresses 5,123 source tokens into 1,628.
  • In another larger project: 22,962 into just 6,372! The agent could describe the whole thing technically, predict bugs, and suggest refactors with no additional context from source.
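
A minimal sketch of that priming flow (hypothetical glue code; ask_agent is a stand-in for whatever LLM call you use):

# Hypothetical priming flow: run scout over a project, hand the digest to an agent.
import subprocess

def prime_agent(project_path: str) -> str:
    # --json for machine-readable output, --ai for per-symbol summaries,
    # -q to suppress progress noise
    scout = subprocess.run(
        ["python", "scout.py", "--ai", "--json", "-q", project_path],
        capture_output=True, text=True, check=True)
    return ("You are working on this codebase. A static-analysis digest "
            "of every symbol follows:\n" + scout.stdout)

# context = prime_agent("src/")
# answer = ask_agent(context + "\nWhere are the likely bug hotspots?")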
python scout.py --help

usage: scout [-h] [--columns COLUMNS] [--list-columns] [--exclude-dirs EXCLUDE_DIRS] [--ai] [--json] [--mypy] [--jobs JOBS] [--cache CACHE] [--no-cache] [-q] [path]

Static analysis for AI agents

positional arguments:
  path                  Path to analyze

options:
  -h, --help            show this help message and exit
  --columns COLUMNS     Columns to display (default: name,lines,summary,h_vol,calls,h_bugs,cc,mi)
  --list-columns        List all possible columns
  --exclude-dirs EXCLUDE_DIRS
                        Extra dirs to exclude (comma-sep)
  --ai                  AI summaries per symbol
  --json                JSON output
  --mypy                Run mypy
  --jobs JOBS           Parallel workers
  --cache CACHE         Cache file
  --no-cache            Disable cache
  -q, --quiet           Suppress progress

examples:
  python scout.py --ai --columns "name,lines,summary,h_vol,calls,h_bugs,cc,mi" scout.py     # default columns

Output example:

❯ python scout.py  \
        --ai          \
        --columns "name,lines,summary,h_vol,h_bugs,cc,mi,calls"      \
        scout.py


Loading AI model...
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────── Project ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Files: 1  Symbols: 34  LOC: 483 (374 source)                                                                                                                                                                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
NAME                                                                                              LINES    SUMMARY                                                                                         H_VOL   H_BUGS  CC  MI      CALLS
metric(name: str, desc: str='')                                                                   25-30    Decorator to register a metric function as a new metric .                                       0       0.00    1   100.00  3
metrics_for_columns(columns: List) -> List                                                        32-40    Return metric names needed to produce requested columns .                                       4.80    0.00    4   73.90   4
_safe(default, fn)                                                                                42-44    Call a function and return the result if it fails .                                             0       0.00    2   100.00  1
_raw(code: str)                                                                                   46-47    Return the raw code .                                                                           0       0.00    1   100.00  2
m_loc(code: str, _)                                                                               50-50    Return the number of lines in the code .                                                        0       0.00    1   100.00  3
m_sloc(code: str, _)                                                                              53-55    Return the sloc of the code .                                                                   0       0.00    2   100.00  2
m_comments(code: str, _)                                                                          58-61    Return a dict of comments and the ratio of the comments .                                       11.60   0.00    2   75.30   4
m_blank(code: str, _)                                                                             64-66    Return the blank value of the node .                                                            0       0.00    2   100.00  2
m_cc(code: str, _)                                                                                69-71    Return the complexity of the code .                                                             0       0.00    2   100.00  3
m_mi(code: str, _)                                                                                74-75    Return the mi value of the given code .                                                         0       0.00    1   100.00  5
m_halstead(code: str, _)                                                                          78-83    Return the Halstead total of a sequence of words .                                              2.00    0.00    2   77.92   8
m_sig(code: str, node)                                                                            86-99    Generate a signature for a function .                                                           124.90  0.04    10  56.59   13
m_params(code: str, node)                                                                         102-104  Return the number of parameters in the function or async function .                             11.60   0.00    3   76.89   3
m_nesting(code: str, node)                                                                        107-114  Return a dict of the nesting level of the given node .                                          22.50   0.01    2   67.55   6
m_calls(code: str, node)                                                                          117-118  Returns the number of calls in the given code .                                                 0       0.00    3   100.00  4
m_branches(code: str, node)                                                                       121-122  Return the number of branches in the code .                                                     0       0.00    3   100.00  4
m_loops(code: str, node)                                                                          125-126  Return a dictionary of loops .                                                                  0       0.00    3   100.00  4
m_doc(code: str, node)                                                                            129-132  Return a dict with the number of lines and comments in the node s docstring .                   2.00    0.00    3   79.05   7
load_ai(quiet=False)                                                                              162-177  Load the AI model and return a function that can be used to summarize the sequence of tokens .  2.00    0.00    2   87.35   9
run_mypy(path: Path) -> Dict                                                                      182-191  Runs mypy and returns a dict of the number of errors warnings and issues .                      57.40   0.02    7   64.03   6
_sha1(b: bytes) -> str                                                                            198-198  Return the SHA - 1 hash of a bytes object .                                                     0       0.00    1   100.00  2
load_cache(p: Path) -> Dict                                                                       200-205  Load the cache from a JSON file .                                                               11.60   0.00    4   71.19   4
save_cache(p: Path, d: Dict)                                                                      207-209  Save a dictionary to a JSON file .                                                              0       0.00    2   100.00  2
should_skip(p: Path, excl: Set) -> bool                                                           214-215  Returns True if p should be skipped .                                                           4.80    0.00    2   88.42   1
extract_imports(tree: ast.AST) -> List                                                            217-225  Extract imports from an ast . AST .                                                             4.80    0.00    8   71.47   7
mod_to_path(root: Path, mod: str) -> Optional[Path]                                               227-231  Return the path of the module in root if it exists .                                            23.30   0.01    3   73.05   3
build_import_graph(root: Path, file_imports: Dict[str, List], files: Set) -> Dict[str, List]      233-240  Build a graph of import paths to files .                                                        48.10   0.02    9   65.20   11
analyze_file(path: str, columns: List) -> Dict                                                    245-269  Analyze a file and return a dict of metrics .                                                   66.40   0.02    12  55.13   27
get_all_columns() -> List                                                                         274-280  Returns a list of all possible column names .                                                   13.90   0.01    2   73.29   2
format_output(console: Console, symbols: List[Symbol], file_stats: List[FileStats], columns:      282-319  Formats the output of                                                                           99.90   0.03    20  66.67   26
List, show_summary: bool)                                                                                  FailureSummary .
format_json(symbols: List[Symbol], file_stats: List[FileStats], import_graph: Dict)               321-331  Formats a list of symbols and file stats into a JSON file .                                     0       0.00    5   100.00  6
main()                                                                                            339-480  Entry point for the command line tool .                                                         297.50  0.10    55  47.10   100
Symbol                                                                                            138-145  A basic syntax tree for a sequence of tokens .                                                  0       0.00    1   100.00  1
FileStats                                                                                         148-157  Statistics for a n - language language grammar .                                                0       0.00    1   100.00  1
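
With --json the same data comes out machine-readable; the shape (abridged sketch, key names taken from format_json in the source below, values illustrative from the run above) is:

{
  "project": {"files": 1, "symbols": 34, "loc": 483, "sloc": 374},
  "files": [{"file": "scout.py", "loc": 483, "sloc": 374, "symbols": 34,
             "fan_in": 0, "fan_out": 0}],
  "symbols": [{"name": "main", "kind": "func", "file": "scout.py",
               "start": 339, "end": 480, "metrics": {"cc": 55, "mi": 47.1},
               "summary": "Entry point for the command line tool ."}],
  "import_graph": {"scout.py": []}
}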

Columns / metrics (compact reference)

Each row is a column name you can request via --columns. “Source” indicates where the value comes from.

| Column(s) | Meaning | Source |
|---|---|---|
| file, name, kind, lines | Symbol identity + file + line span | Radon visitor metadata |
| summary | Short AI description per symbol | CodeT5 summarizer (--ai) |
| fan_in, fan_out | File dependency counts (in/out degree) | AST import parse → best-effort local import graph |
| loc | Total lines in snippet | splitlines() |
| sloc, blank, comments, comment_ratio | Raw code stats | radon.raw.analyze (+ ratio derived) |
| cc | Cyclomatic complexity | radon.complexity.cc_visit |
| mi | Maintainability index (0–100) | radon.metrics.mi_visit |
| h_vol, h_diff, h_effort, h_time, h_bugs | Halstead totals | radon.metrics.h_visit(...).total.* |
| sig, params | Function signature + param count | AST (ast.unparse, node.args) |
| nesting | Max nesting depth | AST walk (if/for/while/with/try) |
| calls, branches, loops | Call/branch/loop counts | AST walk (Call / If / For+While) |
| has_doc, doc_lines | Docstring presence + line count | AST (ast.get_docstring) |

scout.py --list-columns prints everything supported by your build.
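
If you want to sanity-check where a value comes from, the same radon calls can be run directly (a minimal sketch using only the APIs scout.py itself imports; the sample snippet is arbitrary):

# Compute a few of the table's metrics by hand with radon.
import radon.raw as rr
import radon.complexity as rcc
import radon.metrics as rm

code = "def f(x):\n    return x + 1\n"
raw = rr.analyze(code)                 # sloc / comments / blank counts
blocks = rcc.cc_visit(code)            # per-function complexity blocks
print("sloc:", raw.sloc, "blank:", raw.blank)
print("cc:", blocks[0].complexity if blocks else 0)
print("mi:", round(rm.mi_visit(code, multi=True), 2))
print("h_vol:", round(rm.h_visit(code).total.volume, 1))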

#!/usr/bin/env python3
"""scout.py — static analysis for AI agents (streamlined)"""
from __future__ import annotations
import argparse, ast, hashlib, json, os, subprocess, sys
from concurrent.futures import ProcessPoolExecutor, as_completed
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional, Set
import radon.complexity as rcc
import radon.metrics as rm
import radon.raw as rr
from radon.visitors import Class as RC, Function as RF
from rich.console import Console
from rich.table import Table
from rich.panel import Panel
# ─────────────────────────────────────────────────────────────
# Metrics (computed on-demand based on --columns)
# ─────────────────────────────────────────────────────────────
MetricFn = Callable[[str, Optional[ast.AST]], Dict[str, Any]]
METRICS: Dict[str, tuple[MetricFn, str, List[str]]] = {} # name -> (fn, desc, output_keys)
def metric(name: str, desc: str = ""):
    def wrap(fn: MetricFn):
        keys = list(fn("", None).keys())
        METRICS[name] = (fn, desc, keys)
        return fn
    return wrap
def metrics_for_columns(columns: List[str]) -> List[str]:
    """Return metric names needed to produce requested columns."""
    needed = set()
    for col in columns:
        for name, (_, _, keys) in METRICS.items():
            if col in keys:
                needed.add(name)
                break
    return list(needed)
def _safe(default, fn):
    try: return fn()
    except Exception: return default
def _raw(code: str):
    return _safe(None, lambda: rr.analyze(code))
@metric("loc", "Lines of code")
def m_loc(code: str, _): return {"loc": len(code.splitlines())}
@metric("sloc", "Source lines")
def m_sloc(code: str, _):
a = _raw(code)
return {"sloc": a.sloc if a else 0}
@metric("comments", "Comment lines + ratio")
def m_comments(code: str, _):
a = _raw(code)
if not a: return {"comments": 0, "comment_ratio": 0.0}
return {"comments": a.comments, "comment_ratio": round(a.comments / max(a.sloc, 1), 2)}
@metric("blank", "Blank lines")
def m_blank(code: str, _):
a = _raw(code)
return {"blank": a.blank if a else 0}
@metric("cc", "Cyclomatic complexity")
def m_cc(code: str, _):
blocks = _safe([], lambda: rcc.cc_visit(code))
return {"cc": blocks[0].complexity if blocks else 0}
@metric("mi", "Maintainability index (0-100)")
def m_mi(code: str, _):
return {"mi": round(float(_safe(0.0, lambda: rm.mi_visit(code, multi=True))), 2)}
@metric("halstead", "Halstead metrics")
def m_halstead(code: str, _):
h = _safe(None, lambda: rm.h_visit(code))
if not h: return {"h_vol": 0, "h_diff": 0, "h_effort": 0, "h_time": 0, "h_bugs": 0}
t = h.total
return {"h_vol": round(t.volume, 1), "h_diff": round(t.difficulty, 1),
"h_effort": round(t.effort, 1), "h_time": round(t.time, 1), "h_bugs": round(t.bugs, 3)}
@metric("sig", "Function signature")
def m_sig(code: str, node):
if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)): return {"sig": ""}
def u(x): return _safe("", lambda: ast.unparse(x))
parts = []
for a in (node.args.args or []):
s = a.arg + (f": {u(a.annotation)}" if a.annotation else "")
parts.append(s)
defs = node.args.defaults or []
if defs:
base = len(parts) - len(defs)
for i, d in enumerate(defs):
if 0 <= base + i < len(parts): parts[base + i] += f"={u(d)}"
ret = f" -> {u(node.returns)}" if node.returns else ""
return {"sig": f"({', '.join(parts)}){ret}"}
@metric("params", "Parameter count")
def m_params(code: str, node):
if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)): return {"params": 0}
return {"params": len(node.args.args or [])}
@metric("nesting", "Max nesting depth")
def m_nesting(code: str, node):
if not node: return {"nesting": 0}
inc = (ast.For, ast.While, ast.If, ast.With, ast.Try)
def depth(n, lvl=0):
if isinstance(n, inc): lvl += 1
kids = [depth(c, lvl) for c in ast.iter_child_nodes(n)]
return max([lvl] + kids) if kids else lvl
return {"nesting": depth(node)}
@metric("calls", "Function calls")
def m_calls(code: str, node):
return {"calls": sum(isinstance(n, ast.Call) for n in ast.walk(node)) if node else 0}
@metric("branches", "if/elif branches")
def m_branches(code: str, node):
return {"branches": sum(isinstance(n, ast.If) for n in ast.walk(node)) if node else 0}
@metric("loops", "for/while loops")
def m_loops(code: str, node):
return {"loops": sum(isinstance(n, (ast.For, ast.While)) for n in ast.walk(node)) if node else 0}
@metric("doc", "Docstring info")
def m_doc(code: str, node):
if not node: return {"has_doc": 0, "doc_lines": 0}
doc = _safe(None, lambda: ast.get_docstring(node))
return {"has_doc": int(bool(doc)), "doc_lines": len(doc.splitlines()) if doc else 0}
# ─────────────────────────────────────────────────────────────
# Data model
# ─────────────────────────────────────────────────────────────
@dataclass
class Symbol:
    name: str
    kind: str  # function|class
    file: str
    start: int
    end: int
    metrics: Dict[str, Any] = field(default_factory=dict)
    summary: Optional[str] = None
@dataclass
class FileStats:
    file: str
    symbols: int = 0
    loc: int = 0
    sloc: int = 0
    comments: int = 0
    blank: int = 0
    imports: List[str] = field(default_factory=list)
    fan_in: int = 0
    fan_out: int = 0
# ─────────────────────────────────────────────────────────────
# AI summarization (CodeT5 small - very fast)
# ─────────────────────────────────────────────────────────────
def load_ai(quiet=False):
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    import torch
    model_id = "Salesforce/codet5-base-multi-sum"  # ~220MB, optimized for summarization
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    model.eval()
    console = Console(stderr=True) if not quiet else None
    def summarize(code: str) -> str:
        inputs = tok(code[:1024], return_tensors="pt", truncation=True, max_length=512)
        with torch.inference_mode():
            out = model.generate(**inputs, max_new_tokens=48)
        return tok.decode(out[0], skip_special_tokens=True).strip()
    return summarize, console
# ─────────────────────────────────────────────────────────────
# External tools (mypy only)
# ─────────────────────────────────────────────────────────────
def run_mypy(path: Path) -> Dict[str, Any]:
    try:
        r = subprocess.run(["mypy", str(path), "--no-error-summary"],
                           capture_output=True, text=True, timeout=60)
        lines = (r.stdout or "").strip().splitlines()
        errors = [l for l in lines if ": error:" in l]
        warnings = [l for l in lines if ": warning:" in l]
        return {"errors": len(errors), "warnings": len(warnings), "issues": errors + warnings}
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return {"errors": -1, "warnings": -1, "issues": []}  # -1 = mypy missing or timed out
# ─────────────────────────────────────────────────────────────
# Cache
# ─────────────────────────────────────────────────────────────
CACHE_VER = 3
def _sha1(b: bytes) -> str: return hashlib.sha1(b).hexdigest()
def load_cache(p: Path) -> Dict:
    if not p.exists(): return {"v": CACHE_VER, "files": {}, "ai": {}}
    try:
        d = json.loads(p.read_text())
        return d if d.get("v") == CACHE_VER else {"v": CACHE_VER, "files": {}, "ai": {}}
    except Exception: return {"v": CACHE_VER, "files": {}, "ai": {}}
def save_cache(p: Path, d: Dict):
    try: p.write_text(json.dumps(d))
    except Exception: pass
# ─────────────────────────────────────────────────────────────
# Imports + graphs
# ─────────────────────────────────────────────────────────────
def should_skip(p: Path, excl: Set[str]) -> bool:
    return any(x in excl for x in p.parts)
def extract_imports(tree: ast.AST) -> List[str]:
    out = set()
    for n in ast.walk(tree):
        if isinstance(n, ast.Import):
            for a in n.names: out.add(a.name)
        elif isinstance(n, ast.ImportFrom):
            mod = n.module or ""
            for a in n.names: out.add(f"{mod}.{a.name}" if mod else a.name)
    return sorted(out)
def mod_to_path(root: Path, mod: str) -> Optional[Path]:
    rel = Path(*mod.split("."))
    for cand in [root / f"{rel}.py", root / rel / "__init__.py"]:
        if cand.exists(): return cand
    return None
def build_import_graph(root: Path, file_imports: Dict[str, List[str]], files: Set[str]) -> Dict[str, List[str]]:
    g = {f: set() for f in files}
    for f, imps in file_imports.items():
        for mod in imps:
            p = mod_to_path(root, mod.split(".")[0]) or mod_to_path(root, mod)
            if p and str(p) in files and str(p) != f:
                g[f].add(str(p))
    return {k: sorted(v) for k, v in g.items()}
# ─────────────────────────────────────────────────────────────
# Analysis (parallel-safe)
# ─────────────────────────────────────────────────────────────
def analyze_file(path: str, columns: List[str]) -> Dict[str, Any]:
    p = Path(path)
    raw_bytes = p.read_bytes()
    src = raw_bytes.decode("utf-8", errors="replace")
    src_hash = _sha1(raw_bytes)  # hash the raw bytes so it matches the cache check in main()
    tree = ast.parse(src)
    lines = src.splitlines()
    raw = _raw(src)
    metric_names = metrics_for_columns(columns)
    symbols = []
    for b in _safe([], lambda: rcc.cc_visit(src)):
        if not isinstance(b, (RF, RC)): continue
        node = next((n for n in ast.walk(tree)
                     if getattr(n, "lineno", None) == b.lineno and getattr(n, "name", None) == b.name), None)
        snippet = "\n".join(lines[b.lineno - 1:b.endline])
        data = {}
        for m in metric_names:
            if m in METRICS: data.update(METRICS[m][0](snippet, node))
        symbols.append({"name": b.name, "kind": "func" if isinstance(b, RF) else "class",
                        "file": str(p), "start": b.lineno, "end": b.endline, "metrics": data, "snippet": snippet})
    stats = {"file": str(p), "symbols": len(symbols), "loc": len(lines),
             "sloc": raw.sloc if raw else len(lines), "comments": raw.comments if raw else 0,
             "blank": raw.blank if raw else 0, "imports": extract_imports(tree)}
    return {"file": str(p), "hash": src_hash, "symbols": symbols, "stats": stats}
# ─────────────────────────────────────────────────────────────
# Output (rich)
# ─────────────────────────────────────────────────────────────
def get_all_columns() -> List[str]:
    """All possible column names."""
    base = ["file", "name", "kind", "lines"]
    metric_keys = []
    for _, _, keys in METRICS.values():
        metric_keys.extend(keys)
    return base + metric_keys + ["fan_in", "fan_out", "summary"]
def format_output(console: Console, symbols: List[Symbol], file_stats: List[FileStats],
                  columns: List[str], show_summary: bool):
    # Project summary
    total_loc = sum(f.loc for f in file_stats)
    total_sloc = sum(f.sloc for f in file_stats)
    console.print(Panel(
        f"[bold]Files:[/] {len(file_stats)} [bold]Symbols:[/] {len(symbols)} "
        f"[bold]LOC:[/] {total_loc:,} ({total_sloc:,} source)",
        title="Project", border_style="blue"))
    # Build file stats lookup
    fs_map = {f.file: f for f in file_stats}
    # Build table
    table = Table(show_header=True, header_style="bold cyan", box=None, pad_edge=False)
    for col in columns:
        table.add_column(col.upper(), overflow="fold")
    for s in symbols:
        fs = fs_map.get(s.file)
        row = []
        for col in columns:
            if col == "file": row.append(s.file)
            elif col == "name":
                sig = s.metrics.get("sig", "")
                row.append(f"{s.name}{sig}" if sig else s.name)
            elif col == "kind": row.append(s.kind)
            elif col == "lines": row.append(f"{s.start}-{s.end}")
            elif col == "fan_in": row.append(str(fs.fan_in if fs else 0))
            elif col == "fan_out": row.append(str(fs.fan_out if fs else 0))
            elif col == "summary": row.append(s.summary or "")
            elif col in s.metrics:
                v = s.metrics[col]
                row.append(f"{v:.2f}" if isinstance(v, float) else str(v))
            else: row.append("")
        table.add_row(*row)
    console.print(table)
def format_json(symbols: List[Symbol], file_stats: List[FileStats], import_graph: Dict):
    data = {
        "project": {"files": len(file_stats), "symbols": len(symbols),
                    "loc": sum(f.loc for f in file_stats), "sloc": sum(f.sloc for f in file_stats)},
        "files": [{"file": f.file, "loc": f.loc, "sloc": f.sloc, "symbols": f.symbols,
                   "fan_in": f.fan_in, "fan_out": f.fan_out} for f in file_stats],
        "symbols": [{"name": s.name, "kind": s.kind, "file": s.file, "start": s.start,
                     "end": s.end, "metrics": s.metrics, "summary": s.summary} for s in symbols],
        "import_graph": import_graph,
    }
    print(json.dumps(data, indent=2))
# ─────────────────────────────────────────────────────────────
# CLI
# ─────────────────────────────────────────────────────────────
DEFAULT_EXCL = {".venv", "__pycache__", ".git", ".tox", "node_modules", ".eggs", "build", "dist"}
DEFAULT_COLS = "name,lines,summary,h_vol,calls,h_bugs,cc,mi"
def main():
    examples = """
examples:
  python scout.py --ai --columns "name,lines,summary,h_vol,calls,h_bugs,cc,mi" scout.py     # default columns
"""
    ap = argparse.ArgumentParser(prog="scout", description="Static analysis for AI agents",
                                 epilog=examples, formatter_class=argparse.RawDescriptionHelpFormatter)
    ap.add_argument("path", nargs="?", default=".", help="Path to analyze")
    ap.add_argument("--columns", default=DEFAULT_COLS, help=f"Columns to display (default: {DEFAULT_COLS})")
    ap.add_argument("--list-columns", action="store_true", help="List all possible columns")
    ap.add_argument("--exclude-dirs", help="Extra dirs to exclude (comma-sep)")
    ap.add_argument("--ai", action="store_true", help="AI summaries per symbol")
    ap.add_argument("--json", action="store_true", help="JSON output")
    ap.add_argument("--mypy", action="store_true", help="Run mypy")
    ap.add_argument("--jobs", type=int, default=max(os.cpu_count() or 2, 2), help="Parallel workers")
    ap.add_argument("--cache", default=".scoutcache.json", help="Cache file")
    ap.add_argument("--no-cache", action="store_true", help="Disable cache")
    ap.add_argument("-q", "--quiet", action="store_true", help="Suppress progress")
    args = ap.parse_args()
    console = Console(stderr=True)
    if args.list_columns:
        console.print(", ".join(get_all_columns()))
        return
    path = Path(args.path)
    if not path.exists(): sys.exit(f"Error: {path} not found")
    columns = [c.strip() for c in args.columns.split(",") if c.strip()]
    excl = DEFAULT_EXCL | (set(args.exclude_dirs.split(",")) if args.exclude_dirs else set())
    cache = load_cache(Path(args.cache)) if not args.no_cache else {"v": CACHE_VER, "files": {}, "ai": {}}
    # Find files
    if path.is_file():
        files, root = [path], path.parent
    else:
        root = path
        files = [f for f in path.rglob("*.py") if not should_skip(f, excl)]
    if not files: sys.exit("No Python files found")
    files_s = [str(f) for f in files]
    files_set = set(files_s)
    # Check cache
    to_run, results = [], {}
    for f in files_s:
        p = Path(f)
        h = _sha1(p.read_bytes()) if p.exists() else ""
        cached = cache["files"].get(f)
        if cached and cached.get("hash") == h:
            results[f] = cached
        else:
            to_run.append(f)
    # Parallel analysis
    if to_run:
        if not args.quiet: console.print(f"Analyzing {len(to_run)} files...")
        with ProcessPoolExecutor(max_workers=args.jobs) as ex:
            futs = {ex.submit(analyze_file, f, columns): f for f in to_run}
            for fut in as_completed(futs):
                f = futs[fut]
                try:
                    res = fut.result()
                    results[f] = res
                    if not args.no_cache:
                        cache["files"][f] = dict(res)
                        # strip snippets for cache
                        cache["files"][f]["symbols"] = [{k: v for k, v in s.items() if k != "snippet"}
                                                        for s in res.get("symbols", [])]
                except Exception as e:
                    if not args.quiet: console.print(f"[red]Error:[/] {f}: {e}")
    # Load AI if needed
    ai, ai_console, ai_cache = None, None, cache.get("ai", {})
    if args.ai:
        if not args.quiet: console.print("Loading AI model...")
        ai, ai_console = load_ai(args.quiet)
    # Build symbols + file stats
    all_symbols, all_stats, file_imports = [], [], {}
    for f in files_s:
        res = results.get(f)
        if not res: continue
        st = res.get("stats", {})
        fs = FileStats(file=st.get("file", f), symbols=st.get("symbols", 0), loc=st.get("loc", 0),
                       sloc=st.get("sloc", 0), comments=st.get("comments", 0),
                       blank=st.get("blank", 0), imports=st.get("imports", []))
        file_imports[fs.file] = fs.imports
        all_stats.append(fs)
        # Symbols (need snippets for AI)
        lines = Path(f).read_text(errors="replace").splitlines() if args.ai else None
        for s in res.get("symbols", []):
            summary = None
            if args.ai and ai and lines:
                snippet = "\n".join(lines[s["start"] - 1:s["end"]])
                h = _sha1(snippet.encode())
                summary = ai_cache.get(h)
                if not summary:
                    if not args.quiet: console.print(f"  AI: {s['name']}...", end="\r")
                    summary = _safe(None, lambda: ai(snippet))
                    if summary: ai_cache[h] = summary
            all_symbols.append(Symbol(name=s["name"], kind=s["kind"], file=s["file"],
                                      start=s["start"], end=s["end"], metrics=s.get("metrics", {}),
                                      summary=summary))
    if args.ai: cache["ai"] = ai_cache
    # Import graph + fan-in/out
    import_graph = build_import_graph(root, file_imports, files_set)
    fan_in = {f: 0 for f in files_set}
    fan_out = {f: len(import_graph.get(f, [])) for f in files_set}
    for src, dsts in import_graph.items():
        for d in dsts: fan_in[d] = fan_in.get(d, 0) + 1
    fs_map = {f.file: f for f in all_stats}
    for f, fs in fs_map.items():
        fs.fan_in, fs.fan_out = fan_in.get(f, 0), fan_out.get(f, 0)
    if not args.no_cache: save_cache(Path(args.cache), cache)
    # Output
    out_console = Console()  # stdout
    if args.json:
        format_json(all_symbols, all_stats, import_graph)
    else:
        format_output(out_console, all_symbols, all_stats, columns, args.ai)
    # Mypy
    if args.mypy:
        r = run_mypy(path)
        if args.json:
            print(json.dumps({"mypy": r}, indent=2))
        elif r["errors"] < 0:
            out_console.print("\n[bold]Mypy:[/] [yellow]unavailable or timed out[/]")
        else:
            style = "red" if r["errors"] > 0 else "green"
            out_console.print(f"\n[bold]Mypy:[/] [{style}]{r['errors']} errors[/], {r['warnings']} warnings")
            for issue in r.get("issues", [])[:10]:
                out_console.print(f"  {issue}")
if __name__ == "__main__":
    main()