| # | Where | Why it’s a problem | Suggested fix |
|---|---|---|---|
| R-1 | Federated Decision Authority → Team-Level vs. Cross-Team | “Product teams have full authority over architecture decisions that affect only their services.” Five lines later you require that any cross-team dependency triggers AAG involvement. In micro-service environments, almost every change creates at least incidental integration risk (shared observability, IAM scope, cost envelopes, etc.). You’ve implied a bright line that rarely exists. | Spell out minimal-impact criteria (e.g., “interface change limited to additive, backward-compatible API paths”; “no net-new infra cost”), or treat “default AAG light-review” as the safe path. |
| R-2 | Mandatory ADRs → “Teams cannot proceed … until ADR is published and reviewed.” | This turns ADRs from lightweight records into a gating approval workflow (contradicting your later “advisory” language). It also conflicts with the emergency-decision escape hatch (§Escalation). | Decide which you really want: (a) documentation first but non-blocking, or (b) approvals first. State it unambiguously and align the escalation language. |
| R-3 | AAG capacity assumptions | 2–3 h/wk × ~6 people is roughly 12–18 h/week, under half of one engineer’s capacity. Yet the group must review cross-team and org-level decisions, run office hours, maintain metrics, and run retros. The implied review volume and targets (e.g., 90% ADR compliance) cannot be delivered with that capacity. | Either increase staffing (or fund a rotation), narrow the scope (e.g., review only non-trivial decisions), or automate triage so the AAG focuses on <20% of ADRs — see the triage sketch below the table. |
| R-4 | Success criteria → “< 5 % decision escalation rate.” | A low escalation rate is not inherently healthy: it can mean decisions are being rubber-stamped, or that teams self-censor controversial work. | Track outcome quality alongside the rate (e.g., rework, post-integration defects). Consider a target band (“10–20% escalations with <N days cycle-time”) instead of “as low as possible”. |
| R-5 | Measurement & Review → leading vs. lagging | “Architecture Office Hours attendance” is treated as a leading indicator of success, but high attendance can equally indicate confusion or friction. | Pair it with sentiment or resolution metrics (e.g., repeat attendance on the same topic, average wait for answers) to avoid the “vanity metric” trap; see the metrics sketch below the table. |
| R-6 | Root-cause analysis vs. chosen remedies | You diagnose “decision authority ambiguity” and “technical-standard inconsistency”, but most remedies are additional process or governance layers; none of them clarify how authority plays out in everyday code/PR/release loops. | Show a direct trace: root cause → policy knob → expected behavioral change. If the knob is “publish & auto-test shared libraries”, bake that into the policy alongside ADR governance. |
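
To make the R-1 minimal-impact criteria and the R-3 triage suggestion concrete, here is a minimal sketch of automated ADR routing. Everything in it is hypothetical: the `ADR` fields, the tier names, and the thresholds are placeholders, not anything the policy defines.

```python
from dataclasses import dataclass

@dataclass
class ADR:
    """Minimal ADR metadata; field names are illustrative placeholders."""
    title: str
    api_change: str           # "none", "additive", or "breaking"
    net_new_infra_cost: bool  # introduces spend beyond the team's cost envelope
    teams_affected: int       # teams other than the author's own

def triage(adr: ADR) -> str:
    """Route an ADR to one of three review tiers.

    Thresholds are placeholders; the point is that most ADRs should
    resolve without synchronous AAG time.
    """
    if adr.api_change == "breaking" or adr.teams_affected >= 2:
        return "full-aag-review"   # scarce reviewer hours go here
    if adr.net_new_infra_cost or adr.teams_affected == 1:
        return "light-review"      # async comment window, no meeting
    return "auto-approve"          # publish and proceed; review only on appeal

# An additive, single-team, cost-neutral change never enters the AAG queue.
print(triage(ADR("Add /v2/orders endpoint", "additive", False, 0)))  # auto-approve
```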
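
Similarly for R-5, a small sketch of pairing raw attendance with friction and resolution signals; the log schema and the numbers are invented for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical office-hours log: (attendee, topic, asked_at, answered_at).
log = [
    ("ana", "service mesh", datetime(2025, 5, 1, 10, 0), datetime(2025, 5, 1, 10, 20)),
    ("ana", "service mesh", datetime(2025, 5, 8, 10, 0), datetime(2025, 5, 8, 10, 45)),
    ("ben", "ADR template", datetime(2025, 5, 8, 10, 5), datetime(2025, 5, 8, 10, 15)),
]

# Repeat attendance on the same topic is a friction signal, not a success signal.
visits = Counter((who, topic) for who, topic, _, _ in log)
repeat_rate = sum(1 for n in visits.values() if n > 1) / len(visits)

# Average wait for an answer, in minutes.
waits = [(done - asked).total_seconds() / 60 for _, _, asked, done in log]
avg_wait = sum(waits) / len(waits)

print(f"repeat-topic rate: {repeat_rate:.0%}, avg wait: {avg_wait:.0f} min")
# -> repeat-topic rate: 50%, avg wait: 25 min
# Raw attendance alone would score all three rows as equal "engagement".
```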