Describes how the agent executes and revises actions at runtime.
-
Single-Pass — Plan once, execute without revision. Example: “Draft the email in one shot and send.”
-
Iterative Refinement — Loop to improve using feedback until a stop criterion. Example: “Revise the summary until it’s ≤ 150 words.”
-
Backtracking — Revert to a prior state and try an alternative when progress fails. Example: “If parsing fails, roll back to the last checkpoint and switch parser.”
-
Parallel Exploration — Pursue multiple options concurrently; select or merge the best. Example: “Run three search strategies, then pick the top result.”
-
Continuous Control — Continuously adjust actions from streaming feedback. Example: “Tune request rate live to keep success ≥ 95%.”
Describes high-level strategy choices and commitments in control.
-
Exploration ↔ Exploitation — Balance trying new options vs. using known best. Example: “Every 10th task, sample a new tool; otherwise use the proven one.”
-
Temporal Horizon — Immediate, short-term, or long-term planning scope. Example: “Plan the next 5 steps toward the goal before executing.”
-
Adaptivity — Degree of change based on outcomes: none → tweaks → full re-plan. Example: “If validation fails twice, re-plan from scratch.”
-
Commitment Level — How reversible decisions are: tentative → staged commit → hard lock-in. Example: “Reserve seats, but finalize purchase only after fraud checks pass.”
-
Strategy Scope — Where the strategy applies: global | domain-bounded | condition-bounded | instance-bounded. Example: “For data migrations (domain-bounded), enable backtracking checkpoints.”
Describes how the control logic is organized and implemented.
-
Reactive (Reflex) — Direct stimulus-response without planning. Example: “
if timeout: retry()immediately.” -
Deliberative (Planner) — Uses an internal model to plan sequences toward goals. Example: “Generate a route plan before the first API call.”
-
Utility-Based — Chooses actions that maximize a scored objective. Example: “Pick the plan with the best accuracy–cost score.”
-
Learning (Adaptive) — Improves its policy from feedback or reward. Example: “Update tool selection based on success rates.”
-
Hybrid (Layered) — Combines reflexes with planning/optimization. Example: “Emergency-brake reflex; otherwise follow the planned route.”