@swalke16
Last active February 3, 2026 22:47
Composable Agent Framework for Bonusly
name: Composable Agent Framework
overview: Design a composable chat-based agent framework using RubyLLM, with ActionCable streaming, that can be surfaced across multiple product entry points while maintaining traceability.
isProject: false

todos:

| id | content | status |
| --- | --- | --- |
| rubyllm-setup | Add RubyLLM gem and configure OpenAI/Anthropic providers | pending |
| actioncable-channel | Create BizyChannel for streaming responses to users | pending |
| agent-base | Create Bizy::Agents::Base class extending RubyLLM::Tool | pending |
| bizy-chat | Build Bizy::Chat as top-level agent with core tools and sub-agents | pending |
| tool-migration | Convert existing MCP tools to RubyLLM tool classes | pending |
| meeting-agent | Create MeetingAgent for 1:1 meeting questions | pending |
| analytics-agent | Create AnalyticsAgent for admin analytics questions | pending |
| frontend-streaming | Update Kaleidoscope to consume ActionCable stream | pending |

Composable Agent Framework for Bonusly

1. Overview

What We're Building

A chat-based AI assistant framework that:

  • Works across multiple product surfaces (meetings, admin, general chat)
  • Routes questions to specialized "agents" based on context and permissions
  • Streams responses in real-time without blocking web server threads
  • Is built on proven patterns using the RubyLLM library

Goals

| Goal | Why It Matters |
| --- | --- |
| Multi-surface | One framework powers Bizy everywhere it appears |
| Composable | Add new capabilities by adding agents, not modifying core code |
| Streaming | Users see responses as they're generated |
| Non-blocking | Long LLM calls don't consume web server threads |
| Provider-agnostic | Use the best model for each task (Claude for analysis, GPT for conversation) |

What Changes from Current Bizy

| Aspect | Current | New |
| --- | --- | --- |
| LLM Client | OpenAI Ruby SDK | RubyLLM (any provider) |
| Tool Execution | MCP server (HTTP calls) | Direct function calling |
| Response Delivery | Synchronous JSON | ActionCable streaming |
| Thread Model | Blocks Puma thread | Async background jobs |
| Extensibility | Modify AiDriver | Add new agent class |

2. Core Concepts

What is a "Tool"?

A tool is a function the AI can call to get information or perform actions. When you ask "Who is my manager?", the AI calls a tool to look that up.

class Bizy::Tools::GetOrgContext < RubyLLM::Tool
  description "Get organizational context for a user"

  param :target_user, desc: "User ID, email, or name"

  def execute(target_user:)
    # Called by the AI when it needs org info
    target = resolve_user(target_user)
    {
      user: format_user(target),
      manager: format_manager(target),
      direct_reports: format_direct_reports(target)
    }
  end
end

The AI sees the description and param info, decides when to use the tool, and receives the return value.
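Under the hood, function calling works by sending the model a JSON description of each tool. Roughly (illustrative only; RubyLLM derives the exact schema from the `description`/`param` DSL automatically), the provider receives something like:

```ruby
# Approximate function definition generated from GetOrgContext's DSL.
# The model matches user questions against name/description, then
# responds with a tool call whose arguments conform to `parameters`.
tool_definition = {
  name: "get_org_context",
  description: "Get organizational context for a user",
  parameters: {
    type: "object",
    properties: {
      target_user: {
        type: "string",
        description: "User ID, email, or name"
      }
    },
    required: ["target_user"]
  }
}
```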

What is an "Agent"?

An agent is a specialized AI assistant with its own tools and personality. Think of it as a department expert you can delegate questions to.

In our framework, agents ARE tools. The main Bizy chat can invoke a specialized agent just like any other tool:

class Bizy::Agents::Meeting < Bizy::Agents::Base
  description "Ask questions about 1:1 meetings and past discussions"

  def tools
    [GetMeetingHistory.new(...), GetMeetingSummary.new(...)]
  end

  def instructions
    "You are a meeting assistant. Help with 1:1 prep and history."
  end
end

The Agent-as-Tool Pattern

Instead of one chat with many tools, we have a coordinator chat with a few core tools plus specialized agents:

┌─────────────────────────────────────────────────────────────┐
│ Main Bizy Chat                                               │
│   Core tools: GetOrgContext, GetRecognition, GetMilestones   │
│   Agents:     MeetingAgent, AnalyticsAgent                   │
│                                                              │
│   User: "What did Jane and I discuss last week?"             │
│         ↓                                                    │
│   Bizy decides: This is a meeting question → invoke agent    │
│         ↓                                                    │
│   ┌─────────────────────────────────────────┐                │
│   │ MeetingAgent                             │                │
│   │   Tools: GetMeetingHistory, GetSummary   │                │
│   │   Runs its own AI chat                   │                │
│   │   Returns: "In your Jan 15 meeting..."   │                │
│   └─────────────────────────────────────────┘                │
└─────────────────────────────────────────────────────────────┘

Why this is better:

  • The main chat doesn't need to know about every tool in the system
  • Each agent encapsulates its domain completely
  • Adding a new capability = adding a new agent class
  • Agents can be nested (an executive report agent could use the analytics agent)
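Stripped of the LLM itself, the pattern reduces to agents and tools sharing one callable interface so the coordinator can dispatch to either uniformly. A toy sketch (all names hypothetical, lookups stubbed):

```ruby
# Toy sketch of the agent-as-tool pattern: no LLM here, just the
# uniform interface that lets a coordinator treat agents like tools.
class GetOrgContext
  def name
    "get_org_context"
  end

  def call(_args = {})
    { manager: "Pat" } # stubbed lookup
  end
end

class MeetingAgent
  def name
    "meeting_agent"
  end

  def call(_args = {})
    # A real agent would run its own sub-chat with its own tools here.
    "In your Jan 15 meeting you discussed roadmap priorities."
  end
end

class Coordinator
  def initialize(tools)
    @tools = tools.to_h { |tool| [tool.name, tool] }
  end

  # Stand-in for the model's routing decision: dispatch by tool name.
  def dispatch(tool_name, args = {})
    @tools.fetch(tool_name).call(args)
  end
end

coordinator = Coordinator.new([GetOrgContext.new, MeetingAgent.new])
coordinator.dispatch("meeting_agent") # an agent, invoked like any tool
```

Because the coordinator only sees the shared interface, nesting falls out for free: an agent's `call` can itself construct a coordinator over its own tools.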

Agent Orchestration Patterns Compared

Before choosing an architecture, it's worth understanding the landscape of agent orchestration patterns:

| Pattern | What It Is | Best For |
| --- | --- | --- |
| Function Calling | Tools defined in API request, app executes them | Simple integrations, 2-5 tools |
| MCP | Tools on separate server, discovered/executed via JSON-RPC | Shared tooling, security isolation |
| A2A | Open standard for inter-agent communication across organizations | Enterprise ecosystems, multi-vendor |
| ACP | Agent Communication Protocol (now merged into A2A) | Deprecated - use A2A |
| Agent-as-Tool | Specialized agents invoked as tools by a coordinator | Multi-agent orchestration within one system |

Function Calling (Direct)

User → App → LLM API (with tool definitions) → App executes tool → LLM API → Response
  • Pros: Simple, full control, no network hops for tool discovery
  • Cons: Provider-specific APIs when using raw SDKs (mitigated by RubyLLM)
  • Used by: RubyLLM's native tool pattern

Note: RubyLLM eliminates provider lock-in by providing a unified API. Tools written once work across OpenAI, Anthropic, Gemini, Mistral, and others.

MCP (Model Context Protocol)

User → App → LLM API → MCP Server (JSON-RPC) → Tool execution → LLM API → Response
  • Pros: Security isolation, shared tooling, credential separation
  • Cons: Extra network hop, more infrastructure
  • Used by: Current Bizy and BonuslyGPT implementations

A2A (Agent-to-Agent Protocol)

Agent A → HTTP/JSON-RPC → Agent B (with Agent Card discovery)
  • Pros: Cross-organization interop, standardized discovery
  • Cons: Overkill for internal agents, added complexity
  • Best for: Enterprise agent marketplaces, not internal frameworks

Agent-as-Tool Pattern

Coordinator Agent → Specialized Agent (via tool call) → Result → Coordinator continues
  • Pros: Composable, each agent focused, natural tool semantics
  • Cons: Deeper call stacks, potential for loops
  • Used by: RubyLLM's multi-agent patterns

Why We Chose Agent-as-Tool

Agent-as-Tool with direct function calling (via RubyLLM) is the right fit because:

  1. All agents are internal to Bonusly - no need for A2A's cross-org interop
  2. RubyLLM handles tool execution natively without MCP's network overhead
  3. Specialized agents can be tools that the coordinator invokes
  4. Maintains simplicity while enabling composition

3. Current State

Existing AI Systems at Bonusly

| System | Purpose | How It Works |
| --- | --- | --- |
| Bizy | User assistant for meetings/org | MCP tools via OpenAI Responses API |
| KYB | Compliance verification | Direct API calls, structured output |
| BonuslyGPT | Internal support | MCP tools + vector search |

Current Bizy Architecture

User → ChatController → AiDriver → OpenAI API → MCP Server → Tools
                            ↓
                       Response (sync)

Issues with current approach:

  1. Blocks Puma threads: Each chat request holds a thread for 10-60 seconds
  2. MCP overhead: Tool calls require HTTP round-trips to MCP server
  3. Single agent: All logic in AiDriver, hard to extend
  4. OpenAI lock-in: Tied to OpenAI's specific API

Key Files in Current Implementation

  • lib/bizy/ai_driver.rb - Main orchestrator (538 lines)
  • lib/bizy/base_tool.rb - Tool base class for MCP
  • lib/bizy/tools/*.rb - Individual tools (7 tools)
  • app/controllers/mcp/bizy_controller.rb - MCP server endpoint

4. Architecture

High-Level Flow

┌─────────────────────────────────────────────────────────────┐
│ New Architecture                                             │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Frontend (Kaleidoscope)                                     │
│      │                                                       │
│      │ 1. POST /api/v2/bizy/chat                             │
│      │ 2. Subscribe to BizyChannel                           │
│      ▼                                                       │
│  ChatController                                              │
│      │                                                       │
│      │ 3. Enqueue Bizy::ChatJob                              │
│      │    (Puma thread freed immediately)                    │
│      ▼                                                       │
│  Bizy::ChatJob (Async::Job)                                  │
│      │                                                       │
│      │ 4. Bizy::Chat.execute(user:, message:, metadata:)      │
│      ▼                                                       │
│  RubyLLM Chat with Tools + Agents                            │
│      │                                                       │
│      │ 5. Stream tokens via ActionCable                      │
│      ▼                                                       │
│  BizyChannel → Frontend                                      │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Why RubyLLM vs Building Our Own

Before adopting RubyLLM, we should consider whether to continue building on our current custom implementation or adopt a library.

Current Bizy Implementation:

  • Custom OpenAI Responses API integration
  • Custom MCP implementation for tool calling
  • Custom streaming (SSE-based, has Puma threading issues)
  • Custom conversation management

Comparison:

| Aspect | Build Our Own | Use RubyLLM |
| --- | --- | --- |
| Development time | High: rebuild provider APIs, tool calling, streaming | Low: already built and tested |
| Provider support | Must implement each (OpenAI, Anthropic, etc.) | 15+ providers out of the box |
| Tool calling | Must handle JSON schema, validation, execution | Native DSL, automatic schema generation |
| Streaming | Must handle SSE/WebSocket per provider | Consistent block-based API |
| Maintenance | Team must track API changes for each provider | Community maintains compatibility |
| Testing | Must build test infrastructure | Built-in test helpers and mocking |
| Extended thinking | Must implement per-provider | Unified API across Claude/Gemini |
| Async/concurrency | Must build fiber/thread management | Built-in Async::Job integration |
| Flexibility | Full control over implementation | Constrained by library design |
| Lock-in risk | None | Tied to RubyLLM's abstraction |

Pros of RubyLLM:

  • Proven: Active community, battle-tested in production
  • Speed: Skip months of infrastructure work, focus on Bonusly-specific features
  • Multi-model: Use Claude for complex reasoning, GPT for quick responses, Gemini for long context
  • Future-proof: New models (GPT-5, Claude 4) supported by updating the gem
  • Tool ecosystem: Can integrate with MCP servers via ruby_llm-mcp
  • Well-documented: Extensive documentation with guides for common agentic patterns like multi-agent orchestration, tool composition, and streaming

Cons of RubyLLM:

  • Abstraction leaks: Edge cases may require workarounds or PRs
  • Dependency risk: Library could become unmaintained (mitigated: active development, MIT license)
  • Learning curve: Team needs to learn RubyLLM patterns
  • ActiveRecord assumptions: Some features (like acts_as_chat) assume ActiveRecord; we use Mongoid

Recommendation: Adopt RubyLLM. The development time savings are significant, and the library's design aligns well with our needs. The Mongoid limitation for chat persistence is easily worked around with our existing Bizy::ChatHistory model (see "Chat History Persistence" section).

RubyLLM Foundation

RubyLLM provides:

| Feature | Benefit |
| --- | --- |
| Provider Agnostic | Same API for OpenAI, Anthropic, Gemini, etc. Use the best model for each task. |
| Native Tools | First-class tool support without MCP overhead |
| Streaming | Built-in streaming with block syntax |
| Rails Integration | acts_as_chat for persistence (ActiveRecord) |
| Async Support | Fiber-based concurrency for parallel operations |

Tool Definition

RubyLLM tools use a DSL for defining input parameters. RubyLLM v1.9+ provides a params block DSL for complex schemas (nested objects, arrays, enums), while simpler tools can use the param helper.

Simple tool with param helper:

# app/lib/bizy/tools/get_org_context.rb
class Bizy::Tools::GetOrgContext < RubyLLM::Tool
  description "Get organizational context for a user"

  param :target_user, desc: "User ID, email, or name"

  def initialize(requesting_user:)
    @requesting_user = requesting_user
  end

  def execute(target_user:)
    target = resolve_user(target_user)

    {
      user: format_user(target),
      manager: format_manager(target),
      direct_reports: format_direct_reports(target),
      relationship_to_you: build_relationship(target)
    }
  end

  private

  def resolve_user(identifier)
    return @requesting_user if identifier.blank?
    @requesting_user.company.users.active.find_by_identifier(identifier)
  end

  def format_user(user)
    { id: user.id.to_s, name: user.display_name, email: user.email }
  end
end

Complex tool with params DSL (v1.9+):

For tools with structured inputs, use the params block for nested objects, arrays, and enums:

class Bizy::Tools::GetParticipationTrends < RubyLLM::Tool
  description "Get participation trends for the company"

  params do
    string :group_by, description: "Property to group by (department, location, team)"
    integer :months, description: "Number of months to analyze", required: false
    object :filters, description: "Optional filters to apply", required: false do
      array :departments, of: :string, description: "Limit to specific departments"
      enum :status, %w[active inactive all], description: "User status filter"
    end
  end

  def initialize(company:)
    @company = company
  end

  def execute(group_by:, months: 12, filters: nil)
    input = Analytics::Queries::GetGivingAndReceivingParticipationData::Input.new(
      company_id: @company.id,
      end_time: Time.current,
      custom_property_group: group_by
    )

    result = Analytics::Queries::GetGivingAndReceivingParticipationData.call(input)
    format_for_llm(result)
  end
end

Structured Output with Schemas

RubyLLM supports structured output via with_schema, ensuring the LLM returns valid JSON matching a defined schema. This is useful when agents need to return structured data rather than free-form text.

Defining output schemas with RubyLLM::Schema:

# app/lib/bizy/schemas/meeting_summary.rb
class Bizy::Schemas::MeetingSummary < RubyLLM::Schema
  string :summary, description: "Brief summary of the meeting"
  array :key_topics, of: :string, description: "Main topics discussed"
  array :action_items, description: "Action items from the meeting" do
    string :task, description: "The task to complete"
    string :owner, description: "Person responsible"
    string :due_date, description: "When it's due", required: false
  end
  string :next_steps, description: "Recommended next steps", required: false
end

Using schemas in agent responses:

class Bizy::Agents::Meeting < Bizy::Agents::Base
  def execute(question:)
    RubyLLM.chat(model: "claude-sonnet-4")
      .with_tools(*tools)
      .with_instructions(instructions)
      .with_schema(Bizy::Schemas::MeetingSummary)
      .ask(question).content  # with a schema, content is the parsed structured output
  end
end

When to use schemas:

Use schemas whenever the caller needs well-structured output - the LLM will conform its response to the schema regardless of how the question is phrased.

  • Analytics agents returning data for charts/tables
  • Meeting agents returning summaries with action items
  • Any agent whose output will be parsed or displayed programmatically
  • Ensuring consistent response formats for frontend rendering
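Because schema-constrained output is JSON the application will parse, downstream code can rely on required fields being present. A minimal sketch of that consumer side (plain Ruby, with a hypothetical payload shaped like the MeetingSummary schema above):

```ruby
require "json"

# Hypothetical raw model output conforming to the MeetingSummary schema.
raw = <<~JSON
  {
    "summary": "Discussed Q1 goals",
    "key_topics": ["goals", "hiring"],
    "action_items": [{"task": "Draft hiring plan", "owner": "Jane"}]
  }
JSON

parsed = JSON.parse(raw)

# Check only the required fields; optional ones (next_steps, due_date)
# may legitimately be absent per the schema definition.
missing = %w[summary key_topics action_items] - parsed.keys
raise "missing fields: #{missing.join(', ')}" unless missing.empty?

parsed["action_items"].each do |item|
  raise "malformed action item" unless item.key?("task") && item.key?("owner")
end
```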

Agent Base Class

# app/lib/bizy/agents/base.rb
class Bizy::Agents::Base < RubyLLM::Tool
  # Agents are tools that run their own sub-chat

  param :question, desc: "The question to ask this specialized agent"

  def initialize(user:, metadata: {})
    @user = user
    @metadata = metadata
  end

  def execute(question:)
    chat = RubyLLM.chat(model: "gpt-4o")
      .with_tools(*tools)
      .with_instructions(instructions)

    chat.ask(question).content
  end

  private

  def tools
    raise NotImplementedError
  end

  def instructions
    raise NotImplementedError
  end
end

Example Agent: MeetingAgent

# app/lib/bizy/agents/meeting.rb
class Bizy::Agents::Meeting < Bizy::Agents::Base
  description "Ask questions about 1:1 meetings and past discussions"

  def self.available?(user:, metadata:)
    metadata[:meeting_partner_id].present?
  end

  def initialize(user:, metadata:)
    super
    @partner_id = metadata[:meeting_partner_id]
    @partner = User.find(@partner_id)
  end

  private

  def tools
    [
      Bizy::Tools::GetMeetingHistory.new(
        requesting_user: @user,
        partner_id: @partner_id
      ),
      Bizy::Tools::GetMeetingSummary.new(
        requesting_user: @user,
        partner_id: @partner_id
      ),
    ]
  end

  def instructions
    <<~PROMPT
      You are a meeting assistant for 1:1 meetings.

      Context:
      - Current user: #{@user.display_name}
      - Meeting partner: #{@partner.display_name}

      Use get_meeting_summary for recent meetings.
      Use get_meeting_history for full history.
    PROMPT
  end
end

Bizy::Chat - The Top-Level Agent

Bizy::Chat is the main entry point - a top-level agent that orchestrates tools and sub-agents. Unlike sub-agents (which extend RubyLLM::Tool), Bizy::Chat is invoked by application code, not by an LLM.

# app/lib/bizy/chat.rb
class Bizy::Chat
  INSTRUCTIONS = <<~PROMPT
    You are Bizy, a helpful AI assistant for Bonusly.

    You have access to tools for looking up information and specialized agents
    for specific domains. When a question matches a specialized agent's domain,
    prefer delegating to that agent rather than answering directly.

    Be helpful, concise, and accurate. If you don't have enough information
    to answer a question, say so.
  PROMPT

  # Core tools (always available)
  CORE_TOOLS = [
    Bizy::Tools::GetOrgContext,
    Bizy::Tools::GetRecognition,
    Bizy::Tools::GetMilestones,
  ].freeze

  # Sub-agents (conditionally available based on context)
  SUB_AGENTS = [
    Bizy::Agents::Meeting,
    Bizy::Agents::Analytics,
  ].freeze

  def self.execute(user:, message:, metadata: {}, conversation_id: nil, &block)
    new(user: user, metadata: metadata, conversation_id: conversation_id)
      .execute(message, &block)
  end

  def initialize(user:, metadata: {}, conversation_id: nil)
    @user = user
    @metadata = metadata
    @conversation_id = conversation_id
  end

  def execute(message, &block)
    start_time = Time.current

    response = if block_given?
      chat.ask(message, &block)
    else
      chat.ask(message)
    end

    persist_to_history(message, response, start_time)
    response
  end

  private

  def chat
    @chat ||= build_chat
  end

  def build_chat
    tools = CORE_TOOLS.map { |klass| klass.new(requesting_user: @user) }

    SUB_AGENTS.each do |agent_class|
      if agent_class.available?(user: @user, metadata: @metadata)
        tools << agent_class.new(user: @user, metadata: @metadata)
      end
    end

    llm_chat = RubyLLM.chat(model: "claude-sonnet-4")
      .with_tools(*tools)
      .with_instructions(build_instructions)
      .with_thinking(effort: :medium)

    restore_conversation_context(llm_chat) if @conversation_id.present?
    llm_chat
  end

  def build_instructions
    "#{INSTRUCTIONS}\n\nCurrent user: #{@user.display_name}"
  end

  def restore_conversation_context(llm_chat)
    Bizy::ChatHistory
      .where(conversation_id: @conversation_id)
      .order(created_at: :asc)
      .each do |entry|
        llm_chat.add_message(role: :user, content: entry.user_message)
        llm_chat.add_message(role: :assistant, content: entry.bizy_response)
      end
  end

  def persist_to_history(message, response, start_time)
    Bizy::ChatHistory.create!(
      user_id: @user.id,
      company_id: @user.company_id,
      conversation_id: @conversation_id,
      context_type: @metadata[:context_type],
      context_id: @metadata[:context_id],
      user_message: message,
      bizy_response: response.content,
      response_time_ms: ((Time.current - start_time) * 1000).to_i,
      model_id: response.model_id,
      provider: response.provider_id,
      input_tokens: response.input_tokens,
      output_tokens: response.output_tokens,
      thinking_text: response.thinking&.text,
      tool_calls: format_tool_calls(response.tool_calls)
    )
  end

  def format_tool_calls(tool_calls)
    return [] if tool_calls.blank?
    tool_calls.map { |tc| { name: tc.name, arguments: tc.arguments, result: tc.result } }
  end
end

Key points:

  • execute is the single entry point - takes user, message, metadata, returns response
  • Instructions are minimal - tool descriptions handle routing
  • Conversation context is restored for multi-turn conversations
  • History is persisted after each response
  • Streaming is supported via block parameter

Extended Thinking for Complex Questions

The main Bizy chat handles diverse questions and often needs to invoke multiple tools or agents, then synthesize the results. RubyLLM's Extended Thinking gives reasoning models more time to deliberate, improving accuracy on these multi-step tasks.

Bizy::Chat uses extended thinking by default:

# app/lib/bizy/chat.rb (in build_chat method)
RubyLLM.chat(model: "claude-sonnet-4")
  .with_tools(*tools)
  .with_instructions(build_instructions)
  .with_thinking(effort: :medium)  # Enable for the main chat loop

Sub-agents configure their own thinking level based on their domain complexity:

# app/lib/bizy/agents/analytics.rb
class Bizy::Agents::Analytics < Bizy::Agents::Base
  def execute(question:)
    RubyLLM.chat(model: "claude-sonnet-4")
      .with_tools(*tools)
      .with_instructions(instructions)
      .with_thinking(effort: :high, budget: 10_000)  # Analytics needs deeper reasoning
      .ask(question).content
  end
end

Accessing thinking output (useful for debugging/logging):

response = chat.ask("Compare participation trends across departments")
response.thinking&.text    # The reasoning trace (if available)
response.content           # The final answer

Extended thinking adds latency but improves accuracy. The main chat uses :medium effort; specialized agents like Analytics can use :high for complex data synthesis.

Streaming with ActionCable

ActionCable provides WebSocket support in Rails, allowing us to push tokens to the frontend as they're generated.

# app/channels/bizy_channel.rb
class BizyChannel < ApplicationCable::Channel
  def subscribed
    stream_from "bizy:user:#{current_user.id}"
  end

  def self.stream_to(user, event:, data:)
    ActionCable.server.broadcast(
      "bizy:user:#{user.id}",
      { event: event, data: data }
    )
  end
end

# app/jobs/bizy/chat_job.rb
class Bizy::ChatJob < LLMJob
  def perform(user_id:, message:, conversation_id:, metadata: {})
    user = User.find(user_id)

    Bizy::Chat.execute(
      user: user,
      message: message,
      metadata: metadata,
      conversation_id: conversation_id
    ) do |chunk|
      BizyChannel.stream_to(user, event: "token", data: { text: chunk.content })
    end

    BizyChannel.stream_to(user, event: "complete", data: {})
  end
end

Background Job Processing

LLM calls take 10-60+ seconds. We use Async::Job (fiber-based) instead of Sidekiq (thread-based) to handle many concurrent requests efficiently. Because each request spends most of its time waiting on the LLM's response, fibers let thousands of concurrent calls share a small pool of threads instead of blocking one thread per request.

# Base class for LLM jobs
class LLMJob < ApplicationJob
  self.queue_adapter = :async_job  # Uses fibers, not threads
end

# Regular jobs still use Sidekiq
class ImageProcessingJob < ApplicationJob
  # Uses default :sidekiq adapter
end
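The fiber advantage can be sketched with plain Fibers (a toy cooperative loop, no gems; the real system relies on Async::Job's scheduler rather than manual resume/yield):

```ruby
# Toy cooperative scheduler: three "chat requests" interleave on one
# thread because each yields while "waiting" on the LLM.
requests = %w[chat-1 chat-2 chat-3].map do |id|
  Fiber.new do
    3.times do |step|
      # A real request would be blocked on network I/O here; yielding
      # hands the thread to the next request instead.
      Fiber.yield "#{id} received chunk #{step}"
    end
    "#{id} done"
  end
end

log = []
until requests.empty?
  requests.each { |fiber| log << fiber.resume }
  requests.reject! { |fiber| !fiber.alive? }
end

# All twelve events were produced on a single thread, round-robin:
# chunk 0 from each request, then chunk 1 from each, and so on.
```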

Graceful Interruption Handling

Deploys and dyno restarts (common on Heroku) can interrupt in-flight LLM requests. Rather than complex job recovery mechanisms, we handle this gracefully in the frontend.

The scenario:

  1. User asks question, streaming starts
  2. Deploy happens, worker process receives SIGTERM
  3. ActionCable connection drops mid-stream
  4. Frontend detects disconnect, shows friendly message
  5. User resubmits their question

Frontend implementation:

// src/modules/bizy/hooks/use-bizy-channel.ts
const useBizyChannel = () => {
  const [isStreaming, setIsStreaming] = useState(false);
  const [error, setError] = useState<string | null>(null);

  useEffect(() => {
    const subscription = cable.subscriptions.create("BizyChannel", {
      received(data) {
        // Broadcast payload is { event, data } (see BizyChannel.stream_to)
        if (data.event === "token") {
          appendToken(data.data.text);
        } else if (data.event === "complete") {
          setIsStreaming(false);
        } else if (data.event === "error") {
          setError(data.data.message);
          setIsStreaming(false);
        }
      },

      disconnected() {
        // NOTE: this closure captures isStreaming from the initial render;
        // in practice track it in a ref so the check sees the live value.
        if (isStreaming) {
          // Connection dropped while waiting for response
          setError("Connection interrupted. Please try your question again.");
          setIsStreaming(false);
        }
      },

      rejected() {
        setError("Unable to connect. Please refresh and try again.");
      }
    });

    return () => subscription.unsubscribe();
  }, []);

  return { isStreaming, error, clearError: () => setError(null) };
};

Why this approach:

  • Simple: No job persistence, idempotency keys, or recovery logic
  • User-friendly: Clear message, user has full context to retry
  • Rare occurrence: Deploys are infrequent relative to chat volume
  • No partial state: Incomplete responses aren't saved to history

Tradeoffs:

  • User must manually retry (minor inconvenience)
  • Partial responses are lost (but they were incomplete anyway)

Chat History Persistence

RubyLLM provides acts_as_chat for ActiveRecord-based persistence, but Bonusly uses Mongoid. We continue using our existing Bizy::ChatHistory model and enhance it with RubyLLM-specific fields.

Why not use RubyLLM's acts_as_chat:

| Aspect | RubyLLM acts_as_chat | Bizy::ChatHistory (enhanced) |
| --- | --- | --- |
| ORM | ActiveRecord only | Mongoid (our stack) |
| Setup | Generator creates migrations | Already exists |
| Bonusly fields | Would need to add | Already has context_type, feedback, etc. |
| Indexes | Would need to recreate | Already optimized for our queries |
| Tool calls | Separate ToolCall model | Embedded array (simpler) |

Enhanced ChatHistory model:

# app/models/bizy/chat_history.rb
class Bizy::ChatHistory
  include ApplicationDocument

  # Existing fields
  field :user_id, type: BSON::ObjectId
  field :company_id, type: BSON::ObjectId
  field :conversation_id, type: String
  field :context_type, type: String
  field :context_id, type: BSON::ObjectId
  field :user_message, type: String
  field :bizy_response, type: String
  field :response_time_ms, type: Integer
  field :emotion, type: String

  # Feedback fields (existing)
  field :feedback_type, type: String
  field :feedback_category, type: String
  field :feedback_details, type: String

  # NEW: RubyLLM-specific fields
  field :model_id, type: String           # e.g., "claude-sonnet-4", "gpt-4o"
  field :provider, type: String           # e.g., "anthropic", "openai"
  field :input_tokens, type: Integer
  field :output_tokens, type: Integer
  field :thinking_text, type: String      # Extended thinking trace
  field :tool_calls, type: Array          # [{name: "GetOrgContext", arguments: {...}, result: {...}}]

  # REMOVE: OpenAI-specific field (no longer needed)
  # field :ai_response_id, type: String   # RubyLLM handles conversation state
end

Persistence is handled by Bizy::Chat's persist_to_history method after each response; no callbacks or acts_as_chat hooks are involved.

Multi-turn conversations:

Bizy::Chat automatically restores conversation context when a conversation_id is provided. See the restore_conversation_context method in the Bizy::Chat class above.

Pros of this approach:

  • Works with our Mongoid stack (no ActiveRecord required)
  • Preserves existing Bonusly-specific fields and indexes
  • Simpler embedded tool_calls (no separate model)
  • Feedback loop already implemented

Cons:

  • Manual persistence (RubyLLM doesn't auto-save)
  • Must manually rebuild conversation context for multi-turn
  • Won't get future RubyLLM persistence features automatically

5. Context Handling

How Context Flows

The frontend sends context metadata with each message:

# Controller receives
{
  message: "What did Jane and I discuss?",
  context_metadata: {
    meeting_partner_id: "abc123",  # Enables MeetingAgent
    context_type: "meeting"         # For logging
  }
}

Bizy::Chat uses this to determine which sub-agents are available:

# In Bizy::Chat#build_chat
SUB_AGENTS.each do |agent_class|
  if agent_class.available?(user: @user, metadata: @metadata)
    tools << agent_class.new(user: @user, metadata: @metadata)
  end
end

Context → Agent Mapping

| Context | Available Agents | Why |
| --- | --- | --- |
| No metadata | Core tools only | General questions |
| meeting_partner_id present | + MeetingAgent | 1:1 meeting questions |
| User is admin | + AnalyticsAgent | Company analytics |
| Both | + MeetingAgent + AnalyticsAgent | Full access |
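This mapping boils down to per-agent availability predicates. A hedged sketch (the Meeting check mirrors `Bizy::Agents::Meeting.available?` shown earlier; the admin check for Analytics is an assumption for illustration):

```ruby
# Availability predicates implied by the context -> agent mapping.
# User is modeled as a plain Hash for the sketch.
MeetingAvailability   = ->(user, metadata) { !metadata[:meeting_partner_id].nil? }
AnalyticsAvailability = ->(user, metadata) { user.fetch(:admin, false) }

def enabled_agents(user, metadata)
  checks = {
    "MeetingAgent"   => MeetingAvailability,
    "AnalyticsAgent" => AnalyticsAvailability
  }
  checks.select { |_name, check| check.call(user, metadata) }.keys
end

enabled_agents({ admin: true }, { meeting_partner_id: "abc123" })
# both agents enabled: admin user in a meeting context

enabled_agents({ admin: false }, {})
# no sub-agents enabled: core tools only
```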

6. Implementation Plan

Approach: Build in Parallel, Then Cut Over

We build the new framework entirely in parallel to existing Bizy. The old system continues to work unchanged while we prove out the new approach. Once validated, we cut over via feature flag.

┌─────────────────────────────────────────────────────────────┐
│ Development Timeline                                         │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Existing Bizy (unchanged)                                   │
│  ════════════════════════════════════════════════►           │
│                                                              │
│  New Framework (parallel build)                              │
│  ─────────────────────────────────► [Validate] ─► [Cutover]  │
│                                                              │
│  Phases 1-5: Build new system    Phase 6: Feature flag test  │
│  (old system untouched)          Phase 7: Remove old code    │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Why parallel development:

  • Zero risk to production during development
  • Can compare outputs between old and new systems
  • Easy rollback (just disable feature flag)
  • No partial migrations or hybrid states

Phase 1: Infrastructure

Goal: Set up RubyLLM, ActionCable, and async jobs.

# Gemfile
gem 'ruby_llm'
gem 'async-job-adapter-active_job'

  • Configure RubyLLM with API keys
  • Create BizyChannel
  • Create LLMJob base class
  • Verify: RubyLLM.chat.ask("Hello") works in console

Existing Bizy: Unchanged

Phase 2: First Tool

Goal: Create one new tool to prove the pattern.

Create new GetOrgContext in the new location (does not modify existing tool):

# NEW: app/lib/bizy/tools/get_org_context.rb (RubyLLM-based)
# OLD: lib/bizy/tools/get_org_context.rb (still works, untouched)

Test in console:

tool = Bizy::Tools::GetOrgContext.new(requesting_user: user)
RubyLLM.chat.with_tools(tool).ask("Who is my manager?")

Existing Bizy: Unchanged

Phase 3: New Endpoint

Goal: Create new streaming endpoint (separate from existing).

# config/routes.rb
post 'bizy/chat', to: 'chat#create'                    # Existing (unchanged)
post 'bizy/chat/v2', to: 'chat#create_v2'              # New (feature flagged)

The new endpoint enqueues a job and returns immediately. Feature flag controls access.
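A hedged sketch of what the flag-gated endpoint and its job could look like. The names here (create_v2, LLMJob, the bizy_chat_#{id} stream key) and the use of Flipper for the flag are assumptions, not a final API:

```ruby
# Sketch only: action, job, flag, and stream-key names are assumptions.
class ChatController < ApplicationController
  def create_v2
    return head :not_found unless Flipper.enabled?(:bizy_v2, current_user)

    # Enqueue and return immediately; the response streams over ActionCable.
    LLMJob.perform_later(current_user.id, params[:message])
    head :accepted
  end
end

class LLMJob < ApplicationJob
  queue_as :llm

  def perform(user_id, message)
    # RubyLLM yields chunks as they arrive; broadcast each to the user's stream.
    response = RubyLLM.chat.ask(message) do |chunk|
      ActionCable.server.broadcast("bizy_chat_#{user_id}", { delta: chunk.content })
    end

    ActionCable.server.broadcast("bizy_chat_#{user_id}", { done: true, content: response.content })
  end
end
```

The controller never blocks on the LLM call; the web thread is free as soon as the job is enqueued, which is the core thread-safety win over the existing synchronous endpoint.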

Existing Bizy: Unchanged

Phase 4: Agent Structure

Goal: Create agent base class and top-level chat agent.

  • Create Bizy::Agents::Base (base class for sub-agents)
  • Create Bizy::Chat (top-level agent with execute method)
  • Register core tools and sub-agents in Bizy::Chat
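One possible shape for the base class — a sketch assuming sub-agents expose a single question parameter and run their own nested chat. The tools, instructions, and available? hooks are conventions we would define, not RubyLLM requirements:

```ruby
# app/lib/bizy/agents/base.rb -- sketch; hook names (tools, instructions,
# available?) are assumptions about our own API.
module Bizy
  module Agents
    class Base < RubyLLM::Tool
      param :question, desc: "The question to delegate to this agent"

      # Subclasses override to gate availability by permissions/context.
      def self.available?(user:, metadata:)
        true
      end

      def initialize(user:, metadata: {})
        @user = user
        @metadata = metadata
        super()
      end

      # Invoked like any other tool, but internally runs its own chat
      # with its own tools (the agent-as-tool pattern).
      def execute(question:)
        RubyLLM.chat
               .with_instructions(instructions)
               .with_tools(*tools)
               .ask(question)
               .content
      end

      private

      def tools = []
      def instructions = "You are a specialized Bonusly assistant."
    end
  end
end
```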

Existing Bizy: Unchanged

Phase 5: Complete New System

Goal: Build out all tools and agents for the new framework.

Core tools (new implementations in app/lib/bizy/tools/):

  • GetOrgContext, GetRecognition, GetMilestones, GetHashtags

Agents (new in app/lib/bizy/agents/):

  • MeetingAgent (uses GetMeetingHistory, GetMeetingSummary, GetCheckinHistory)
  • AnalyticsAgent (uses GetParticipationTrends, GetRecognitionStats)

At this point, the new system is fully functional but behind a feature flag.

Existing Bizy: Unchanged, still serving all traffic

Phase 6: Validate + Cut Over

Goal: Prove the new system works, then switch traffic.

  1. Internal testing: Team uses new system via feature flag
  2. Shadow mode (optional): Run both systems, compare outputs
  3. Gradual rollout: Enable for percentage of users via feature flag
  4. Full cutover: Enable for all users

// Frontend feature flag
const BizyChat = () => {
  const useNewFramework = useFeatureFlag('bizy-v2');

  if (useNewFramework) {
    return <StreamingBizyChat />;  // New ActionCable-based
  }
  return <LegacyBizyChat />;       // Existing sync
};

Phase 7: Remove Old Code

Goal: Clean up after successful cutover.

Only after the new system is proven in production:

  • Remove lib/bizy/ai_driver.rb
  • Remove lib/bizy/base_tool.rb
  • Remove lib/bizy/tools/*.rb (old MCP tools)
  • Remove MCP controller (if not used elsewhere)
  • Remove old routes and feature flag checks

Rollback Plan

At every phase, existing Bizy continues to work. Rollback is simple:

Phase Rollback
1-5 Nothing to roll back (old system still serving traffic)
6 Disable feature flag → 100% traffic to old system
7 Restore deleted code from git (should rarely be needed)

Success Criteria

  • New system fully functional behind feature flag
  • Internal team validated new system behavior
  • Streaming responses working end-to-end
  • Async::Job handling LLM calls
  • Performance equal to or better than old system
  • Gradual feature flag rollout completed with no regressions
  • Old code and MCP infrastructure removed

7. Advanced Topics

MCP Integration

RubyLLM can consume MCP tools (as a client), and our agents can be exposed via MCP (as a server). This provides flexibility for integration with external systems.

Consuming MCP Tools (Client)

The ruby_llm-mcp gem provides full MCP client support for RubyLLM. This is useful if we want RubyLLM agents to call tools hosted on existing MCP servers (like our current Bizy/BonuslyGPT infrastructure during migration).

require 'ruby_llm/mcp'

mcp_client = RubyLLM::MCP.client(
  name: "bizy-mcp",
  transport_type: :streamable,
  config: { url: ENV['BIZY_MCP_URL'] }
)

chat = RubyLLM.chat.with_tools(*mcp_client.tools)

Use cases:

  • Gradual migration: New agents use RubyLLM but can still call existing MCP tools
  • External MCP servers: Connect to third-party MCP servers (filesystem, databases, etc.)
  • Hybrid architecture: Mix local tools with remote MCP tools

Exposing Agents via MCP (Server)

If we want external systems (other AI platforms, ChatGPT plugins, etc.) to call our agents, we can expose them as an MCP server. Our existing Mcp::BizyController pattern already does this.

Example pattern (could be extended for new agents):

# app/controllers/mcp/agents_controller.rb
module Mcp
  class AgentsController < ApplicationController
    # JSON-RPC 2.0 endpoint for MCP
    def handle
      case json_rpc_method
      when "tools/list"
        render_tools_list
      when "tools/call"
        result = execute_tool(params[:name], params[:arguments])
        render json: { result: result }
      end
    end

    private

    def render_tools_list
      render json: {
        tools: [
          {
            name: "ask_bizy",
            description: "Ask Bizy a question about Bonusly",
            inputSchema: {
              type: "object",
              properties: {
                question: { type: "string", description: "The question to ask" }
              },
              required: ["question"]
            }
          }
        ]
      }
    end

    def execute_tool(name, arguments)
      case name
      when "ask_bizy"
        response = Bizy::Chat.execute(
          user: current_user,
          message: arguments[:question],
          metadata: {}
        )
        response.content
      end
    end
  end
end

Use cases:

  • ChatGPT/Claude plugins: Let external AI access Bonusly data
  • Enterprise integrations: Partner systems can query our agents
  • A2A protocol: If we adopt A2A, MCP servers are a building block

When to Use Each Approach

Scenario Approach
New agent with new tools Native RubyLLM tools (simplest)
New agent + existing MCP tools ruby_llm-mcp client
Expose agent to external systems MCP server endpoint
Gradual migration from MCP Hybrid (ruby_llm-mcp + native)

Future: SSE for Non-Puma Environments

ActionCable is the right choice for Puma. If we later adopt an async-native server like Falcon, SSE becomes viable for streaming without blocking threads.

                    ┌─────────────┐
                    │   Falcon    │
                    │  (async)    │
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │  SSE Stream │
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │   Client    │
                    └─────────────┘

Falcon uses Ruby's fiber-based async model, allowing thousands of concurrent connections on a single thread. This pairs naturally with RubyLLM's streaming and async capabilities.

See RubyLLM Async documentation for details on fiber-based concurrency and Falcon integration.

Nested Agents

Agents can use other agents as tools:

class Bizy::Agents::ExecutiveReport < Bizy::Agents::Base
  description "Generate executive reports"

  def tools
    [
      Bizy::Agents::Analytics.new(user: @user, metadata: @metadata),
      Bizy::Tools::GetTopPerformers.new(company: @user.company),
    ]
  end
end

8. FAQ

Why go back to function calling from MCP?

Why MCP exists: MCP evolved as a standard because every LLM provider (OpenAI, Anthropic, Google, etc.) had their own custom mechanism for function calling. MCP provides a way to "build once, use anywhere" - define tools once and expose them to any AI system, regardless of provider.

Why we don't need it for internal tools: RubyLLM handles the first part for us. It provides a unified abstraction over all providers' function calling mechanisms. We write a tool once using RubyLLM's DSL, and it works with OpenAI, Anthropic, Gemini, or any other supported provider.

# This tool works with ANY provider - RubyLLM handles the translation
class Bizy::Tools::GetOrgContext < RubyLLM::Tool
  description "Get organizational context for a user"
  param :target_user, desc: "User ID, email, or name"

  def execute(target_user:)
    # Same code works whether we're using Claude, GPT, or Gemini
  end
end

What about exposing tools externally? We don't have a need to expose these tools via MCP yet. If we do in the future (for ChatGPT plugins, partner integrations, etc.), it's easy to add on later - see "Exposing Agents via MCP" in Advanced Topics.

What we gain by removing MCP internally:

  • No network round-trip for every tool call (less latency)
  • No MCP server process to maintain
  • Tools are just Ruby classes - easy to test and debug
  • Simpler architecture overall

Where does orchestration happen?

Orchestration happens in three places:

  1. Bizy::Chat (setup-time): Assembles available tools and sub-agents based on user permissions and context metadata.
# Bizy::Chat#build_chat decides WHAT tools/agents are available
def build_chat
  tools = CORE_TOOLS.map { |klass| klass.new(requesting_user: @user) }

  SUB_AGENTS.each do |agent_class|
    if agent_class.available?(user: @user, metadata: @metadata)
      tools << agent_class.new(user: @user, metadata: @metadata)
    end
  end

  RubyLLM.chat(model: "claude-sonnet-4").with_tools(*tools)
end
  2. The LLM (runtime): Decides WHICH tools to call based on the user's question and tool descriptions. This is the "orchestrator" - it reads the question, looks at available tools, and decides what to invoke.
  3. Sub-agents (nested runtime): When the LLM invokes an agent-as-tool, that agent runs its own chat with its own tools, creating a nested orchestration layer.
User Question
     ↓
Bizy::Chat.execute (assembles tools/agents)
     ↓
Main LLM (decides: "this is a meeting question")
     ↓
MeetingAgent.execute(question:)
     ↓
Sub-LLM (uses meeting-specific tools)
     ↓
Response bubbles back up

How does the chat agent know which sub-agent to invoke?

The LLM decides based on two sources of guidance:

1. Tool/Agent Descriptions

Each agent has a description that tells the LLM when to use it:

class Bizy::Agents::Meeting < Bizy::Agents::Base
  description "Ask questions about 1:1 meetings, past discussions, and meeting prep"
  # ...
end

class Bizy::Agents::Analytics < Bizy::Agents::Base
  description "Get company analytics, participation trends, and recognition statistics"
  # ...
end

When you ask "What did Jane and I discuss last week?", the LLM sees:

  • GetOrgContext: "Get organizational context for a user" - not relevant
  • MeetingAgent: "Ask questions about 1:1 meetings, past discussions..." - matches!
  • AnalyticsAgent: "Get company analytics, participation trends..." - not relevant

The LLM then calls MeetingAgent.execute(question: "What did Jane and I discuss last week?").

2. Main Chat System Prompt

The Bizy::Chat system prompt provides general guidance without duplicating tool descriptions:

class Bizy::Chat
  INSTRUCTIONS = <<~PROMPT
    You are Bizy, a helpful AI assistant for Bonusly.

    You have access to tools for looking up information and specialized agents
    for specific domains. When a question matches a specialized agent's domain,
    prefer delegating to that agent rather than answering directly.

    Be helpful, concise, and accurate. If you don't have enough information
    to answer a question, say so.
  PROMPT
end

The system prompt is intentionally minimal - it sets persona and general behavior, while tool descriptions do the heavy lifting for routing decisions. This avoids duplication and keeps things maintainable.

Key points:

  • Tool descriptions are the primary mechanism for routing decisions
  • System prompt provides persona and general preferences
  • No need to duplicate agent info in the system prompt
  • The LLM may call multiple tools/agents for complex questions
  • If no agent matches, core tools handle it directly

Writing effective descriptions:

Per Anthropic's best practices, aim for at least 3-4 sentences per tool description. Include:

  • What the tool does
  • When it should be used (and when it shouldn't)
  • What each parameter means
  • Any important caveats or limitations

Including examples in descriptions can improve routing accuracy:

class Bizy::Agents::Meeting < Bizy::Agents::Base
  description <<~DESC
    Ask questions about 1:1 meetings and past discussions with a specific person.

    Use this agent when the user asks about:
    - What was discussed in previous meetings ("What did we talk about last week?")
    - Meeting history and summaries ("Give me a recap of my meetings with Jane")
    - Preparing for upcoming 1:1s ("What should I discuss with my manager?")

    Do NOT use for general questions about the user's calendar or scheduling.
  DESC
end

Token considerations: Tool descriptions count toward input tokens. A detailed description (~100-200 tokens) is a small cost for better routing accuracy. There's no hard limit, but be mindful if you have many tools.

Why expose agents as tools? Isn't that conflating concepts?

It might seem confusing at first - aren't agents and tools different things? Here's why the pattern works:

From the LLM's perspective, there's no difference. When the main Bizy chat sees its available "tools," it doesn't know (or care) whether GetOrgContext fetches data from a database or whether MeetingAgent runs an entire sub-conversation. Both are just: "call this with these parameters, get a result back."

The key insight: An agent is just a tool that happens to use an LLM internally.

Regular tool:        input → Ruby code → output
Agent-as-tool:       input → Ruby code → LLM + more tools → output
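
The equivalence can be shown in plain Ruby, with no RubyLLM involved: both shapes expose the identical execute interface, so a caller cannot tell them apart. (These classes are illustrative stand-ins only; the inner tool list simulates "LLM + more tools".)

```ruby
# Illustrative only: a tool and an agent-as-tool present the same interface.
class RegularTool
  def execute(input:)
    input.upcase # input -> Ruby code -> output
  end
end

class AgentAsTool
  def initialize(inner_tools)
    @inner_tools = inner_tools # stands in for "LLM + more tools"
  end

  def execute(input:)
    # input -> Ruby code -> delegate to inner tools -> output
    @inner_tools.map { |t| t.execute(input: input) }.join(" | ")
  end
end

tools = [RegularTool.new, AgentAsTool.new([RegularTool.new])]
results = tools.map { |t| t.execute(input: "hello") }
p results  # => ["HELLO", "HELLO"]
```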

Why this is better than alternatives:

Alternative Problem
Separate routing layer Extra code to maintain, another place for bugs, duplicates what LLMs do well (understanding intent)
All tools flat Main chat needs to know about every tool in the system, bloated context, harder to maintain
Explicit agent handoff Requires the LLM to understand a special "handoff" concept, more complex prompting

The agent-as-tool pattern:

  • Uses the LLM's natural tool-calling ability for routing
  • Encapsulates complexity (the main chat doesn't know MeetingAgent uses 3 sub-tools)
  • Composes naturally (agents can use other agents)
  • Follows RubyLLM's recommended pattern (see agentic workflows)

Conceptual clarity: Think of it like delegation in an organization. When you ask your assistant a question, they might answer directly (tool) or say "let me check with the finance team and get back to you" (agent-as-tool). Either way, you just asked a question and got an answer.

What other tools did we consider besides RubyLLM?

We evaluated several Ruby AI frameworks before choosing RubyLLM:

LangChain.rb (https://github.com/patterns-ai-core/langchainrb)

  • Ruby port of Python LangChain. 2k+ GitHub stars, 15+ providers, production-ready.
  • Why not: Heavier abstraction, more "batteries included" than we need. Python-first patterns translated to Ruby.

Raix (https://github.com/OlympiaAI/raix-rails)

  • Ruby AI eXtensions for Rails by OlympiaAI.
  • Why not: Smaller community (44 stars), less mature tooling ecosystem.

BoxCars (https://www.boxcars.ai/)

  • Rails-focused AI gem for text-to-action features.
  • Why not: More opinionated about workflow patterns, less flexibility for our agent-as-tool approach.

OmniAI (https://rubygems.org/gems/omniai)

  • Unified interface for multiple AI providers.
  • Why not: Simpler scope - primarily a provider abstraction, less support for agentic patterns.

Sublayer (https://docs.sublayer.com/)

  • Model-agnostic AI agent framework with DSL.
  • Why not: Different architectural approach (Generators/Actions), less Rails integration.

Chatwoot AI Agents (https://github.com/chatwoot/ai-agents)

  • Ruby AI Agents SDK inspired by OpenAI's Agents SDK. Built on top of RubyLLM. Features multi-agent orchestration, seamless handoffs, and shared context.
  • Why not: Actually a strong contender! However, it's a layer on top of RubyLLM. Using RubyLLM directly gives us more control and fewer dependencies. Worth revisiting if we need more complex handoff patterns.

Active Agent (https://www.activeagents.ai/)

  • Rails-native AI framework following MVC conventions. Treats agents like controllers, prompts like views. Includes Action Prompt, Generation Provider, and Queued Generation modules. Integrates with background jobs and streaming.
  • Why not: Interesting "agents as controllers" approach, but different from our agent-as-tool pattern. Less mature ecosystem. Worth watching as it develops.

Direct SDKs (OpenAI/Anthropic Ruby gems)

  • Use provider SDKs directly without abstraction.
  • Why not: Provider lock-in, must implement tool calling ourselves.

Why RubyLLM won:

  1. Right level of abstraction: Not too heavy (LangChain) or too light (OmniAI/direct SDKs)
  2. Native Rails integration: Built-in generators, acts_as_chat (even if we use our own persistence)
  3. Agentic patterns documented: Clear guidance on multi-agent orchestration
  4. Active development: Regular releases, responsive maintainer, growing community
  5. Provider flexibility: Easy to swap models or use different providers for different tasks
  6. Async support: Built-in fiber-based concurrency for efficient LLM operations

Trade-off acknowledged: LangChain.rb has a larger community and more built-in integrations (vector stores, document loaders, etc.). If we needed those features heavily, it might be worth the heavier abstraction. For our use case (chat + tools + agents), RubyLLM is the better fit.


9. Reference

RubyLLM Documentation

Topic Link When to Read
Getting Started rubyllm.com First setup
Tool Definition rubyllm.com/tools Creating tools
Extended Thinking rubyllm.com/thinking Complex reasoning tasks
Streaming rubyllm.com/streaming Real-time responses
Async/Fibers rubyllm.com/async Background job setup
Rails Integration rubyllm.com/rails Persistence with acts_as_chat
Multi-Agent Patterns rubyllm.com/agentic-workflows Agent-as-tool pattern
Structured Output rubyllm.com/chat Schema-based responses

Related Libraries

Rails Guides

Future Considerations

Current Bizy Files

File Purpose
lib/bizy/ai_driver.rb Main orchestrator (to be replaced)
lib/bizy/base_tool.rb MCP tool base class
lib/bizy/tools/*.rb Current tool implementations
app/controllers/mcp/bizy_controller.rb MCP server endpoint

Glossary

Term Definition
Tool A function the AI can call to get information or perform actions
Agent A specialized AI assistant with its own tools, persona, and instructions
Agent-as-Tool Pattern where specialized agents are invoked as tools by a coordinator
Extended Thinking Giving reasoning models more time/budget to deliberate on complex tasks
MCP Model Context Protocol - JSON-RPC standard for AI tool execution
ActionCable Rails WebSocket framework for real-time bidirectional communication
RubyLLM Provider-agnostic Ruby library for LLM interactions
Async::Job Fiber-based job processor optimized for I/O-bound work like LLM calls
Fiber Lightweight concurrency primitive; many can run on a single thread