@swalke16
Last active February 3, 2026 22:47
Composable Agent Framework for Bonusly
name: Composable Agent Framework
overview: Design a composable chat-based agent framework using RubyLLM, with ActionCable streaming, that can be surfaced across multiple product entry points while maintaining traceability.
isProject: false

todos:

| id | content | status |
| --- | --- | --- |
| rubyllm-setup | Add RubyLLM gem and configure OpenAI/Anthropic providers | pending |
| actioncable-channel | Create BizyChannel for streaming responses to users | pending |
| agent-base | Create Bizy::Agents::Base class extending RubyLLM::Tool | pending |
| bizy-chat | Build Bizy::Chat as top-level agent with core tools and sub-agents | pending |
| tool-migration | Convert existing MCP tools to RubyLLM tool classes | pending |
| meeting-agent | Create MeetingAgent for 1:1 meeting questions | pending |
| analytics-agent | Create AnalyticsAgent for admin analytics questions | pending |
| frontend-streaming | Update Kaleidoscope to consume ActionCable stream | pending |

Composable Agent Framework for Bonusly

1. Overview

What We're Building

A chat-based AI assistant framework that:

  • Works across multiple product surfaces (meetings, admin, general chat)
  • Routes questions to specialized "agents" based on context and permissions
  • Streams responses in real-time without blocking web server threads
  • Is built on proven patterns using the RubyLLM library

Goals

| Goal | Why It Matters |
| --- | --- |
| Multi-surface | One framework powers Bizy everywhere it appears |
| Composable | Add new capabilities by adding agents, not modifying core code |
| Streaming | Users see responses as they're generated |
| Non-blocking | Long LLM calls don't consume web server threads |
| Provider-agnostic | Use the best model for each task (Claude for analysis, GPT for conversation) |

What Changes from Current Bizy

| Aspect | Current | New |
| --- | --- | --- |
| LLM Client | OpenAI Ruby SDK | RubyLLM (any provider) |
| Tool Execution | MCP server (HTTP calls) | Direct function calling |
| Response Delivery | Synchronous JSON | ActionCable streaming |
| Thread Model | Blocks Puma thread | Async background jobs |
| Extensibility | Modify AiDriver | Add new agent class |

2. Core Concepts

What is a "Tool"?

A tool is a function the AI can call to get information or perform actions. When you ask "Who is my manager?", the AI calls a tool to look that up.

class Bizy::Tools::GetOrgContext < RubyLLM::Tool
  description "Get organizational context for a user"

  param :target_user, desc: "User ID, email, or name"

  def execute(target_user:)
    # Called by the AI when it needs org info
    target = resolve_user(target_user)
    {
      user: format_user(target),
      manager: format_manager(target),
      direct_reports: format_direct_reports(target)
    }
  end
end

The AI sees the description and param info, decides when to use the tool, and receives the return value.
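Under the hood, function calling works by sending the model a JSON description of each tool. Roughly (illustrative only; RubyLLM derives the exact schema from the `description`/`param` DSL automatically), the provider receives something like:

```ruby
# Approximate function definition generated from GetOrgContext's DSL.
# The model matches user questions against name/description, then
# responds with a tool call whose arguments conform to `parameters`.
tool_definition = {
  name: "get_org_context",
  description: "Get organizational context for a user",
  parameters: {
    type: "object",
    properties: {
      target_user: {
        type: "string",
        description: "User ID, email, or name"
      }
    },
    required: ["target_user"]
  }
}
```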

What is an "Agent"?

An agent is a specialized AI assistant with its own tools and personality. Think of it as a department expert you can delegate questions to.

In our framework, agents ARE tools. The main Bizy chat can invoke a specialized agent just like any other tool:

class Bizy::Agents::Meeting < Bizy::Agents::Base
  description "Ask questions about 1:1 meetings and past discussions"

  def tools
    [GetMeetingHistory.new(...), GetMeetingSummary.new(...)]
  end

  def instructions
    "You are a meeting assistant. Help with 1:1 prep and history."
  end
end

The Agent-as-Tool Pattern

Instead of one chat with many tools, we have a coordinator chat with a few core tools plus specialized agents:

┌─────────────────────────────────────────────────────────────┐
│ Main Bizy Chat                                               │
│   Core tools: GetOrgContext, GetRecognition, GetMilestones   │
│   Agents:     MeetingAgent, AnalyticsAgent                   │
│                                                              │
│   User: "What did Jane and I discuss last week?"             │
│         ↓                                                    │
│   Bizy decides: This is a meeting question → invoke agent    │
│         ↓                                                    │
│   ┌─────────────────────────────────────────┐                │
│   │ MeetingAgent                             │                │
│   │   Tools: GetMeetingHistory, GetSummary   │                │
│   │   Runs its own AI chat                   │                │
│   │   Returns: "In your Jan 15 meeting..."   │                │
│   └─────────────────────────────────────────┘                │
└─────────────────────────────────────────────────────────────┘

Why this is better:

  • The main chat doesn't need to know about every tool in the system
  • Each agent encapsulates its domain completely
  • Adding a new capability = adding a new agent class
  • Agents can be nested (an executive report agent could use the analytics agent)
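Stripped of the LLM itself, the pattern reduces to agents and tools sharing one callable interface so the coordinator can dispatch to either uniformly. A toy sketch (all names hypothetical, lookups stubbed):

```ruby
# Toy sketch of the agent-as-tool pattern: no LLM here, just the
# uniform interface that lets a coordinator treat agents like tools.
class GetOrgContext
  def name
    "get_org_context"
  end

  def call(_args = {})
    { manager: "Pat" } # stubbed lookup
  end
end

class MeetingAgent
  def name
    "meeting_agent"
  end

  def call(_args = {})
    # A real agent would run its own sub-chat with its own tools here.
    "In your Jan 15 meeting you discussed roadmap priorities."
  end
end

class Coordinator
  def initialize(tools)
    @tools = tools.to_h { |tool| [tool.name, tool] }
  end

  # Stand-in for the model's routing decision: dispatch by tool name.
  def dispatch(tool_name, args = {})
    @tools.fetch(tool_name).call(args)
  end
end

coordinator = Coordinator.new([GetOrgContext.new, MeetingAgent.new])
coordinator.dispatch("meeting_agent") # an agent, invoked like any tool
```

Because the coordinator only sees the shared interface, nesting falls out for free: an agent's `call` can itself construct a coordinator over its own tools.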

Agent Orchestration Patterns Compared

Before choosing an architecture, it's worth understanding the landscape of agent orchestration patterns:

| Pattern | What It Is | Best For |
| --- | --- | --- |
| Function Calling | Tools defined in API request, app executes them | Simple integrations, 2-5 tools |
| MCP | Tools on separate server, discovered/executed via JSON-RPC | Shared tooling, security isolation |
| A2A | Open standard for inter-agent communication across organizations | Enterprise ecosystems, multi-vendor |
| ACP | Agent Communication Protocol (now merged into A2A) | Deprecated - use A2A |
| Agent-as-Tool | Specialized agents invoked as tools by a coordinator | Multi-agent orchestration within one system |

Function Calling (Direct)

User → App → LLM API (with tool definitions) → App executes tool → LLM API → Response
  • Pros: Simple, full control, no network hops for tool discovery
  • Cons: Provider-specific APIs when using raw SDKs (mitigated by RubyLLM)
  • Used by: RubyLLM's native tool pattern

Note: RubyLLM eliminates provider lock-in by providing a unified API. Tools written once work across OpenAI, Anthropic, Gemini, Mistral, and others.

MCP (Model Context Protocol)

User → App → LLM API → MCP Server (JSON-RPC) → Tool execution → LLM API → Response
  • Pros: Security isolation, shared tooling, credential separation
  • Cons: Extra network hop, more infrastructure
  • Used by: Current Bizy and BonuslyGPT implementations

A2A (Agent-to-Agent Protocol)

Agent A → HTTP/JSON-RPC → Agent B (with Agent Card discovery)
  • Pros: Cross-organization interop, standardized discovery
  • Cons: Overkill for internal agents, added complexity
  • Best for: Enterprise agent marketplaces, not internal frameworks

Agent-as-Tool Pattern

Coordinator Agent → Specialized Agent (via tool call) → Result → Coordinator continues
  • Pros: Composable, each agent focused, natural tool semantics
  • Cons: Deeper call stacks, potential for loops
  • Used by: RubyLLM's multi-agent patterns

Why We Chose Agent-as-Tool

Agent-as-Tool with direct function calling (via RubyLLM) is the right fit because:

  1. All agents are internal to Bonusly - no need for A2A's cross-org interop
  2. RubyLLM handles tool execution natively without MCP's network overhead
  3. Specialized agents can be tools that the coordinator invokes
  4. Maintains simplicity while enabling composition

3. Current State

Existing AI Systems at Bonusly

| System | Purpose | How It Works |
| --- | --- | --- |
| Bizy | User assistant for meetings/org | MCP tools via OpenAI Responses API |
| KYB | Compliance verification | Direct API calls, structured output |
| BonuslyGPT | Internal support | MCP tools + vector search |

Current Bizy Architecture

User → ChatController → AiDriver → OpenAI API → MCP Server → Tools
                            ↓
                       Response (sync)

Issues with current approach:

  1. Blocks Puma threads: Each chat request holds a thread for 10-60 seconds
  2. MCP overhead: Tool calls require HTTP round-trips to MCP server
  3. Single agent: All logic in AiDriver, hard to extend
  4. OpenAI lock-in: Tied to OpenAI's specific API

Key Files in Current Implementation

  • lib/bizy/ai_driver.rb - Main orchestrator (538 lines)
  • lib/bizy/base_tool.rb - Tool base class for MCP
  • lib/bizy/tools/*.rb - Individual tools (7 tools)
  • app/controllers/mcp/bizy_controller.rb - MCP server endpoint

4. Architecture

High-Level Flow

┌─────────────────────────────────────────────────────────────┐
│ New Architecture                                             │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Frontend (Kaleidoscope)                                     │
│      │                                                       │
│      │ 1. POST /api/v2/bizy/chat                             │
│      │ 2. Subscribe to BizyChannel                           │
│      ▼                                                       │
│  ChatController                                              │
│      │                                                       │
│      │ 3. Enqueue Bizy::ChatJob                              │
│      │    (Puma thread freed immediately)                    │
│      ▼                                                       │
│  Bizy::ChatJob (Async::Job)                                  │
│      │                                                       │
│      │ 4. Bizy::Chat.execute(user:, message:, metadata:)      │
│      ▼                                                       │
│  RubyLLM Chat with Tools + Agents                            │
│      │                                                       │
│      │ 5. Stream tokens via ActionCable                      │
│      ▼                                                       │
│  BizyChannel → Frontend                                      │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Why RubyLLM vs Building Our Own

Before adopting RubyLLM, we should consider whether to continue building on our current custom implementation or adopt a library.

Current Bizy Implementation:

  • Custom OpenAI Responses API integration
  • Custom MCP implementation for tool calling
  • Custom streaming (SSE-based, has Puma threading issues)
  • Custom conversation management

Comparison:

| Aspect | Build Our Own | Use RubyLLM |
| --- | --- | --- |
| Development time | High: rebuild provider APIs, tool calling, streaming | Low: already built and tested |
| Provider support | Must implement each (OpenAI, Anthropic, etc.) | 15+ providers out of the box |
| Tool calling | Must handle JSON schema, validation, execution | Native DSL, automatic schema generation |
| Streaming | Must handle SSE/WebSocket per provider | Consistent block-based API |
| Maintenance | Team must track API changes for each provider | Community maintains compatibility |
| Testing | Must build test infrastructure | Built-in test helpers and mocking |
| Extended thinking | Must implement per-provider | Unified API across Claude/Gemini |
| Async/concurrency | Must build fiber/thread management | Built-in Async::Job integration |
| Flexibility | Full control over implementation | Constrained by library design |
| Lock-in risk | None | Tied to RubyLLM's abstraction |

Pros of RubyLLM:

  • Proven: Active community, battle-tested in production
  • Speed: Skip months of infrastructure work, focus on Bonusly-specific features
  • Multi-model: Use Claude for complex reasoning, GPT for quick responses, Gemini for long context
  • Future-proof: New models (GPT-5, Claude 4) supported by updating the gem
  • Tool ecosystem: Can integrate with MCP servers via ruby_llm-mcp
  • Well-documented: Extensive documentation with guides for common agentic patterns like multi-agent orchestration, tool composition, and streaming

Cons of RubyLLM:

  • Abstraction leaks: Edge cases may require workarounds or PRs
  • Dependency risk: Library could become unmaintained (mitigated: active development, MIT license)
  • Learning curve: Team needs to learn RubyLLM patterns
  • ActiveRecord assumptions: Some features (like acts_as_chat) assume ActiveRecord; we use Mongoid

Recommendation: Adopt RubyLLM. The development time savings are significant, and the library's design aligns well with our needs. The Mongoid limitation for chat persistence is easily worked around with our existing Bizy::ChatHistory model (see "Chat History Persistence" section).

RubyLLM Foundation

RubyLLM provides:

| Feature | Benefit |
| --- | --- |
| Provider Agnostic | Same API for OpenAI, Anthropic, Gemini, etc. Use the best model for each task. |
| Native Tools | First-class tool support without MCP overhead |
| Streaming | Built-in streaming with block syntax |
| Rails Integration | acts_as_chat for persistence (ActiveRecord) |
| Async Support | Fiber-based concurrency for parallel operations |

Tool Definition

RubyLLM tools use a DSL for defining input parameters. RubyLLM v1.9+ provides a params block DSL for complex schemas (nested objects, arrays, enums), while simpler tools can use the param helper.

Simple tool with param helper:

# app/lib/bizy/tools/get_org_context.rb
class Bizy::Tools::GetOrgContext < RubyLLM::Tool
  description "Get organizational context for a user"

  param :target_user, desc: "User ID, email, or name"

  def initialize(requesting_user:)
    @requesting_user = requesting_user
  end

  def execute(target_user:)
    target = resolve_user(target_user)

    {
      user: format_user(target),
      manager: format_manager(target),
      direct_reports: format_direct_reports(target),
      relationship_to_you: build_relationship(target)
    }
  end

  private

  def resolve_user(identifier)
    return @requesting_user if identifier.blank?
    @requesting_user.company.users.active.find_by_identifier(identifier)
  end

  def format_user(user)
    { id: user.id.to_s, name: user.display_name, email: user.email }
  end
end

Complex tool with params DSL (v1.9+):

For tools with structured inputs, use the params block for nested objects, arrays, and enums:

class Bizy::Tools::GetParticipationTrends < RubyLLM::Tool
  description "Get participation trends for the company"

  params do
    string :group_by, description: "Property to group by (department, location, team)"
    integer :months, description: "Number of months to analyze", required: false
    object :filters, description: "Optional filters to apply", required: false do
      array :departments, of: :string, description: "Limit to specific departments"
      enum :status, %w[active inactive all], description: "User status filter"
    end
  end

  def initialize(company:)
    @company = company
  end

  def execute(group_by:, months: 12, filters: nil)
    input = Analytics::Queries::GetGivingAndReceivingParticipationData::Input.new(
      company_id: @company.id,
      end_time: Time.current,
      custom_property_group: group_by
    )

    result = Analytics::Queries::GetGivingAndReceivingParticipationData.call(input)
    format_for_llm(result)
  end
end

Structured Output with Schemas

RubyLLM supports structured output via with_schema, ensuring the LLM returns valid JSON matching a defined schema. This is useful when agents need to return structured data rather than free-form text.

Defining output schemas with RubyLLM::Schema:

# app/lib/bizy/schemas/meeting_summary.rb
class Bizy::Schemas::MeetingSummary < RubyLLM::Schema
  string :summary, description: "Brief summary of the meeting"
  array :key_topics, of: :string, description: "Main topics discussed"
  array :action_items, description: "Action items from the meeting" do
    string :task, description: "The task to complete"
    string :owner, description: "Person responsible"
    string :due_date, description: "When it's due", required: false
  end
  string :next_steps, description: "Recommended next steps", required: false
end

Using schemas in agent responses:

class Bizy::Agents::Meeting < Bizy::Agents::Base
  def execute(question:)
    RubyLLM.chat(model: "claude-sonnet-4")
      .with_tools(*tools)
      .with_instructions(instructions)
      .with_schema(Bizy::Schemas::MeetingSummary)
      .ask(question).content  # with a schema, content is the parsed structured output
  end
end

When to use schemas:

Use schemas whenever the caller needs well-structured output - the LLM will conform its response to the schema regardless of how the question is phrased.

  • Analytics agents returning data for charts/tables
  • Meeting agents returning summaries with action items
  • Any agent whose output will be parsed or displayed programmatically
  • Ensuring consistent response formats for frontend rendering
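Because schema-constrained output is JSON the application will parse, downstream code can rely on required fields being present. A minimal sketch of that consumer side (plain Ruby, with a hypothetical payload shaped like the MeetingSummary schema above):

```ruby
require "json"

# Hypothetical raw model output conforming to the MeetingSummary schema.
raw = <<~JSON
  {
    "summary": "Discussed Q1 goals",
    "key_topics": ["goals", "hiring"],
    "action_items": [{"task": "Draft hiring plan", "owner": "Jane"}]
  }
JSON

parsed = JSON.parse(raw)

# Check only the required fields; optional ones (next_steps, due_date)
# may legitimately be absent per the schema definition.
missing = %w[summary key_topics action_items] - parsed.keys
raise "missing fields: #{missing.join(', ')}" unless missing.empty?

parsed["action_items"].each do |item|
  raise "malformed action item" unless item.key?("task") && item.key?("owner")
end
```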

Agent Base Class

# app/lib/bizy/agents/base.rb
class Bizy::Agents::Base < RubyLLM::Tool
  # Agents are tools that run their own sub-chat

  param :question, desc: "The question to ask this specialized agent"

  def initialize(user:, metadata: {})
    @user = user
    @metadata = metadata
  end

  def execute(question:)
    chat = RubyLLM.chat(model: "gpt-4o")
      .with_tools(*tools)
      .with_instructions(instructions)

    chat.ask(question).content
  end

  private

  def tools
    raise NotImplementedError
  end

  def instructions
    raise NotImplementedError
  end
end

Example Agent: MeetingAgent

# app/lib/bizy/agents/meeting.rb
class Bizy::Agents::Meeting < Bizy::Agents::Base
  description "Ask questions about 1:1 meetings and past discussions"

  def self.available?(user:, metadata:)
    metadata[:meeting_partner_id].present?
  end

  def initialize(user:, metadata:)
    super
    @partner_id = metadata[:meeting_partner_id]
    @partner = User.find(@partner_id)
  end

  private

  def tools
    [
      Bizy::Tools::GetMeetingHistory.new(
        requesting_user: @user,
        partner_id: @partner_id
      ),
      Bizy::Tools::GetMeetingSummary.new(
        requesting_user: @user,
        partner_id: @partner_id
      ),
    ]
  end

  def instructions
    <<~PROMPT
      You are a meeting assistant for 1:1 meetings.

      Context:
      - Current user: #{@user.display_name}
      - Meeting partner: #{@partner.display_name}

      Use get_meeting_summary for recent meetings.
      Use get_meeting_history for full history.
    PROMPT
  end
end

Bizy::Chat - The Top-Level Agent

Bizy::Chat is the main entry point - a top-level agent that orchestrates tools and sub-agents. Unlike sub-agents (which extend RubyLLM::Tool), Bizy::Chat is invoked by application code, not by an LLM.

# app/lib/bizy/chat.rb
class Bizy::Chat
  INSTRUCTIONS = <<~PROMPT
    You are Bizy, a helpful AI assistant for Bonusly.

    You have access to tools for looking up information and specialized agents
    for specific domains. When a question matches a specialized agent's domain,
    prefer delegating to that agent rather than answering directly.

    Be helpful, concise, and accurate. If you don't have enough information
    to answer a question, say so.
  PROMPT

  # Core tools (always available)
  CORE_TOOLS = [
    Bizy::Tools::GetOrgContext,
    Bizy::Tools::GetRecognition,
    Bizy::Tools::GetMilestones,
  ].freeze

  # Sub-agents (conditionally available based on context)
  SUB_AGENTS = [
    Bizy::Agents::Meeting,
    Bizy::Agents::Analytics,
  ].freeze

  def self.execute(user:, message:, metadata: {}, conversation_id: nil, &block)
    new(user: user, metadata: metadata, conversation_id: conversation_id)
      .execute(message, &block)
  end

  def initialize(user:, metadata: {}, conversation_id: nil)
    @user = user
    @metadata = metadata
    @conversation_id = conversation_id
  end

  def execute(message, &block)
    start_time = Time.current

    response = if block_given?
      chat.ask(message, &block)
    else
      chat.ask(message)
    end

    persist_to_history(message, response, start_time)
    response
  end

  private

  def chat
    @chat ||= build_chat
  end

  def build_chat
    tools = CORE_TOOLS.map { |klass| klass.new(requesting_user: @user) }

    SUB_AGENTS.each do |agent_class|
      if agent_class.available?(user: @user, metadata: @metadata)
        tools << agent_class.new(user: @user, metadata: @metadata)
      end
    end

    llm_chat = RubyLLM.chat(model: "claude-sonnet-4")
      .with_tools(*tools)
      .with_instructions(build_instructions)
      .with_thinking(effort: :medium)

    restore_conversation_context(llm_chat) if @conversation_id.present?
    llm_chat
  end

  def build_instructions
    "#{INSTRUCTIONS}\n\nCurrent user: #{@user.display_name}"
  end

  def restore_conversation_context(llm_chat)
    Bizy::ChatHistory
      .where(conversation_id: @conversation_id)
      .order(created_at: :asc)
      .each do |entry|
        llm_chat.add_message(role: :user, content: entry.user_message)
        llm_chat.add_message(role: :assistant, content: entry.bizy_response)
      end
  end

  def persist_to_history(message, response, start_time)
    Bizy::ChatHistory.create!(
      user_id: @user.id,
      company_id: @user.company_id,
      conversation_id: @conversation_id,
      context_type: @metadata[:context_type],
      context_id: @metadata[:context_id],
      user_message: message,
      bizy_response: response.content,
      response_time_ms: ((Time.current - start_time) * 1000).to_i,
      model_id: response.model_id,
      provider: response.provider_id,
      input_tokens: response.input_tokens,
      output_tokens: response.output_tokens,
      thinking_text: response.thinking&.text,
      tool_calls: format_tool_calls(response.tool_calls)
    )
  end

  def format_tool_calls(tool_calls)
    return [] if tool_calls.blank?
    tool_calls.map { |tc| { name: tc.name, arguments: tc.arguments, result: tc.result } }
  end
end

Key points:

  • execute is the single entry point - takes user, message, metadata, returns response
  • Instructions are minimal - tool descriptions handle routing
  • Conversation context is restored for multi-turn conversations
  • History is persisted after each response
  • Streaming is supported via block parameter

Extended Thinking for Complex Questions

The main Bizy chat handles diverse questions and often needs to invoke multiple tools or agents, then synthesize the results. RubyLLM's Extended Thinking gives reasoning models more time to deliberate, improving accuracy on these multi-step tasks.

Bizy::Chat uses extended thinking by default:

# app/lib/bizy/chat.rb (in build_chat method)
RubyLLM.chat(model: "claude-sonnet-4")
  .with_tools(*tools)
  .with_instructions(build_instructions)
  .with_thinking(effort: :medium)  # Enable for the main chat loop

Sub-agents configure their own thinking level based on their domain complexity:

# app/lib/bizy/agents/analytics.rb
class Bizy::Agents::Analytics < Bizy::Agents::Base
  def execute(question:)
    RubyLLM.chat(model: "claude-sonnet-4")
      .with_tools(*tools)
      .with_instructions(instructions)
      .with_thinking(effort: :high, budget: 10_000)  # Analytics needs deeper reasoning
      .ask(question).content
  end
end

Accessing thinking output (useful for debugging/logging):

response = chat.ask("Compare participation trends across departments")
response.thinking&.text    # The reasoning trace (if available)
response.content           # The final answer

Extended thinking adds latency but improves accuracy. The main chat uses :medium effort; specialized agents like Analytics can use :high for complex data synthesis.

Streaming with ActionCable

ActionCable provides WebSocket support in Rails, allowing us to push tokens to the frontend as they're generated.

# app/channels/bizy_channel.rb
class BizyChannel < ApplicationCable::Channel
  def subscribed
    stream_from "bizy:user:#{current_user.id}"
  end

  def self.stream_to(user, event:, data:)
    ActionCable.server.broadcast(
      "bizy:user:#{user.id}",
      { event: event, data: data }
    )
  end
end

# app/jobs/bizy/chat_job.rb
class Bizy::ChatJob < LLMJob
  def perform(user_id:, message:, conversation_id:, metadata: {})
    user = User.find(user_id)

    Bizy::Chat.execute(
      user: user,
      message: message,
      metadata: metadata,
      conversation_id: conversation_id
    ) do |chunk|
      BizyChannel.stream_to(user, event: "token", data: { text: chunk.content })
    end

    BizyChannel.stream_to(user, event: "complete", data: {})
  end
end

Background Job Processing

LLM calls take 10-60+ seconds. We use Async::Job (fiber-based) instead of Sidekiq (thread-based) to handle many concurrent requests efficiently. Because each request spends most of its time waiting on the LLM's response, fibers let thousands of concurrent calls share a small pool of threads instead of blocking one thread per request.

# Base class for LLM jobs
class LLMJob < ApplicationJob
  self.queue_adapter = :async_job  # Uses fibers, not threads
end

# Regular jobs still use Sidekiq
class ImageProcessingJob < ApplicationJob
  # Uses default :sidekiq adapter
end
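The fiber advantage can be sketched with plain Fibers (a toy cooperative loop, no gems; the real system relies on Async::Job's scheduler rather than manual resume/yield):

```ruby
# Toy cooperative scheduler: three "chat requests" interleave on one
# thread because each yields while "waiting" on the LLM.
requests = %w[chat-1 chat-2 chat-3].map do |id|
  Fiber.new do
    3.times do |step|
      # A real request would be blocked on network I/O here; yielding
      # hands the thread to the next request instead.
      Fiber.yield "#{id} received chunk #{step}"
    end
    "#{id} done"
  end
end

log = []
until requests.empty?
  requests.each { |fiber| log << fiber.resume }
  requests.reject! { |fiber| !fiber.alive? }
end

# All twelve events were produced on a single thread, round-robin:
# chunk 0 from each request, then chunk 1 from each, and so on.
```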

Graceful Interruption Handling

Deploys and dyno restarts (common on Heroku) can interrupt in-flight LLM requests. Rather than complex job recovery mechanisms, we handle this gracefully in the frontend.

The scenario:

  1. User asks question, streaming starts
  2. Deploy happens, worker process receives SIGTERM
  3. ActionCable connection drops mid-stream
  4. Frontend detects disconnect, shows friendly message
  5. User resubmits their question

Frontend implementation:

// src/modules/bizy/hooks/use-bizy-channel.ts
const useBizyChannel = () => {
  const [isStreaming, setIsStreaming] = useState(false);
  const [error, setError] = useState<string | null>(null);

  useEffect(() => {
    const subscription = cable.subscriptions.create("BizyChannel", {
      received(data) {
        // Broadcast payload is { event, data } (see BizyChannel.stream_to)
        if (data.event === "token") {
          appendToken(data.data.text);
        } else if (data.event === "complete") {
          setIsStreaming(false);
        } else if (data.event === "error") {
          setError(data.data.message);
          setIsStreaming(false);
        }
      },

      disconnected() {
        // NOTE: this closure captures isStreaming from the initial render;
        // in practice track it in a ref so the check sees the live value.
        if (isStreaming) {
          // Connection dropped while waiting for response
          setError("Connection interrupted. Please try your question again.");
          setIsStreaming(false);
        }
      },

      rejected() {
        setError("Unable to connect. Please refresh and try again.");
      }
    });

    return () => subscription.unsubscribe();
  }, []);

  return { isStreaming, error, clearError: () => setError(null) };
};

Why this approach:

  • Simple: No job persistence, idempotency keys, or recovery logic
  • User-friendly: Clear message, user has full context to retry
  • Rare occurrence: Deploys are infrequent relative to chat volume
  • No partial state: Incomplete responses aren't saved to history

Tradeoffs:

  • User must manually retry (minor inconvenience)
  • Partial responses are lost (but they were incomplete anyway)

Chat History Persistence

RubyLLM provides acts_as_chat for ActiveRecord-based persistence, but Bonusly uses Mongoid. We continue using our existing Bizy::ChatHistory model and enhance it with RubyLLM-specific fields.

Why not use RubyLLM's acts_as_chat:

| Aspect | RubyLLM acts_as_chat | Bizy::ChatHistory (enhanced) |
| --- | --- | --- |
| ORM | ActiveRecord only | Mongoid (our stack) |
| Setup | Generator creates migrations | Already exists |
| Bonusly fields | Would need to add | Already has context_type, feedback, etc. |
| Indexes | Would need to recreate | Already optimized for our queries |
| Tool calls | Separate ToolCall model | Embedded array (simpler) |

Enhanced ChatHistory model:

# app/models/bizy/chat_history.rb
class Bizy::ChatHistory
  include ApplicationDocument

  # Existing fields
  field :user_id, type: BSON::ObjectId
  field :company_id, type: BSON::ObjectId
  field :conversation_id, type: String
  field :context_type, type: String
  field :context_id, type: BSON::ObjectId
  field :user_message, type: String
  field :bizy_response, type: String
  field :response_time_ms, type: Integer
  field :emotion, type: String

  # Feedback fields (existing)
  field :feedback_type, type: String
  field :feedback_category, type: String
  field :feedback_details, type: String

  # NEW: RubyLLM-specific fields
  field :model_id, type: String           # e.g., "claude-sonnet-4", "gpt-4o"
  field :provider, type: String           # e.g., "anthropic", "openai"
  field :input_tokens, type: Integer
  field :output_tokens, type: Integer
  field :thinking_text, type: String      # Extended thinking trace
  field :tool_calls, type: Array          # [{name: "GetOrgContext", arguments: {...}, result: {...}}]

  # REMOVE: OpenAI-specific field (no longer needed)
  # field :ai_response_id, type: String   # RubyLLM handles conversation state
end

Persistence is handled by Bizy::Chat's persist_to_history method after each response; no callbacks or acts_as_chat hooks are involved.

Multi-turn conversations:

Bizy::Chat automatically restores conversation context when a conversation_id is provided. See the restore_conversation_context method in the Bizy::Chat class above.

Pros of this approach:

  • Works with our Mongoid stack (no ActiveRecord required)
  • Preserves existing Bonusly-specific fields and indexes
  • Simpler embedded tool_calls (no separate model)
  • Feedback loop already implemented

Cons:

  • Manual persistence (RubyLLM doesn't auto-save)
  • Must manually rebuild conversation context for multi-turn
  • Won't get future RubyLLM persistence features automatically

5. Context Handling

How Context Flows

The frontend sends context metadata with each message:

# Controller receives
{
  message: "What did Jane and I discuss?",
  context_metadata: {
    meeting_partner_id: "abc123",  # Enables MeetingAgent
    context_type: "meeting"         # For logging
  }
}

Bizy::Chat uses this to determine which sub-agents are available:

# In Bizy::Chat#build_chat
SUB_AGENTS.each do |agent_class|
  if agent_class.available?(user: @user, metadata: @metadata)
    tools << agent_class.new(user: @user, metadata: @metadata)
  end
end

Context → Agent Mapping

| Context | Available Agents | Why |
| --- | --- | --- |
| No metadata | Core tools only | General questions |
| meeting_partner_id present | + MeetingAgent | 1:1 meeting questions |
| User is admin | + AnalyticsAgent | Company analytics |
| Both | + MeetingAgent + AnalyticsAgent | Full access |
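This mapping boils down to per-agent availability predicates. A hedged sketch (the Meeting check mirrors `Bizy::Agents::Meeting.available?` shown earlier; the admin check for Analytics is an assumption for illustration):

```ruby
# Availability predicates implied by the context -> agent mapping.
# User is modeled as a plain Hash for the sketch.
MeetingAvailability   = ->(user, metadata) { !metadata[:meeting_partner_id].nil? }
AnalyticsAvailability = ->(user, metadata) { user.fetch(:admin, false) }

def enabled_agents(user, metadata)
  checks = {
    "MeetingAgent"   => MeetingAvailability,
    "AnalyticsAgent" => AnalyticsAvailability
  }
  checks.select { |_name, check| check.call(user, metadata) }.keys
end

enabled_agents({ admin: true }, { meeting_partner_id: "abc123" })
# both agents enabled: admin user in a meeting context

enabled_agents({ admin: false }, {})
# no sub-agents enabled: core tools only
```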

6. Implementation Plan

Approach: Build in Parallel, Then Cut Over

We build the new framework entirely in parallel to existing Bizy. The old system continues to work unchanged while we prove out the new approach. Once validated, we cut over via feature flag.

┌─────────────────────────────────────────────────────────────┐
│ Development Timeline                                         │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Existing Bizy (unchanged)                                   │
│  ════════════════════════════════════════════════►           │
│                                                              │
│  New Framework (parallel build)                              │
│  ─────────────────────────────────► [Validate] ─► [Cutover]  │
│                                                              │
│  Phases 1-5: Build new system    Phase 6: Feature flag test  │
│  (old system untouched)          Phase 7: Remove old code    │
│                                                              │
└─────────────────────────────────────────────────────────────┘

Why parallel development:

  • Zero risk to production during development
  • Can compare outputs between old and new systems
  • Easy rollback (just disable feature flag)
  • No partial migrations or hybrid states

Phase 1: Infrastructure

Goal: Set up RubyLLM, ActionCable, and async jobs.

# Gemfile
gem 'ruby_llm'
gem 'async-job-adapter-active_job'

  • Configure RubyLLM with API keys
  • Create BizyChannel
  • Create LLMJob base class
  • Verify: RubyLLM.chat.ask("Hello") works in console

Existing Bizy: Unchanged

Phase 2: First Tool

Goal: Create one new tool to prove the pattern.

Create new GetOrgContext in the new location (does not modify existing tool):

# NEW: app/lib/bizy/tools/get_org_context.rb (RubyLLM-based)
# OLD: lib/bizy/tools/get_org_context.rb (still works, untouched)

Test in console:

tool = Bizy::Tools::GetOrgContext.new(requesting_user: user)
RubyLLM.chat.with_tools(tool).ask("Who is my manager?")

Existing Bizy: Unchanged

Phase 3: New Endpoint

Goal: Create new streaming endpoint (separate from existing).

# config/routes.rb
post 'bizy/chat', to: 'chat#create'                    # Existing (unchanged)
post 'bizy/chat/v2', to: 'chat#create_v2'              # New (feature flagged)

The new endpoint enqueues a job and returns immediately. Feature flag controls access.
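A hedged sketch of what the flag-gated endpoint and its job could look like. The names here (create_v2, LLMJob, the bizy_chat_#{id} stream key) and the use of Flipper for the flag are assumptions, not a final API:

```ruby
# Sketch only: action, job, flag, and stream-key names are assumptions.
class ChatController < ApplicationController
  def create_v2
    return head :not_found unless Flipper.enabled?(:bizy_v2, current_user)

    # Enqueue and return immediately; the response streams over ActionCable.
    LLMJob.perform_later(current_user.id, params[:message])
    head :accepted
  end
end

class LLMJob < ApplicationJob
  queue_as :llm

  def perform(user_id, message)
    # RubyLLM yields chunks as they arrive; broadcast each to the user's stream.
    response = RubyLLM.chat.ask(message) do |chunk|
      ActionCable.server.broadcast("bizy_chat_#{user_id}", { delta: chunk.content })
    end

    ActionCable.server.broadcast("bizy_chat_#{user_id}", { done: true, content: response.content })
  end
end
```

The controller never blocks on the LLM call; the web thread is free as soon as the job is enqueued, which is the core thread-safety win over the existing synchronous endpoint.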

Existing Bizy: Unchanged

Phase 4: Agent Structure

Goal: Create agent base class and top-level chat agent.

  • Create Bizy::Agents::Base (base class for sub-agents)
  • Create Bizy::Chat (top-level agent with execute method)
  • Register core tools and sub-agents in Bizy::Chat
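One possible shape for the base class — a sketch assuming sub-agents expose a single question parameter and run their own nested chat. The tools, instructions, and available? hooks are conventions we would define, not RubyLLM requirements:

```ruby
# app/lib/bizy/agents/base.rb -- sketch; hook names (tools, instructions,
# available?) are assumptions about our own API.
module Bizy
  module Agents
    class Base < RubyLLM::Tool
      param :question, desc: "The question to delegate to this agent"

      # Subclasses override to gate availability by permissions/context.
      def self.available?(user:, metadata:)
        true
      end

      def initialize(user:, metadata: {})
        @user = user
        @metadata = metadata
        super()
      end

      # Invoked like any other tool, but internally runs its own chat
      # with its own tools (the agent-as-tool pattern).
      def execute(question:)
        RubyLLM.chat
               .with_instructions(instructions)
               .with_tools(*tools)
               .ask(question)
               .content
      end

      private

      def tools = []
      def instructions = "You are a specialized Bonusly assistant."
    end
  end
end
```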

Existing Bizy: Unchanged

Phase 5: Complete New System

Goal: Build out all tools and agents for the new framework.

Core tools (new implementations in app/lib/bizy/tools/):

  • GetOrgContext, GetRecognition, GetMilestones, GetHashtags

Agents (new in app/lib/bizy/agents/):

  • MeetingAgent (uses GetMeetingHistory, GetMeetingSummary, GetCheckinHistory)
  • AnalyticsAgent (uses GetParticipationTrends, GetRecognitionStats)

At this point, the new system is fully functional but behind a feature flag.

Existing Bizy: Unchanged, still serving all traffic

Phase 6: Validate + Cut Over

Goal: Prove the new system works, then switch traffic.

  1. Internal testing: Team uses new system via feature flag
  2. Shadow mode (optional): Run both systems, compare outputs
  3. Gradual rollout: Enable for percentage of users via feature flag
  4. Full cutover: Enable for all users

// Frontend feature flag
const BizyChat = () => {
  const useNewFramework = useFeatureFlag('bizy-v2');

  if (useNewFramework) {
    return <StreamingBizyChat />;  // New ActionCable-based
  }
  return <LegacyBizyChat />;       // Existing sync
};

Phase 7: Remove Old Code

Goal: Clean up after successful cutover.

Only after the new system is proven in production:

  • Remove lib/bizy/ai_driver.rb
  • Remove lib/bizy/base_tool.rb
  • Remove lib/bizy/tools/*.rb (old MCP tools)
  • Remove MCP controller (if not used elsewhere)
  • Remove old routes and feature flag checks

Rollback Plan

At every phase, existing Bizy continues to work. Rollback is simple:

Phase Rollback
1-5 Nothing to roll back (old system still serving traffic)
6 Disable feature flag → 100% traffic to old system
7 Restore deleted code from git (should rarely be needed)

Success Criteria

  • New system fully functional behind feature flag
  • Internal team validated new system behavior
  • Streaming responses working end-to-end
  • Async::Job handling LLM calls
  • Performance equal to or better than old system
  • Gradual feature flag rollout completed with no regressions
  • Old code and MCP infrastructure removed

7. Advanced Topics

MCP Integration

RubyLLM can consume MCP tools (as a client), and our agents can be exposed via MCP (as a server). This provides flexibility for integration with external systems.

Consuming MCP Tools (Client)

The ruby_llm-mcp gem provides full MCP client support for RubyLLM. This is useful if we want RubyLLM agents to call tools hosted on existing MCP servers (like our current Bizy/BonuslyGPT infrastructure during migration).

require 'ruby_llm/mcp'

mcp_client = RubyLLM::MCP.client(
  name: "bizy-mcp",
  transport_type: :streamable,
  config: { url: ENV['BIZY_MCP_URL'] }
)

chat = RubyLLM.chat.with_tools(*mcp_client.tools)

Use cases:

  • Gradual migration: New agents use RubyLLM but can still call existing MCP tools
  • External MCP servers: Connect to third-party MCP servers (filesystem, databases, etc.)
  • Hybrid architecture: Mix local tools with remote MCP tools

Exposing Agents via MCP (Server)

If we want external systems (other AI platforms, ChatGPT plugins, etc.) to call our agents, we can expose them as an MCP server. Our existing Mcp::BizyController pattern already does this.

Example pattern (could be extended for new agents):

# app/controllers/mcp/agents_controller.rb
module Mcp
  class AgentsController < ApplicationController
    # JSON-RPC 2.0 endpoint for MCP
    def handle
      case json_rpc_method
      when "tools/list"
        render_tools_list
      when "tools/call"
        result = execute_tool(params[:name], params[:arguments])
        render json: { result: result }
      end
    end

    private

    def render_tools_list
      render json: {
        tools: [
          {
            name: "ask_bizy",
            description: "Ask Bizy a question about Bonusly",
            inputSchema: {
              type: "object",
              properties: {
                question: { type: "string", description: "The question to ask" }
              },
              required: ["question"]
            }
          }
        ]
      }
    end

    def execute_tool(name, arguments)
      case name
      when "ask_bizy"
        response = Bizy::Chat.execute(
          user: current_user,
          message: arguments[:question],
          metadata: {}
        )
        response.content
      end
    end
  end
end

Use cases:

  • ChatGPT/Claude plugins: Let external AI access Bonusly data
  • Enterprise integrations: Partner systems can query our agents
  • A2A protocol: If we adopt A2A, MCP servers are a building block

When to Use Each Approach

Scenario Approach
New agent with new tools Native RubyLLM tools (simplest)
New agent + existing MCP tools ruby_llm-mcp client
Expose agent to external systems MCP server endpoint
Gradual migration from MCP Hybrid (ruby_llm-mcp + native)

Future: SSE for Non-Puma Environments

ActionCable is the right choice for Puma. If we later adopt an async-native server like Falcon, SSE becomes viable for streaming without blocking threads.

                    ┌─────────────┐
                    │   Falcon    │
                    │  (async)    │
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │  SSE Stream │
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │   Client    │
                    └─────────────┘

Falcon uses Ruby's fiber-based async model, allowing thousands of concurrent connections on a single thread. This pairs naturally with RubyLLM's streaming and async capabilities.

See RubyLLM Async documentation for details on fiber-based concurrency and Falcon integration.

Nested Agents

Agents can use other agents as tools:

class Bizy::Agents::ExecutiveReport < Bizy::Agents::Base
  description "Generate executive reports"

  def tools
    [
      Bizy::Agents::Analytics.new(user: @user, metadata: @metadata),
      Bizy::Tools::GetTopPerformers.new(company: @user.company),
    ]
  end
end

8. FAQ

Why go back to function calling from MCP?

Why MCP exists: MCP evolved as a standard because every LLM provider (OpenAI, Anthropic, Google, etc.) had their own custom mechanism for function calling. MCP provides a way to "build once, use anywhere" - define tools once and expose them to any AI system, regardless of provider.

Why we don't need it for internal tools: RubyLLM handles the first part for us. It provides a unified abstraction over all providers' function calling mechanisms. We write a tool once using RubyLLM's DSL, and it works with OpenAI, Anthropic, Gemini, or any other supported provider.

# This tool works with ANY provider - RubyLLM handles the translation
class Bizy::Tools::GetOrgContext < RubyLLM::Tool
  description "Get organizational context for a user"
  param :target_user, desc: "User ID, email, or name"

  def execute(target_user:)
    # Same code works whether we're using Claude, GPT, or Gemini
  end
end

What about exposing tools externally? We don't have a need to expose these tools via MCP yet. If we do in the future (for ChatGPT plugins, partner integrations, etc.), it's easy to add on later - see "Exposing Agents via MCP" in Advanced Topics.

What we gain by removing MCP internally:

  • No network round-trip for every tool call (less latency)
  • No MCP server process to maintain
  • Tools are just Ruby classes - easy to test and debug
  • Simpler architecture overall

Where does orchestration happen?

Orchestration happens in three places:

  1. Bizy::Chat (setup-time): Assembles available tools and sub-agents based on user permissions and context metadata.
# Bizy::Chat#build_chat decides WHAT tools/agents are available
def build_chat
  tools = CORE_TOOLS.map { |klass| klass.new(requesting_user: @user) }

  SUB_AGENTS.each do |agent_class|
    if agent_class.available?(user: @user, metadata: @metadata)
      tools << agent_class.new(user: @user, metadata: @metadata)
    end
  end

  RubyLLM.chat(model: "claude-sonnet-4").with_tools(*tools)
end
  2. The LLM (runtime): Decides WHICH tools to call based on the user's question and tool descriptions. This is the "orchestrator" - it reads the question, looks at available tools, and decides what to invoke.
  3. Sub-agents (nested runtime): When the LLM invokes an agent-as-tool, that agent runs its own chat with its own tools, creating a nested orchestration layer.
User Question
     ↓
Bizy::Chat.execute (assembles tools/agents)
     ↓
Main LLM (decides: "this is a meeting question")
     ↓
MeetingAgent.execute(question:)
     ↓
Sub-LLM (uses meeting-specific tools)
     ↓
Response bubbles back up

How does the chat agent know which sub-agent to invoke?

The LLM decides based on two sources of guidance:

1. Tool/Agent Descriptions

Each agent has a description that tells the LLM when to use it:

class Bizy::Agents::Meeting < Bizy::Agents::Base
  description "Ask questions about 1:1 meetings, past discussions, and meeting prep"
  # ...
end

class Bizy::Agents::Analytics < Bizy::Agents::Base
  description "Get company analytics, participation trends, and recognition statistics"
  # ...
end

When you ask "What did Jane and I discuss last week?", the LLM sees:

  • GetOrgContext: "Get organizational context for a user" - not relevant
  • MeetingAgent: "Ask questions about 1:1 meetings, past discussions..." - matches!
  • AnalyticsAgent: "Get company analytics, participation trends..." - not relevant

The LLM then calls MeetingAgent.execute(question: "What did Jane and I discuss last week?").

2. Main Chat System Prompt

The Bizy::Chat system prompt provides general guidance without duplicating tool descriptions:

class Bizy::Chat
  INSTRUCTIONS = <<~PROMPT
    You are Bizy, a helpful AI assistant for Bonusly.

    You have access to tools for looking up information and specialized agents
    for specific domains. When a question matches a specialized agent's domain,
    prefer delegating to that agent rather than answering directly.

    Be helpful, concise, and accurate. If you don't have enough information
    to answer a question, say so.
  PROMPT
end

The system prompt is intentionally minimal - it sets persona and general behavior, while tool descriptions do the heavy lifting for routing decisions. This avoids duplication and keeps things maintainable.

Key points:

  • Tool descriptions are the primary mechanism for routing decisions
  • System prompt provides persona and general preferences
  • No need to duplicate agent info in the system prompt
  • The LLM may call multiple tools/agents for complex questions
  • If no agent matches, core tools handle it directly

Writing effective descriptions:

Per Anthropic's best practices, aim for at least 3-4 sentences per tool description. Include:

  • What the tool does
  • When it should be used (and when it shouldn't)
  • What each parameter means
  • Any important caveats or limitations

Including examples in descriptions can improve routing accuracy:

class Bizy::Agents::Meeting < Bizy::Agents::Base
  description <<~DESC
    Ask questions about 1:1 meetings and past discussions with a specific person.

    Use this agent when the user asks about:
    - What was discussed in previous meetings ("What did we talk about last week?")
    - Meeting history and summaries ("Give me a recap of my meetings with Jane")
    - Preparing for upcoming 1:1s ("What should I discuss with my manager?")

    Do NOT use for general questions about the user's calendar or scheduling.
  DESC
end

Token considerations: Tool descriptions count toward input tokens. A detailed description (~100-200 tokens) is a small cost for better routing accuracy. There's no hard limit, but be mindful if you have many tools.

Why expose agents as tools? Isn't that conflating concepts?

It might seem confusing at first - aren't agents and tools different things? Here's why the pattern works:

From the LLM's perspective, there's no difference. When the main Bizy chat sees its available "tools," it doesn't know (or care) whether GetOrgContext fetches data from a database or whether MeetingAgent runs an entire sub-conversation. Both are just: "call this with these parameters, get a result back."

The key insight: An agent is just a tool that happens to use an LLM internally.

Regular tool:        input → Ruby code → output
Agent-as-tool:       input → Ruby code → LLM + more tools → output
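
The equivalence can be shown in plain Ruby, with no RubyLLM involved: both shapes expose the identical execute interface, so a caller cannot tell them apart. (These classes are illustrative stand-ins only; the inner tool list simulates "LLM + more tools".)

```ruby
# Illustrative only: a tool and an agent-as-tool present the same interface.
class RegularTool
  def execute(input:)
    input.upcase # input -> Ruby code -> output
  end
end

class AgentAsTool
  def initialize(inner_tools)
    @inner_tools = inner_tools # stands in for "LLM + more tools"
  end

  def execute(input:)
    # input -> Ruby code -> delegate to inner tools -> output
    @inner_tools.map { |t| t.execute(input: input) }.join(" | ")
  end
end

tools = [RegularTool.new, AgentAsTool.new([RegularTool.new])]
results = tools.map { |t| t.execute(input: "hello") }
p results  # => ["HELLO", "HELLO"]
```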

Why this is better than alternatives:

Alternative Problem
Separate routing layer Extra code to maintain, another place for bugs, duplicates what LLMs do well (understanding intent)
All tools flat Main chat needs to know about every tool in the system, bloated context, harder to maintain
Explicit agent handoff Requires the LLM to understand a special "handoff" concept, more complex prompting

The agent-as-tool pattern:

  • Uses the LLM's natural tool-calling ability for routing
  • Encapsulates complexity (the main chat doesn't know MeetingAgent uses 3 sub-tools)
  • Composes naturally (agents can use other agents)
  • Follows RubyLLM's recommended pattern (see agentic workflows)

Conceptual clarity: Think of it like delegation in an organization. When you ask your assistant a question, they might answer directly (tool) or say "let me check with the finance team and get back to you" (agent-as-tool). Either way, you just asked a question and got an answer.

What other tools did we consider besides RubyLLM?

We evaluated several Ruby AI frameworks before choosing RubyLLM:

LangChain.rb (https://github.com/patterns-ai-core/langchainrb)

  • Ruby port of Python LangChain. 2k+ GitHub stars, 15+ providers, production-ready.
  • Why not: Heavier abstraction, more "batteries included" than we need. Python-first patterns translated to Ruby.

Raix (https://github.com/OlympiaAI/raix-rails)

  • Ruby AI eXtensions for Rails by OlympiaAI.
  • Why not: Smaller community (44 stars), less mature tooling ecosystem.

BoxCars (https://www.boxcars.ai/)

  • Rails-focused AI gem for text-to-action features.
  • Why not: More opinionated about workflow patterns, less flexibility for our agent-as-tool approach.

OmniAI (https://rubygems.org/gems/omniai)

  • Unified interface for multiple AI providers.
  • Why not: Simpler scope - primarily a provider abstraction, less support for agentic patterns.

Sublayer (https://docs.sublayer.com/)

  • Model-agnostic AI agent framework with DSL.
  • Why not: Different architectural approach (Generators/Actions), less Rails integration.

Chatwoot AI Agents (https://github.com/chatwoot/ai-agents)

  • Ruby AI Agents SDK inspired by OpenAI's Agents SDK. Built on top of RubyLLM. Features multi-agent orchestration, seamless handoffs, and shared context.
  • Why not: Actually a strong contender! However, it's a layer on top of RubyLLM. Using RubyLLM directly gives us more control and fewer dependencies. Worth revisiting if we need more complex handoff patterns.

Active Agent (https://www.activeagents.ai/)

  • Rails-native AI framework following MVC conventions. Treats agents like controllers, prompts like views. Includes Action Prompt, Generation Provider, and Queued Generation modules. Integrates with background jobs and streaming.
  • Why not: Interesting "agents as controllers" approach, but different from our agent-as-tool pattern. Less mature ecosystem. Worth watching as it develops.

Direct SDKs (OpenAI/Anthropic Ruby gems)

  • Use provider SDKs directly without abstraction.
  • Why not: Provider lock-in, must implement tool calling ourselves.

Why RubyLLM won:

  1. Right level of abstraction: Not too heavy (LangChain) or too light (OmniAI/direct SDKs)
  2. Native Rails integration: Built-in generators, acts_as_chat (even if we use our own persistence)
  3. Agentic patterns documented: Clear guidance on multi-agent orchestration
  4. Active development: Regular releases, responsive maintainer, growing community
  5. Provider flexibility: Easy to swap models or use different providers for different tasks
  6. Async support: Built-in fiber-based concurrency for efficient LLM operations

Trade-off acknowledged: LangChain.rb has a larger community and more built-in integrations (vector stores, document loaders, etc.). If we needed those features heavily, it might be worth the heavier abstraction. For our use case (chat + tools + agents), RubyLLM is the better fit.


9. Reference

RubyLLM Documentation

Topic Link When to Read
Getting Started rubyllm.com First setup
Tool Definition rubyllm.com/tools Creating tools
Extended Thinking rubyllm.com/thinking Complex reasoning tasks
Streaming rubyllm.com/streaming Real-time responses
Async/Fibers rubyllm.com/async Background job setup
Rails Integration rubyllm.com/rails Persistence with acts_as_chat
Multi-Agent Patterns rubyllm.com/agentic-workflows Agent-as-tool pattern
Structured Output rubyllm.com/chat Schema-based responses

Related Libraries

Rails Guides

Future Considerations

Current Bizy Files

File Purpose
lib/bizy/ai_driver.rb Main orchestrator (to be replaced)
lib/bizy/base_tool.rb MCP tool base class
lib/bizy/tools/*.rb Current tool implementations
app/controllers/mcp/bizy_controller.rb MCP server endpoint

Glossary

Term Definition
Tool A function the AI can call to get information or perform actions
Agent A specialized AI assistant with its own tools, persona, and instructions
Agent-as-Tool Pattern where specialized agents are invoked as tools by a coordinator
Extended Thinking Giving reasoning models more time/budget to deliberate on complex tasks
MCP Model Context Protocol - JSON-RPC standard for AI tool execution
ActionCable Rails WebSocket framework for real-time bidirectional communication
RubyLLM Provider-agnostic Ruby library for LLM interactions
Async::Job Fiber-based job processor optimized for I/O-bound work like LLM calls
Fiber Lightweight concurrency primitive; many can run on a single thread