Composable Agent Framework

Design a composable chat-based agent framework using RubyLLM, with ActionCable streaming, that can be surfaced across multiple product entry points while maintaining traceability.
A chat-based AI assistant framework that:
- Works across multiple product surfaces (meetings, admin, general chat)
- Routes questions to specialized "agents" based on context and permissions
- Streams responses in real-time without blocking web server threads
- Is built on proven patterns using the RubyLLM library
| Goal | Why It Matters |
|---|---|
| Multi-surface | One framework powers Bizy everywhere it appears |
| Composable | Add new capabilities by adding agents, not modifying core code |
| Streaming | Users see responses as they're generated |
| Non-blocking | Long LLM calls don't consume web server threads |
| Provider-agnostic | Use the best model for each task (Claude for analysis, GPT for conversation) |
| Aspect | Current | New |
|---|---|---|
| LLM Client | OpenAI Ruby SDK | RubyLLM (any provider) |
| Tool Execution | MCP server (HTTP calls) | Direct function calling |
| Response Delivery | Synchronous JSON | ActionCable streaming |
| Thread Model | Blocks Puma thread | Async background jobs |
| Extensibility | Modify AiDriver | Add new agent class |
A tool is a function the AI can call to get information or perform actions. When you ask "Who is my manager?", the AI calls a tool to look that up.
class Bizy::Tools::GetOrgContext < RubyLLM::Tool
description "Get organizational context for a user"
param :target_user, desc: "User ID, email, or name"
def execute(target_user:)
# Called by the AI when it needs org info
target = resolve_user(target_user)
{
user: format_user(target),
manager: format_manager(target),
direct_reports: format_direct_reports(target)
}
end
end
The AI sees the description and param info, decides when to use the tool, and receives the return value.
An agent is a specialized AI assistant with its own tools and personality. Think of it as a department expert you can delegate questions to.
In our framework, agents ARE tools. The main Bizy chat can invoke a specialized agent just like any other tool:
class Bizy::Agents::Meeting < Bizy::Agents::Base
description "Ask questions about 1:1 meetings and past discussions"
def tools
[GetMeetingHistory.new(...), GetMeetingSummary.new(...)]
end
def instructions
"You are a meeting assistant. Help with 1:1 prep and history."
end
end
Instead of one chat with many tools, we have a coordinator chat with a few core tools plus specialized agents:
┌─────────────────────────────────────────────────────────────┐
│ Main Bizy Chat │
│ Core tools: GetOrgContext, GetRecognition, GetMilestones │
│ Agents: MeetingAgent, AnalyticsAgent │
│ │
│ User: "What did Jane and I discuss last week?" │
│ ↓ │
│ Bizy decides: This is a meeting question → invoke agent │
│ ↓ │
│ ┌─────────────────────────────────────────┐ │
│ │ MeetingAgent │ │
│ │ Tools: GetMeetingHistory, GetSummary │ │
│ │ Runs its own AI chat │ │
│ │ Returns: "In your Jan 15 meeting..." │ │
│ └─────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Why this is better:
- The main chat doesn't need to know about every tool in the system
- Each agent encapsulates its domain completely
- Adding a new capability = adding a new agent class
- Agents can be nested (an executive report agent could use the analytics agent)
Before choosing an architecture, it's worth understanding the landscape of agent orchestration patterns:
| Pattern | What It Is | Best For |
|---|---|---|
| Function Calling | Tools defined in API request, app executes them | Simple integrations, 2-5 tools |
| MCP | Tools on separate server, discovered/executed via JSON-RPC | Shared tooling, security isolation |
| A2A | Open standard for inter-agent communication across organizations | Enterprise ecosystems, multi-vendor |
| ACP | Agent Communication Protocol (now merged into A2A) | Deprecated - use A2A |
| Agent-as-Tool | Specialized agents invoked as tools by a coordinator | Multi-agent orchestration within one system |
User → App → LLM API (with tool definitions) → App executes tool → LLM API → Response
- Pros: Simple, full control, no network hops for tool discovery
- Cons: Provider-specific APIs when using raw SDKs (mitigated by RubyLLM)
- Used by: RubyLLM's native tool pattern
Note: RubyLLM eliminates provider lock-in by providing a unified API. Tools written once work across OpenAI, Anthropic, Gemini, Mistral, and others.
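As a quick illustration of that portability, here is a console-style sketch (model names and the bare constructor are illustrative, matching the teaser tool above) showing the same tool attached to chats backed by different providers:

```ruby
# Hedged sketch: the same tool works regardless of which provider backs the chat.
tool = Bizy::Tools::GetOrgContext.new

RubyLLM.chat(model: "gpt-4o").with_tools(tool).ask("Who is Jane's manager?")
RubyLLM.chat(model: "claude-sonnet-4").with_tools(tool).ask("Who is Jane's manager?")
```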
User → App → LLM API → MCP Server (JSON-RPC) → Tool execution → LLM API → Response
- Pros: Security isolation, shared tooling, credential separation
- Cons: Extra network hop, more infrastructure
- Used by: Current Bizy and BonuslyGPT implementations
Agent A → HTTP/JSON-RPC → Agent B (with Agent Card discovery)
- Pros: Cross-organization interop, standardized discovery
- Cons: Overkill for internal agents, added complexity
- Best for: Enterprise agent marketplaces, not internal frameworks
Coordinator Agent → Specialized Agent (via tool call) → Result → Coordinator continues
- Pros: Composable, each agent focused, natural tool semantics
- Cons: Deeper call stacks, potential for loops
- Used by: RubyLLM's multi-agent patterns
Agent-as-Tool with direct function calling (via RubyLLM) is the right fit because:
- All agents are internal to Bonusly - no need for A2A's cross-org interop
- RubyLLM handles tool execution natively without MCP's network overhead
- Specialized agents can be tools that the coordinator invokes
- Maintains simplicity while enabling composition
| System | Purpose | How It Works |
|---|---|---|
| Bizy | User assistant for meetings/org | MCP tools via OpenAI Responses API |
| KYB | Compliance verification | Direct API calls, structured output |
| BonuslyGPT | Internal support | MCP tools + vector search |
User → ChatController → AiDriver → OpenAI API → MCP Server → Tools
↓
Response (sync)
Issues with current approach:
- Blocks Puma threads: Each chat request holds a thread for 10-60 seconds
- MCP overhead: Tool calls require HTTP round-trips to MCP server
- Single agent: All logic in AiDriver, hard to extend
- OpenAI lock-in: Tied to OpenAI's specific API
- `lib/bizy/ai_driver.rb` - Main orchestrator (538 lines)
- `lib/bizy/base_tool.rb` - Tool base class for MCP
- `lib/bizy/tools/*.rb` - Individual tools (7 tools)
- `app/controllers/mcp/bizy_controller.rb` - MCP server endpoint
┌─────────────────────────────────────────────────────────────┐
│ New Architecture │
├─────────────────────────────────────────────────────────────┤
│ │
│ Frontend (Kaleidoscope) │
│ │ │
│ │ 1. POST /api/v2/bizy/chat │
│ │ 2. Subscribe to BizyChannel │
│ ▼ │
│ ChatController │
│ │ │
│ │ 3. Enqueue Bizy::ChatJob │
│ │ (Puma thread freed immediately) │
│ ▼ │
│ Bizy::ChatJob (Async::Job) │
│ │ │
│ │ 4. Bizy::Chat.execute(user:, message:, metadata:) │
│ ▼ │
│ RubyLLM Chat with Tools + Agents │
│ │ │
│ │ 5. Stream tokens via ActionCable │
│ ▼ │
│ BizyChannel → Frontend │
│ │
└─────────────────────────────────────────────────────────────┘
Before adopting RubyLLM, we should consider whether to continue building on our current custom implementation or adopt a library.
Current Bizy Implementation:
- Custom OpenAI Responses API integration
- Custom MCP implementation for tool calling
- Custom streaming (SSE-based, has Puma threading issues)
- Custom conversation management
Comparison:
| Aspect | Build Our Own | Use RubyLLM |
|---|---|---|
| Development time | High - rebuild provider APIs, tool calling, streaming | Low - already built and tested |
| Provider support | Must implement each (OpenAI, Anthropic, etc.) | 15+ providers out of the box |
| Tool calling | Must handle JSON schema, validation, execution | Native DSL, automatic schema generation |
| Streaming | Must handle SSE/WebSocket per provider | Consistent block-based API |
| Maintenance | Team must track API changes for each provider | Community maintains compatibility |
| Testing | Must build test infrastructure | Built-in test helpers and mocking |
| Extended thinking | Must implement per-provider | Unified API across Claude/Gemini |
| Async/concurrency | Must build fiber/thread management | Built-in Async::Job integration |
| Flexibility | Full control over implementation | Constrained by library design |
| Lock-in risk | None | Tied to RubyLLM's abstraction |
Pros of RubyLLM:
- Proven: Active community, battle-tested in production
- Speed: Skip months of infrastructure work, focus on Bonusly-specific features
- Multi-model: Use Claude for complex reasoning, GPT for quick responses, Gemini for long context
- Future-proof: New models (GPT-5, Claude 4) supported by updating the gem
- Tool ecosystem: Can integrate with MCP servers via ruby_llm-mcp
- Well-documented: Extensive documentation with guides for common agentic patterns like multi-agent orchestration, tool composition, and streaming
Cons of RubyLLM:
- Abstraction leaks: Edge cases may require workarounds or PRs
- Dependency risk: Library could become unmaintained (mitigated: active development, MIT license)
- Learning curve: Team needs to learn RubyLLM patterns
- ActiveRecord assumptions: Some features (like `acts_as_chat`) assume ActiveRecord; we use Mongoid
Recommendation: Adopt RubyLLM. The development time savings are significant, and the library's design aligns well with our needs. The Mongoid limitation for chat persistence is easily worked around with our existing Bizy::ChatHistory model (see "Chat History Persistence" section).
RubyLLM provides:
| Feature | Benefit |
|---|---|
| Provider Agnostic | Same API for OpenAI, Anthropic, Gemini, etc. Use the best model for each task. |
| Native Tools | First-class tool support without MCP overhead |
| Streaming | Built-in streaming with block syntax |
| Rails Integration | acts_as_chat for persistence (ActiveRecord) |
| Async Support | Fiber-based concurrency for parallel operations |
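For example, the streaming row above boils down to passing a block to `ask`; a minimal sketch (model name illustrative, same chunk pattern Bizy::ChatJob uses later in this doc):

```ruby
# Hedged sketch of RubyLLM's block-based streaming.
chat = RubyLLM.chat(model: "gpt-4o")
chat.ask("Summarize this quarter's recognition trends") do |chunk|
  print chunk.content # each chunk carries the newly generated text
end
```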
RubyLLM tools use a DSL for defining input parameters. RubyLLM v1.9+ provides a params block DSL for complex schemas (nested objects, arrays, enums), while simpler tools can use the param helper.
Simple tool with param helper:
# app/lib/bizy/tools/get_org_context.rb
class Bizy::Tools::GetOrgContext < RubyLLM::Tool
description "Get organizational context for a user"
param :target_user, desc: "User ID, email, or name"
def initialize(requesting_user:)
@requesting_user = requesting_user
end
def execute(target_user:)
target = resolve_user(target_user)
{
user: format_user(target),
manager: format_manager(target),
direct_reports: format_direct_reports(target),
relationship_to_you: build_relationship(target)
}
end
private
def resolve_user(identifier)
return @requesting_user if identifier.blank?
@requesting_user.company.users.active.find_by_identifier(identifier)
end
def format_user(user)
{ id: user.id.to_s, name: user.display_name, email: user.email }
end
end
Complex tool with params DSL (v1.9+):
For tools with structured inputs, use the params block for nested objects, arrays, and enums:
class Bizy::Tools::GetParticipationTrends < RubyLLM::Tool
description "Get participation trends for the company"
params do
string :group_by, description: "Property to group by (department, location, team)"
integer :months, description: "Number of months to analyze", required: false
object :filters, description: "Optional filters to apply", required: false do
array :departments, of: :string, description: "Limit to specific departments"
enum :status, %w[active inactive all], description: "User status filter"
end
end
def initialize(company:)
@company = company
end
def execute(group_by:, months: 12, filters: nil)
input = Analytics::Queries::GetGivingAndReceivingParticipationData::Input.new(
company_id: @company.id,
end_time: Time.current,
custom_property_group: group_by
)
result = Analytics::Queries::GetGivingAndReceivingParticipationData.call(input)
format_for_llm(result)
end
end
RubyLLM supports structured output via with_schema, ensuring the LLM returns valid JSON matching a defined schema. This is useful when agents need to return structured data rather than free-form text.
Defining output schemas with RubyLLM::Schema:
# app/lib/bizy/schemas/meeting_summary.rb
class Bizy::Schemas::MeetingSummary < RubyLLM::Schema
string :summary, description: "Brief summary of the meeting"
array :key_topics, of: :string, description: "Main topics discussed"
array :action_items, description: "Action items from the meeting" do
string :task, description: "The task to complete"
string :owner, description: "Person responsible"
string :due_date, description: "When it's due", required: false
end
string :next_steps, description: "Recommended next steps", required: false
end
Using schemas in agent responses:
class Bizy::Agents::Meeting < Bizy::Agents::Base
def execute(question:)
RubyLLM.chat(model: "claude-sonnet-4")
.with_tools(*tools)
.with_instructions(instructions)
.with_schema(Bizy::Schemas::MeetingSummary)
.ask(question)
end
end
When to use schemas:
Use schemas whenever the caller needs well-structured output - the LLM will conform its response to the schema regardless of how the question is phrased.
- Analytics agents returning data for charts/tables
- Meeting agents returning summaries with action items
- Any agent whose output will be parsed or displayed programmatically
- Ensuring consistent response formats for frontend rendering
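A hedged sketch of consuming a schema-conforming response (the `user` and `metadata` variables are assumptions; depending on the RubyLLM version, content may arrive as a JSON string or already parsed, so the sketch guards for both):

```ruby
require "json"

response = Bizy::Agents::Meeting.new(user: user, metadata: metadata)
                                .execute(question: "Summarize my last 1:1 with Jane")

# content may be a JSON string or an already-parsed Hash depending on version
data = response.content.is_a?(String) ? JSON.parse(response.content) : response.content
data["action_items"].each { |item| puts "#{item['owner']}: #{item['task']}" }
```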
# app/lib/bizy/agents/base.rb
class Bizy::Agents::Base < RubyLLM::Tool
# Agents are tools that run their own sub-chat
param :question, desc: "The question to ask this specialized agent"
def initialize(user:, metadata: {})
@user = user
@metadata = metadata
end
def execute(question:)
chat = RubyLLM.chat(model: "gpt-4o")
.with_tools(*tools)
.with_instructions(instructions)
chat.ask(question).content
end
private
def tools
raise NotImplementedError
end
def instructions
raise NotImplementedError
end
end
# app/lib/bizy/agents/meeting.rb
class Bizy::Agents::Meeting < Bizy::Agents::Base
description "Ask questions about 1:1 meetings and past discussions"
def self.available?(user:, metadata:)
metadata[:meeting_partner_id].present?
end
def initialize(user:, metadata:)
super
@partner_id = metadata[:meeting_partner_id]
@partner = User.find(@partner_id)
end
private
def tools
[
Bizy::Tools::GetMeetingHistory.new(
requesting_user: @user,
partner_id: @partner_id
),
Bizy::Tools::GetMeetingSummary.new(
requesting_user: @user,
partner_id: @partner_id
),
]
end
def instructions
<<~PROMPT
You are a meeting assistant for 1:1 meetings.
Context:
- Current user: #{@user.display_name}
- Meeting partner: #{@partner.display_name}
Use get_meeting_summary for recent meetings.
Use get_meeting_history for full history.
PROMPT
end
end
Bizy::Chat is the main entry point - a top-level agent that orchestrates tools and sub-agents. Unlike sub-agents (which extend RubyLLM::Tool), Bizy::Chat is invoked by application code, not by an LLM.
# app/lib/bizy/chat.rb
class Bizy::Chat
INSTRUCTIONS = <<~PROMPT
You are Bizy, a helpful AI assistant for Bonusly.
You have access to tools for looking up information and specialized agents
for specific domains. When a question matches a specialized agent's domain,
prefer delegating to that agent rather than answering directly.
Be helpful, concise, and accurate. If you don't have enough information
to answer a question, say so.
PROMPT
# Core tools (always available)
CORE_TOOLS = [
Bizy::Tools::GetOrgContext,
Bizy::Tools::GetRecognition,
Bizy::Tools::GetMilestones,
].freeze
# Sub-agents (conditionally available based on context)
SUB_AGENTS = [
Bizy::Agents::Meeting,
Bizy::Agents::Analytics,
].freeze
def self.execute(user:, message:, metadata: {}, conversation_id: nil, &block)
new(user: user, metadata: metadata, conversation_id: conversation_id)
.execute(message, &block)
end
def initialize(user:, metadata: {}, conversation_id: nil)
@user = user
@metadata = metadata
@conversation_id = conversation_id
end
def execute(message, &block)
start_time = Time.current
response = if block_given?
chat.ask(message, &block)
else
chat.ask(message)
end
persist_to_history(message, response, start_time)
response
end
private
def chat
@chat ||= build_chat
end
def build_chat
tools = CORE_TOOLS.map { |klass| klass.new(requesting_user: @user) }
SUB_AGENTS.each do |agent_class|
if agent_class.available?(user: @user, metadata: @metadata)
tools << agent_class.new(user: @user, metadata: @metadata)
end
end
llm_chat = RubyLLM.chat(model: "claude-sonnet-4")
.with_tools(*tools)
.with_instructions(build_instructions)
.with_thinking(effort: :medium)
restore_conversation_context(llm_chat) if @conversation_id.present?
llm_chat
end
def build_instructions
"#{INSTRUCTIONS}\n\nCurrent user: #{@user.display_name}"
end
def restore_conversation_context(llm_chat)
Bizy::ChatHistory
.where(conversation_id: @conversation_id)
.order(created_at: :asc)
.each do |entry|
llm_chat.add_message(role: :user, content: entry.user_message)
llm_chat.add_message(role: :assistant, content: entry.bizy_response)
end
end
def persist_to_history(message, response, start_time)
Bizy::ChatHistory.create!(
user_id: @user.id,
company_id: @user.company_id,
conversation_id: @conversation_id,
context_type: @metadata[:context_type],
context_id: @metadata[:context_id],
user_message: message,
bizy_response: response.content,
response_time_ms: ((Time.current - start_time) * 1000).to_i,
model_id: response.model_id,
provider: response.provider_id,
input_tokens: response.input_tokens,
output_tokens: response.output_tokens,
thinking_text: response.thinking&.text,
tool_calls: format_tool_calls(response.tool_calls)
)
end
def format_tool_calls(tool_calls)
return [] if tool_calls.blank?
tool_calls.map { |tc| { name: tc.name, arguments: tc.arguments, result: tc.result } }
end
end
Key points:
- `execute` is the single entry point - it takes user, message, and metadata and returns the response
- Instructions are minimal - tool descriptions handle routing
- Conversation context is restored for multi-turn conversations
- History is persisted after each response
- Streaming is supported via block parameter
The main Bizy chat handles diverse questions and often needs to invoke multiple tools or agents, then synthesize the results. RubyLLM's Extended Thinking gives reasoning models more time to deliberate, improving accuracy on these multi-step tasks.
Bizy::Chat uses extended thinking by default:
# app/lib/bizy/chat.rb (in build_chat method)
RubyLLM.chat(model: "claude-sonnet-4")
.with_tools(*tools)
.with_instructions(build_instructions)
.with_thinking(effort: :medium) # Enable for the main chat loop
Sub-agents configure their own thinking level based on their domain complexity:
# app/lib/bizy/agents/analytics.rb
class Bizy::Agents::Analytics < Bizy::Agents::Base
def execute(question:)
RubyLLM.chat(model: "claude-sonnet-4")
.with_tools(*tools)
.with_instructions(instructions)
.with_thinking(effort: :high, budget: 10_000) # Analytics needs deeper reasoning
.ask(question).content
end
end
Accessing thinking output (useful for debugging/logging):
response = chat.ask("Compare participation trends across departments")
response.thinking&.text # The reasoning trace (if available)
response.content # The final answer
Extended thinking adds latency but improves accuracy. The main chat uses :medium effort; specialized agents like Analytics can use :high for complex data synthesis.
ActionCable provides WebSocket support in Rails, allowing us to push tokens to the frontend as they're generated.
# app/channels/bizy_channel.rb
class BizyChannel < ApplicationCable::Channel
def subscribed
stream_from "bizy:user:#{current_user.id}"
end
def self.stream_to(user, event:, data:)
ActionCable.server.broadcast(
"bizy:user:#{user.id}",
{ event: event, data: data }
)
end
end
# app/jobs/bizy/chat_job.rb
class Bizy::ChatJob < LLMJob
def perform(user_id:, message:, conversation_id:, metadata: {})
user = User.find(user_id)
Bizy::Chat.execute(
user: user,
message: message,
metadata: metadata,
conversation_id: conversation_id
) do |chunk|
BizyChannel.stream_to(user, event: "token", data: { text: chunk.content })
end
BizyChannel.stream_to(user, event: "complete", data: {})
end
end
LLM calls take 10-60+ seconds. We use Async::Job (fiber-based) instead of Sidekiq (thread-based) to handle many concurrent requests efficiently. Fibers allow thousands of concurrent LLM calls to share a few connections, rather than blocking a thread per request.
# Base class for LLM jobs
class LLMJob < ApplicationJob
self.queue_adapter = :async_job # Uses fibers, not threads
end
# Regular jobs still use Sidekiq
class ImageProcessingJob < ApplicationJob
# Uses default :sidekiq adapter
end
Deploys and dyno restarts (common on Heroku) can interrupt in-flight LLM requests. Rather than complex job recovery mechanisms, we handle this gracefully in the frontend.
The scenario:
- User asks question, streaming starts
- Deploy happens, worker process receives SIGTERM
- ActionCable connection drops mid-stream
- Frontend detects disconnect, shows friendly message
- User resubmits their question
Frontend implementation:
// src/modules/bizy/hooks/use-bizy-channel.ts
const useBizyChannel = () => {
const [isStreaming, setIsStreaming] = useState(false);
const [error, setError] = useState<string | null>(null);
useEffect(() => {
const subscription = cable.subscriptions.create("BizyChannel", {
received(data) {
if (data.event === "token") {
appendToken(data.data.text); // broadcast payload shape is { event, data: { text } }
} else if (data.event === "complete") {
setIsStreaming(false);
} else if (data.event === "error") {
setError(data.data.message);
setIsStreaming(false);
}
},
disconnected() {
if (isStreaming) {
// Connection dropped while waiting for response
setError("Connection interrupted. Please try your question again.");
setIsStreaming(false);
}
},
rejected() {
setError("Unable to connect. Please refresh and try again.");
}
});
return () => subscription.unsubscribe();
}, []);
return { isStreaming, error, clearError: () => setError(null) };
};
Why this approach:
- Simple: No job persistence, idempotency keys, or recovery logic
- User-friendly: Clear message, user has full context to retry
- Rare occurrence: Deploys are infrequent relative to chat volume
- No partial state: Incomplete responses aren't saved to history
Tradeoffs:
- User must manually retry (minor inconvenience)
- Partial responses are lost (but they were incomplete anyway)
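The frontend above also listens for an "error" event, but the Bizy::ChatJob shown earlier never emits one. A hedged sketch of how the job might broadcast it (the rescue policy and message wording are assumptions, not part of the current job):

```ruby
# Variant of Bizy::ChatJob that reports failures to the channel instead of failing silently.
class Bizy::ChatJob < LLMJob
  def perform(user_id:, message:, conversation_id:, metadata: {})
    user = User.find(user_id)
    Bizy::Chat.execute(user: user, message: message, metadata: metadata,
                       conversation_id: conversation_id) do |chunk|
      BizyChannel.stream_to(user, event: "token", data: { text: chunk.content })
    end
    BizyChannel.stream_to(user, event: "complete", data: {})
  rescue StandardError => e
    Rails.logger.error("Bizy chat failed: #{e.message}")
    # Only broadcast if we resolved the user before the failure
    BizyChannel.stream_to(user, event: "error", data: { message: "Something went wrong. Please try again." }) if user
  end
end
```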
RubyLLM provides acts_as_chat for ActiveRecord-based persistence, but Bonusly uses Mongoid. We continue using our existing Bizy::ChatHistory model and enhance it with RubyLLM-specific fields.
Why not use RubyLLM's acts_as_chat:
| Aspect | RubyLLM acts_as_chat | Bizy::ChatHistory (enhanced) |
|---|---|---|
| ORM | ActiveRecord only | Mongoid (our stack) |
| Setup | Generator creates migrations | Already exists |
| Bonusly fields | Would need to add | Already has context_type, feedback, etc. |
| Indexes | Would need to recreate | Already optimized for our queries |
| Tool calls | Separate ToolCall model | Embedded array (simpler) |
Enhanced ChatHistory model:
# app/models/bizy/chat_history.rb
class Bizy::ChatHistory
include ApplicationDocument
# Existing fields
field :user_id, type: BSON::ObjectId
field :company_id, type: BSON::ObjectId
field :conversation_id, type: String
field :context_type, type: String
field :context_id, type: BSON::ObjectId
field :user_message, type: String
field :bizy_response, type: String
field :response_time_ms, type: Integer
field :emotion, type: String
# Feedback fields (existing)
field :feedback_type, type: String
field :feedback_category, type: String
field :feedback_details, type: String
# NEW: RubyLLM-specific fields
field :model_id, type: String # e.g., "claude-sonnet-4", "gpt-4o"
field :provider, type: String # e.g., "anthropic", "openai"
field :input_tokens, type: Integer
field :output_tokens, type: Integer
field :thinking_text, type: String # Extended thinking trace
field :tool_calls, type: Array # [{name: "GetOrgContext", arguments: {...}, result: {...}}]
# REMOVE: OpenAI-specific field (no longer needed)
# field :ai_response_id, type: String # RubyLLM handles conversation state
end
Persistence is handled by Bizy::Chat
Multi-turn conversations:
Bizy::Chat automatically restores conversation context when a conversation_id is provided. See the restore_conversation_context method in the Bizy::Chat class above.
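A hedged usage sketch of a multi-turn exchange (how the conversation_id is generated is up to the caller; any stable string works):

```ruby
conversation_id = SecureRandom.uuid

# First turn
Bizy::Chat.execute(user: user, message: "Who is on my team?",
                   conversation_id: conversation_id)

# Second turn: prior messages are restored from Bizy::ChatHistory before asking
Bizy::Chat.execute(user: user, message: "Which of them joined this year?",
                   conversation_id: conversation_id)
```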
Pros of this approach:
- Works with our Mongoid stack (no ActiveRecord required)
- Preserves existing Bonusly-specific fields and indexes
- Simpler embedded tool_calls (no separate model)
- Feedback loop already implemented
Cons:
- Manual persistence (RubyLLM doesn't auto-save)
- Must manually rebuild conversation context for multi-turn
- Won't get future RubyLLM persistence features automatically
The frontend sends context metadata with each message:
# Controller receives
{
message: "What did Jane and I discuss?",
context_metadata: {
meeting_partner_id: "abc123", # Enables MeetingAgent
context_type: "meeting" # For logging
}
}
Bizy::Chat uses this to determine which sub-agents are available:
# In Bizy::Chat#build_chat
SUB_AGENTS.each do |agent_class|
if agent_class.available?(user: @user, metadata: @metadata)
tools << agent_class.new(user: @user, metadata: @metadata)
end
end
| Context | Available Agents | Why |
|---|---|---|
| No metadata | Core tools only | General questions |
| `meeting_partner_id` present | + MeetingAgent | 1:1 meeting questions |
| User is admin | + AnalyticsAgent | Company analytics |
| Both | + MeetingAgent + AnalyticsAgent | Full access |
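The admin-gated row above implies an availability hook on the Analytics agent. A hedged sketch of just that hook (the `admin?` predicate is an assumption about the User model; the agent's execute method was shown earlier):

```ruby
# app/lib/bizy/agents/analytics.rb (availability hook only)
class Bizy::Agents::Analytics < Bizy::Agents::Base
  description "Get company analytics, participation trends, and recognition statistics"

  def self.available?(user:, metadata:)
    user.admin? # assumption: User exposes an admin? predicate
  end
end
```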
We build the new framework entirely in parallel to existing Bizy. The old system continues to work unchanged while we prove out the new approach. Once validated, we cut over via feature flag.
┌─────────────────────────────────────────────────────────────┐
│ Development Timeline │
├─────────────────────────────────────────────────────────────┤
│ │
│ Existing Bizy (unchanged) │
│ ════════════════════════════════════════════════► │
│ │
│ New Framework (parallel build) │
│ ─────────────────────────────────► [Validate] ─► [Cutover] │
│ │
│ Phases 1-5: Build new system Phase 6: Feature flag test │
│ (old system untouched) Phase 7: Remove old code │
│ │
└─────────────────────────────────────────────────────────────┘
Why parallel development:
- Zero risk to production during development
- Can compare outputs between old and new systems
- Easy rollback (just disable feature flag)
- No partial migrations or hybrid states
Goal: Set up RubyLLM, ActionCable, and async jobs.
# Gemfile
gem 'ruby_llm'
gem 'async-job-adapter-active_job'
- Configure RubyLLM with API keys (see the initializer sketch below)
- Create BizyChannel
- Create LLMJob base class
- Verify: `RubyLLM.chat.ask("Hello")` works in console
Existing Bizy: Unchanged
Goal: Create one new tool to prove the pattern.
Create new GetOrgContext in the new location (does not modify existing tool):
# NEW: app/lib/bizy/tools/get_org_context.rb (RubyLLM-based)
# OLD: lib/bizy/tools/get_org_context.rb (still works, untouched)
Test in console:
tool = Bizy::Tools::GetOrgContext.new(requesting_user: user)
RubyLLM.chat.with_tools(tool).ask("Who is my manager?")
Existing Bizy: Unchanged
Goal: Create new streaming endpoint (separate from existing).
# config/routes.rb
post 'bizy/chat', to: 'chat#create' # Existing (unchanged)
post 'bizy/chat/v2', to: 'chat#create_v2' # New (feature flagged)
The new endpoint enqueues a job and returns immediately (a controller sketch follows below). Feature flag controls access.
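A hedged sketch of the new controller action (param names and the permitted metadata keys are assumptions; the feature flag check is omitted):

```ruby
# app/controllers/chat_controller.rb (path illustrative)
class ChatController < ApplicationController
  def create_v2
    metadata = params[:context_metadata]
                 &.permit(:meeting_partner_id, :context_type, :context_id)
                 &.to_h&.symbolize_keys || {}

    Bizy::ChatJob.perform_later(
      user_id: current_user.id.to_s,
      message: params.require(:message),
      conversation_id: params[:conversation_id],
      metadata: metadata
    )

    head :accepted # Puma thread is freed immediately; tokens arrive over BizyChannel
  end
end
```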
Existing Bizy: Unchanged
Goal: Create agent base class and top-level chat agent.
- Create `Bizy::Agents::Base` (base class for sub-agents)
- Create `Bizy::Chat` (top-level agent with execute method)
- Register core tools and sub-agents in `Bizy::Chat`
Existing Bizy: Unchanged
Goal: Build out all tools and agents for the new framework.
Core tools (new implementations in app/lib/bizy/tools/):
- GetOrgContext, GetRecognition, GetMilestones, GetHashtags
Agents (new in app/lib/bizy/agents/):
- MeetingAgent (uses GetMeetingHistory, GetMeetingSummary, GetCheckinHistory)
- AnalyticsAgent (uses GetParticipationTrends, GetRecognitionStats)
At this point, the new system is fully functional but behind a feature flag.
Existing Bizy: Unchanged, still serving all traffic
Goal: Prove the new system works, then switch traffic.
- Internal testing: Team uses new system via feature flag
- Shadow mode (optional): Run both systems, compare outputs
- Gradual rollout: Enable for percentage of users via feature flag
- Full cutover: Enable for all users
// Frontend feature flag
const BizyChat = () => {
const useNewFramework = useFeatureFlag('bizy-v2');
if (useNewFramework) {
return <StreamingBizyChat />; // New ActionCable-based
}
return <LegacyBizyChat />; // Existing sync
};
Goal: Clean up after successful cutover.
Only after the new system is proven in production:
- Remove `lib/bizy/ai_driver.rb`
- Remove `lib/bizy/base_tool.rb`
- Remove `lib/bizy/tools/*.rb` (old MCP tools)
- Remove MCP controller (if not used elsewhere)
- Remove old routes and feature flag checks
At every phase, existing Bizy continues to work. Rollback is simple:
| Phase | Rollback |
|---|---|
| 1-5 | Nothing to rollback (old system still serving traffic) |
| 6 | Disable feature flag → 100% traffic to old system |
| 7 | Restore deleted code from git (should rarely be needed) |
- New system fully functional behind feature flag
- Internal team validated new system behavior
- Streaming responses working end-to-end
- Performance equal or better than old system
- Feature flag rollout completed without issues
- Old code removed
- Async::Job handling LLM calls
- Feature flag shows no regressions
- Old MCP infrastructure removed
RubyLLM can both consume MCP tools (as a client) and our agents can be exposed via MCP (as a server). This provides flexibility for integration with external systems.
The ruby_llm-mcp gem provides full MCP client support for RubyLLM. This is useful if we want RubyLLM agents to call tools hosted on existing MCP servers (like our current Bizy/BonuslyGPT infrastructure during migration).
require 'ruby_llm/mcp'
mcp_client = RubyLLM::MCP.client(
name: "bizy-mcp",
transport_type: :streamable,
config: { url: ENV['BIZY_MCP_URL'] }
)
chat = RubyLLM.chat.with_tools(*mcp_client.tools)
Use cases:
- Gradual migration: New agents use RubyLLM but can still call existing MCP tools
- External MCP servers: Connect to third-party MCP servers (filesystem, databases, etc.)
- Hybrid architecture: Mix local tools with remote MCP tools
If we want external systems (other AI platforms, ChatGPT plugins, etc.) to call our agents, we can expose them as an MCP server. Our existing Mcp::BizyController pattern already does this.
Example pattern (could be extended for new agents):
# app/controllers/mcp/agents_controller.rb
module Mcp
class AgentsController < ApplicationController
# JSON-RPC 2.0 endpoint for MCP
def handle
case json_rpc_method
when "tools/list"
render_tools_list
when "tools/call"
result = execute_tool(params[:name], params[:arguments])
render json: { result: result }
end
end
private
def render_tools_list
render json: {
tools: [
{
name: "ask_bizy",
description: "Ask Bizy a question about Bonusly",
inputSchema: {
type: "object",
properties: {
question: { type: "string", description: "The question to ask" }
},
required: ["question"]
}
}
]
}
end
def execute_tool(name, arguments)
case name
when "ask_bizy"
response = Bizy::Chat.execute(
user: current_user,
message: arguments[:question],
metadata: {}
)
response.content
end
end
end
end
Use cases:
- ChatGPT/Claude plugins: Let external AI access Bonusly data
- Enterprise integrations: Partner systems can query our agents
- A2A protocol: If we adopt A2A, MCP servers are a building block
| Scenario | Approach |
|---|---|
| New agent with new tools | Native RubyLLM tools (simplest) |
| New agent + existing MCP tools | ruby_llm-mcp client |
| Expose agent to external systems | MCP server endpoint |
| Gradual migration from MCP | Hybrid (ruby_llm-mcp + native) |
ActionCable is the right choice for Puma. If we later adopt an async-native server like Falcon, SSE becomes viable for streaming without blocking threads.
┌─────────────┐
│ Falcon │
│ (async) │
└──────┬──────┘
│
┌──────▼──────┐
│ SSE Stream │
└──────┬──────┘
│
┌──────▼──────┐
│ Client │
└─────────────┘
Falcon uses Ruby's fiber-based async model, allowing thousands of concurrent connections on a single thread. This pairs naturally with RubyLLM's streaming and async capabilities.
See RubyLLM Async documentation for details on fiber-based concurrency and Falcon integration.
Agents can use other agents as tools:
class Bizy::Agents::ExecutiveReport < Bizy::Agents::Base
description "Generate executive reports"
def tools
[
Bizy::Agents::Analytics.new(user: @user, metadata: @metadata),
Bizy::Tools::GetTopPerformers.new(company: @user.company),
]
end
end
Why MCP exists: MCP evolved as a standard because every LLM provider (OpenAI, Anthropic, Google, etc.) had their own custom mechanism for function calling. MCP provides a way to "build once, use anywhere" - define tools once and expose them to any AI system, regardless of provider.
Why we don't need it for internal tools: RubyLLM handles the first part for us. It provides a unified abstraction over all providers' function calling mechanisms. We write a tool once using RubyLLM's DSL, and it works with OpenAI, Anthropic, Gemini, or any other supported provider.
# This tool works with ANY provider - RubyLLM handles the translation
class Bizy::Tools::GetOrgContext < RubyLLM::Tool
description "Get organizational context for a user"
param :target_user, desc: "User ID, email, or name"
def execute(target_user:)
# Same code works whether we're using Claude, GPT, or Gemini
end
end
What about exposing tools externally? We don't have a need to expose these tools via MCP yet. If we do in the future (for ChatGPT plugins, partner integrations, etc.), it's easy to add on later - see "Exposing Agents via MCP" in Advanced Topics.
What we gain by removing MCP internally:
- No network round-trip for every tool call (less latency)
- No MCP server process to maintain
- Tools are just Ruby classes - easy to test and debug
- Simpler architecture overall
Orchestration happens in three places:
- Bizy::Chat (setup-time): Assembles available tools and sub-agents based on user permissions and context metadata.
# Bizy::Chat#build_chat decides WHAT tools/agents are available
def build_chat
tools = CORE_TOOLS.map { |klass| klass.new(requesting_user: @user) }
SUB_AGENTS.each do |agent_class|
if agent_class.available?(user: @user, metadata: @metadata)
tools << agent_class.new(user: @user, metadata: @metadata)
end
end
RubyLLM.chat(model: "claude-sonnet-4").with_tools(*tools)
end
- The LLM (runtime): Decides WHICH tools to call based on the user's question and tool descriptions. This is the "orchestrator" - it reads the question, looks at available tools, and decides what to invoke.
- Sub-agents (nested runtime): When the LLM invokes an agent-as-tool, that agent runs its own chat with its own tools, creating a nested orchestration layer.
User Question
↓
Bizy::Chat.execute (assembles tools/agents)
↓
Main LLM (decides: "this is a meeting question")
↓
MeetingAgent.execute(question:)
↓
Sub-LLM (uses meeting-specific tools)
↓
Response bubbles back up
The LLM decides based on two sources of guidance:
Each agent has a description that tells the LLM when to use it:
class Bizy::Agents::Meeting < Bizy::Agents::Base
description "Ask questions about 1:1 meetings, past discussions, and meeting prep"
# ...
end
class Bizy::Agents::Analytics < Bizy::Agents::Base
description "Get company analytics, participation trends, and recognition statistics"
# ...
end
When you ask "What did Jane and I discuss last week?", the LLM sees:
- GetOrgContext: "Get organizational context for a user" - not relevant
- MeetingAgent: "Ask questions about 1:1 meetings, past discussions..." - matches!
- AnalyticsAgent: "Get company analytics, participation trends..." - not relevant
The LLM then calls MeetingAgent.execute(question: "What did Jane and I discuss last week?").
The Bizy::Chat system prompt provides general guidance without duplicating tool descriptions:
class Bizy::Chat
INSTRUCTIONS = <<~PROMPT
You are Bizy, a helpful AI assistant for Bonusly.
You have access to tools for looking up information and specialized agents
for specific domains. When a question matches a specialized agent's domain,
prefer delegating to that agent rather than answering directly.
Be helpful, concise, and accurate. If you don't have enough information
to answer a question, say so.
PROMPT
end
The system prompt is intentionally minimal - it sets persona and general behavior, while tool descriptions do the heavy lifting for routing decisions. This avoids duplication and keeps things maintainable.
Key points:
- Tool descriptions are the primary mechanism for routing decisions
- System prompt provides persona and general preferences
- No need to duplicate agent info in the system prompt
- The LLM may call multiple tools/agents for complex questions
- If no agent matches, core tools handle it directly
Writing effective descriptions:
Per Anthropic's best practices, aim for at least 3-4 sentences per tool description. Include:
- What the tool does
- When it should be used (and when it shouldn't)
- What each parameter means
- Any important caveats or limitations
Including examples in descriptions can improve routing accuracy:
class Bizy::Agents::Meeting < Bizy::Agents::Base
description <<~DESC
Ask questions about 1:1 meetings and past discussions with a specific person.
Use this agent when the user asks about:
- What was discussed in previous meetings ("What did we talk about last week?")
- Meeting history and summaries ("Give me a recap of my meetings with Jane")
- Preparing for upcoming 1:1s ("What should I discuss with my manager?")
Do NOT use for general questions about the user's calendar or scheduling.
DESC
end
Token considerations: Tool descriptions count toward input tokens. A detailed description (~100-200 tokens) is a small cost for better routing accuracy. There's no hard limit, but be mindful if you have many tools.
It might seem confusing at first - aren't agents and tools different things? Here's why the pattern works:
From the LLM's perspective, there's no difference. When the main Bizy chat sees its available "tools," it doesn't know (or care) whether GetOrgContext fetches data from a database or whether MeetingAgent runs an entire sub-conversation. Both are just: "call this with these parameters, get a result back."
The key insight: An agent is just a tool that happens to use an LLM internally.
Regular tool: input → Ruby code → output
Agent-as-tool: input → Ruby code → LLM + more tools → output
Why this is better than alternatives:
| Alternative | Problem |
|---|---|
| Separate routing layer | Extra code to maintain, another place for bugs, duplicates what LLMs do well (understanding intent) |
| All tools flat | Main chat needs to know about every tool in the system, bloated context, harder to maintain |
| Explicit agent handoff | Requires the LLM to understand a special "handoff" concept, more complex prompting |
The agent-as-tool pattern:
- Uses the LLM's natural tool-calling ability for routing
- Encapsulates complexity (the main chat doesn't know MeetingAgent uses 3 sub-tools)
- Composes naturally (agents can use other agents)
- Follows RubyLLM's recommended pattern (see agentic workflows)
Conceptual clarity: Think of it like delegation in an organization. When you ask your assistant a question, they might answer directly (tool) or say "let me check with the finance team and get back to you" (agent-as-tool). Either way, you just asked a question and got an answer.
We evaluated several Ruby AI frameworks before choosing RubyLLM:
LangChain.rb (https://github.com/patterns-ai-core/langchainrb)
- Ruby port of Python LangChain. 2k+ GitHub stars, 15+ providers, production-ready.
- Why not: Heavier abstraction, more "batteries included" than we need. Python-first patterns translated to Ruby.
Raix (https://github.com/OlympiaAI/raix-rails)
- Ruby AI eXtensions for Rails by OlympiaAI.
- Why not: Smaller community (44 stars), less mature tooling ecosystem.
BoxCars (https://www.boxcars.ai/)
- Rails-focused AI gem for text-to-action features.
- Why not: More opinionated about workflow patterns, less flexibility for our agent-as-tool approach.
OmniAI (https://rubygems.org/gems/omniai)
- Unified interface for multiple AI providers.
- Why not: Simpler scope - primarily a provider abstraction, less support for agentic patterns.
Sublayer (https://docs.sublayer.com/)
- Model-agnostic AI agent framework with DSL.
- Why not: Different architectural approach (Generators/Actions), less Rails integration.
Chatwoot AI Agents (https://github.com/chatwoot/ai-agents)
- Ruby AI Agents SDK inspired by OpenAI's Agents SDK. Built on top of RubyLLM. Features multi-agent orchestration, seamless handoffs, and shared context.
- Why not: Actually a strong contender! However, it's a layer on top of RubyLLM. Using RubyLLM directly gives us more control and fewer dependencies. Worth revisiting if we need more complex handoff patterns.
Active Agent (https://www.activeagents.ai/)
- Rails-native AI framework following MVC conventions. Treats agents like controllers, prompts like views. Includes Action Prompt, Generation Provider, and Queued Generation modules. Integrates with background jobs and streaming.
- Why not: Interesting "agents as controllers" approach, but different from our agent-as-tool pattern. Less mature ecosystem. Worth watching as it develops.
Direct SDKs (OpenAI/Anthropic Ruby gems)
- Use provider SDKs directly without abstraction.
- Why not: Provider lock-in, must implement tool calling ourselves.
Why RubyLLM won:
- Right level of abstraction: Not too heavy (LangChain) or too light (OmniAI/direct SDKs)
- Native Rails integration: Built-in generators and `acts_as_chat` (even if we use our own persistence)
- Agentic patterns documented: Clear guidance on multi-agent orchestration
- Active development: Regular releases, responsive maintainer, growing community
- Provider flexibility: Easy to swap models or use different providers for different tasks
- Async support: Built-in fiber-based concurrency for efficient LLM operations
Trade-off acknowledged: LangChain.rb has a larger community and more built-in integrations (vector stores, document loaders, etc.). If we needed those features heavily, it might be worth the heavier abstraction. For our use case (chat + tools + agents), RubyLLM is the better fit.
| Topic | Link | When to Read |
|---|---|---|
| Getting Started | rubyllm.com | First setup |
| Tool Definition | rubyllm.com/tools | Creating tools |
| Extended Thinking | rubyllm.com/thinking | Complex reasoning tasks |
| Streaming | rubyllm.com/streaming | Real-time responses |
| Async/Fibers | rubyllm.com/async | Background job setup |
| Rails Integration | rubyllm.com/rails | Persistence with acts_as_chat |
| Multi-Agent Patterns | rubyllm.com/agentic-workflows | Agent-as-tool pattern |
| Structured Output | rubyllm.com/chat | Schema-based responses |
- ruby_llm-mcp - MCP client for RubyLLM (for migration period)
- ruby_llm-schema - DSL for defining JSON schemas (bundled with RubyLLM)
- async-job - Fiber-based ActiveJob adapter
- ActionCable Overview - WebSocket fundamentals
- ActiveJob Basics - Background job processing
- Issue #326: Handoff Tools - Native agent delegation (in development)
| File | Purpose |
|---|---|
| lib/bizy/ai_driver.rb | Main orchestrator (to be replaced) |
| lib/bizy/base_tool.rb | MCP tool base class |
| lib/bizy/tools/*.rb | Current tool implementations |
| app/controllers/mcp/bizy_controller.rb | MCP server endpoint |
| Term | Definition |
|---|---|
| Tool | A function the AI can call to get information or perform actions |
| Agent | A specialized AI assistant with its own tools, persona, and instructions |
| Agent-as-Tool | Pattern where specialized agents are invoked as tools by a coordinator |
| Extended Thinking | Giving reasoning models more time/budget to deliberate on complex tasks |
| MCP | Model Context Protocol - JSON-RPC standard for AI tool execution |
| ActionCable | Rails WebSocket framework for real-time bidirectional communication |
| RubyLLM | Provider-agnostic Ruby library for LLM interactions |
| Async::Job | Fiber-based job processor optimized for I/O-bound work like LLM calls |
| Fiber | Lightweight concurrency primitive; many can run on a single thread |