@jwmatthews
Created March 2, 2026 18:38

Camel-Kit Deep Dive for Onboarding Engineers

What This Product Actually Is

Camel-Kit is not a conventional "AI application" where a backend calls an LLM API and orchestrates tool invocations itself. It is a prompt-packaging and workflow bootstrap tool for Apache Camel engineering.

Its job is to:

  1. Initialize a new Camel integration workspace.
  2. Install assistant-specific slash commands and workflow skills into that workspace.
  3. Configure Apache Camel's MCP server for live catalog lookup and validation.
  4. Guide an AI coding assistant through a structured, artifact-driven delivery flow:
    • Business Requirements Document
    • Technical Design Document
    • Camel YAML route
    • Validation report
    • Citrus integration tests

The core product idea is:

  • Keep the Java CLI thin.
  • Put the domain intelligence into versioned SKILL.md assets.
  • Use MCP for authoritative, version-specific Camel knowledge.
  • Force the agent to generate intermediate artifacts before code.

For onboarding purposes, the most important mental model is:

The repo's primary runtime behavior lives in prompt assets and file conventions, not in Java business logic.


High-Level Repo Structure

Root modules

  • camel-kit-core/
    • The real product core.
    • Contains the Picocli CLI, init workflow, TUI, downloaders, templates, and all packaged AI skills.
  • camel-kit-main/
    • JBang entry point.
    • Wraps camel-kit-core so users can install and run Camel-Kit as a JBang app.
  • camel-jbang-plugin-kit/
    • Adapter layer for camel kit init inside Camel JBang.
    • Delegates back to camel-kit-core.
  • camel-kit-plugins/
    • Currently an empty Maven aggregator with no child modules.

Non-code content

  • docs/
    • Human documentation for users and contributors.
  • examples/
    • Example workflow, mainly explanatory.
  • website/
    • Separate Hugo site for published docs and marketing pages.

Important implication

This is a content-heavy repo:

  • Java source: about 2.4k lines
  • Packaged workflow skills: about 5.9k lines
  • Resources/templates/docs are a large part of the product surface

That distribution is a strong signal about where enhancements usually belong.


What the Codebase Does at Runtime

Runtime surfaces

Camel-Kit has two user-facing execution modes:

  1. Standalone CLI via JBang
  2. Camel JBang plugin via camel kit init

In both cases, the only real Java command implemented today is init.

Actual entry points

  • Standalone CLI: CamelKitMain.java in camel-kit-core
  • JBang wrapper: camel-kit-main/src/main/jbang/main/CamelKit.java
  • Camel JBang plugin: KitInitCommand in camel-jbang-plugin-kit

What init does

init is the product's bootstrapper. The implementation lives in camel-kit-core/src/main/java/io/github/luigidemasi/camelkit/command/InitCommand.java.

The command performs a fixed sequence:

  1. Validate target AI assistant from AgentRegistry
  2. Resolve target project directory
  3. Create .camel-kit working structure
  4. Write project config and constitution
  5. Install slash command wrappers for the chosen assistant
  6. Copy bundled workflow skills into the assistant's skills/ folder
  7. Create Maven wrapper files
  8. Create assistant-specific MCP configuration
  9. Optionally download Citrus schemas and generate a quick reference
  10. Print next-step instructions

This is the key architectural boundary:

  • Java code bootstraps the workspace.
  • The AI assistant then carries the workflow forward by reading installed SKILL.md files.
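
Steps 2 and 3 of the sequence can be sketched as follows. This is a hypothetical illustration, not the real InitCommand code; the directory names are assumptions drawn from the artifact list later in this document:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch of workspace bootstrap: resolve the project dir
// and create the .camel-kit working structure.
class InitWorkspaceSketch {

    static Path createWorkspace(Path projectDir) throws IOException {
        Path kit = projectDir.resolve(".camel-kit");
        for (String sub : new String[] {"templates", "flows"}) {
            Files.createDirectories(kit.resolve(sub));
        }
        // placeholder project config (step 4 writes the real one)
        Files.writeString(kit.resolve("config.yaml"), "# project config\n");
        return kit;
    }

    // Self-contained demo: bootstrap into a temp dir and verify the layout.
    static boolean demo() {
        try {
            Path kit = createWorkspace(Files.createTempDirectory("camel-kit-demo"));
            return Files.exists(kit.resolve("config.yaml"))
                    && Files.isDirectory(kit.resolve("flows"));
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```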

Module-by-Module Architecture

camel-kit-core

This module owns nearly all product behavior.

CLI shell and command wiring

  • CamelKitMain.java
    • Creates JLine terminal/printer
    • Prints banner/logo
    • Registers Picocli subcommands
    • Exposes default Camel and Citrus versions
  • CamelKitCommand.java
    • Thin base class for commands

Bootstrap logic

  • InitCommand.java
    • The main operational command
    • Owns workspace creation, template generation, resource copying, MCP config writing, and optional schema fetch

Agent abstraction

This is intentionally simple:

  • bob -> .bob/commands
  • gemini -> .gemini/commands
  • claude -> .claude/commands

The registry abstracts:

  • command folder
  • file format (md vs toml)
  • assistant label

This is the seam you would extend for a new AI assistant.
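
The seam can be sketched as a small lookup table. This is an illustration of the shape of the abstraction, not the repo's actual AgentRegistry API; the labels and the class/record names are assumptions:

```java
import java.util.Map;

// Hypothetical registry sketch: each assistant maps to a command folder,
// a wrapper file format, and a human-readable label.
class AgentRegistrySketch {

    record Agent(String commandDir, String fileFormat, String label) {}

    static final Map<String, Agent> AGENTS = Map.of(
        "bob",    new Agent(".bob/commands",    "md",   "Bob"),
        "gemini", new Agent(".gemini/commands", "toml", "Gemini"),
        "claude", new Agent(".claude/commands", "md",   "Claude"));

    // Adding a new assistant is one more entry here, plus its MCP config shape.
    static Agent lookup(String name) {
        Agent a = AGENTS.get(name);
        if (a == null) {
            throw new IllegalArgumentException("Unknown assistant: " + name);
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(lookup("claude").commandDir()); // .claude/commands
    }
}
```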

Output and UX

This layer does not change product semantics. It improves the perceived quality of init:

  • native-image-aware terminal rendering
  • split-screen progress UI when supported
  • graceful fallback to text/banner mode

Catalog and schema utilities

Important nuance:

  • CitrusSchemaDownloader is used by init.
  • CatalogDownloader exists, but the current init flow does not use it.
  • The preferred runtime architecture in skills is to query Camel live via MCP rather than rely on bundled static catalog files.

camel-kit-main

This module is packaging, not product logic.

  • It provides the JBang script wrapper.
  • Maven copies the JBang source into dist/.
  • The entry point just forwards into camel-kit-core.

If you are changing behavior, this module is usually not where the work belongs.

camel-jbang-plugin-kit

This is an adapter so Camel-Kit can appear as a Camel JBang plugin.

Key point:

  • It does not reimplement features.
  • It maps Camel JBang command parameters to the same InitCommand used by standalone mode.

That is a good design choice: one init implementation, multiple front doors.
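
The delegation idea can be sketched like this; all names are illustrative stand-ins for CamelKitMain, KitInitCommand, and the shared InitCommand:

```java
// Sketch of "one init implementation, multiple front doors": both entry
// points forward to a single shared bootstrap rather than reimplementing it.
class FrontDoorsSketch {

    // Stands in for the shared InitCommand.
    static String init(String assistant, String dir) {
        return "initialized " + dir + " for " + assistant;
    }

    // Standalone CLI front door.
    static String standaloneCli(String assistant, String dir) {
        return init(assistant, dir);
    }

    // Camel JBang plugin front door: maps plugin parameters onto the same call.
    static String jbangPlugin(String assistant, String dir) {
        return init(assistant, dir);
    }

    public static void main(String[] args) {
        System.out.println(standaloneCli("claude", "."));
    }
}
```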

camel-kit-plugins

Currently a placeholder aggregator. No active plugins live here.

Treat it as future expansion space, not current architecture.


The Real Product Architecture: Artifact-Driven AI Workflow

Camel-Kit uses a staged artifact model.

Stage 0: Bootstrap

camel-kit init creates:

  • assistant command files
  • assistant skill files
  • .camel-kit/config.yaml
  • .camel-kit/constitution.md
  • .camel-kit/templates/*
  • MCP config for the selected assistant
  • Maven wrapper
  • schemas/ and test/data/
  • optional Citrus schema cache

Stage 1: Business requirements

Installed slash command:

  • camel-project

Primary output:

  • .camel-kit/business-requirements.md

This captures business intent and integration landscape before any implementation details.

Stage 2: Flow design

Installed slash command:

  • camel-flow

Primary output:

  • .camel-kit/flows/{flow-name}/{flow-name}.tdd.md

This is the main design artifact. It encodes:

  • source system
  • sink system
  • processing steps
  • transformations
  • dependencies
  • error handling
  • test scenarios

Stage 3: Migration path

Installed slash commands:

  • camel-migrate
  • internal camel-migrate-mule

Outputs:

  • .camel-kit/business-requirements.md
  • .camel-kit/flows/{flow-name}/{flow-name}.tdd.md

This path converges on the same artifacts as the greenfield path. That is a strong design choice because implementation, validation, and testing stay unchanged downstream.

Stage 4: Implementation

Installed slash command:

  • camel-implement

Expected outputs in project root:

  • {flow-name}.camel.yaml
  • application.properties
  • docker-compose.yaml
  • run.sh
  • DataMapper artifacts when applicable

Stage 5: Validation

Installed slash command:

  • camel-validate

Expected outputs:

  • validation findings
  • optionally corrected YAML
  • a validation report file per skill instructions

Stage 6: Test generation

Installed slash command:

  • camel-test

Expected outputs:

  • test/{flow-name}.camel.it.yaml
  • test/application-test.properties
  • run-tests.sh
  • test data files

Architectural consequence

The repo is built around a file-mediated workflow:

  • Each step reads prior artifacts.
  • Each step produces a more concrete artifact.
  • The LLM is expected to operate with those files as shared memory and handoff state.

This is the repo's single most important architectural pattern.


How the AI Workflow Is Encoded

The main prompt assets live under camel-kit-core/src/main/resources/skills/.

Current workflow skills in the repo:

  • camel-project
  • camel-flow
  • camel-implement
  • camel-validate
  • camel-test
  • camel-migrate
  • camel-migrate-mule
  • shared datamapper-canonicalize.md

The command files generated by init are intentionally tiny. They mostly say:

Read <assistant>/skills/<skill>/SKILL.md and follow those instructions.

That means the installed slash commands are really just dispatch shims into packaged workflow prompts.


Specific Generative AI and Agentic Patterns Used

1. Prompt-as-product

The product's core intelligence is not hardcoded in Java classes. It is stored as versioned prompt artifacts:

  • SKILL.md
  • prompt guides
  • constitution template
  • YAML generation guide
  • validation guide

This makes the repo feel closer to:

  • a compiler toolchain for AI workflows
  • a prompt operating system
  • a spec-driven assistant kit

than a typical application backend.

2. Role-based sub-agents

Each skill assigns the model a narrow working identity:

  • Business Analyst
  • Integration Architect
  • Developer/Implementer
  • Quality Assurance Engineer
  • Test Engineer
  • Migration Specialist
  • Data Mapping Specialist

That is a classic agentic decomposition pattern: constrain the LLM with a role, a goal, allowed inputs, required outputs, and explicit stop conditions.

3. Artifact-gated progression

Every major skill checks for prerequisite files before it proceeds.

Examples:

  • camel-flow requires business requirements and config
  • camel-implement requires BRD and TDD
  • camel-test requires TDD, implementation, and Citrus reference

This reduces open-ended reasoning. The agent is not asked to improvise the whole system at once; it is forced through explicit gates.
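
A minimal sketch of such a gate, assuming a simple prerequisite-file check (the real skills express this check in prompt text, not Java):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Artifact-gate sketch: a stage lists which prerequisite files are missing
// and refuses to proceed until the list is empty.
class ArtifactGateSketch {

    static List<Path> missingPrerequisites(Path root, List<String> required) {
        return required.stream()
                .map(root::resolve)
                .filter(p -> !Files.exists(p))
                .toList();
    }

    // Demo: the gate stays closed until BRD/TDD stand-ins exist.
    static boolean demo() {
        try {
            Path root = Files.createTempDirectory("gate-demo");
            List<String> needed = List.of("business-requirements.md", "demo.tdd.md");
            boolean blockedBefore = !missingPrerequisites(root, needed).isEmpty();
            for (String f : needed) {
                Files.writeString(root.resolve(f), "");
            }
            boolean openAfter = missingPrerequisites(root, needed).isEmpty();
            return blockedBefore && openAfter;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```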

4. Structured interviews instead of open prompting

The skills repeatedly enforce:

  • ask one question at a time
  • wait for user response
  • only ask conditional questions when relevant
  • avoid re-asking already known facts

This is a deliberate anti-chaos pattern. It narrows the search space and improves consistency.

5. Externalized memory through files

The system uses files as durable working memory:

  • BRD = business memory
  • TDD = technical memory
  • constitution = policy memory
  • config.yaml = runtime/version memory
  • generated YAML/test files = implementation memory

This is important because it avoids relying on transient chat context for long-running flows.

6. Hybrid knowledge strategy: static prompts + live MCP

The skills consistently instruct the assistant to prefer MCP tool calls for authoritative, version-specific answers:

  • camel_catalog_component_doc
  • camel_catalog_components
  • camel_catalog_eip_doc
  • camel_catalog_dataformat_doc
  • camel_catalog_language_doc
  • camel_validate_route
  • camel_route_context
  • camel_route_harden_context

This is a strong retrieval architecture:

  • Static skills provide process and guardrails
  • MCP provides current, versioned truth

That sharply reduces hallucination risk in a domain where option names and YAML structure are version-sensitive.

7. "Never trust model memory" as a first-class rule

This pattern shows up all over the skills:

  • do not suggest components before querying the catalog
  • do not use training data for option names
  • do not assume expression language names
  • verify all component property names against catalog docs

This is one of the repo's best design decisions. It treats the model as a planner/generator, not as an authoritative source of framework truth.

8. Deterministic fallback paths

The skills usually define:

  1. primary path via MCP
  2. fallback via bundled skills or static guides
  3. user escalation if neither is available

That is a robust agent pattern because tool failure does not automatically collapse the workflow.
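
The three-tier ladder can be sketched as an ordered list of knowledge sources; the Supplier stand-ins below are assumptions, not real MCP client calls:

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

// Sketch of primary -> fallback -> escalate: try each knowledge source in
// order and surface a user-facing escalation only when every source fails.
class FallbackLadderSketch {

    static String resolve(List<Supplier<Optional<String>>> sources) {
        for (Supplier<Optional<String>> source : sources) {
            Optional<String> answer = source.get();
            if (answer.isPresent()) {
                return answer.get();
            }
        }
        return "ESCALATE: no knowledge source available, ask the user";
    }

    // Demo: the MCP stand-in fails, the bundled-guide stand-in answers.
    static String demo() {
        return resolve(List.<Supplier<Optional<String>>>of(
                Optional::empty,                      // MCP tool call failed
                () -> Optional.of("bundled guide"))); // static guide available
    }

    public static void main(String[] args) {
        System.out.println(demo()); // bundled guide
    }
}
```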

9. Progressive disclosure

The skills are designed to load more guidance only when the context demands it.

Examples:

  • load performance.md only if throughput/latency matters
  • load security.md only if compliance/security matters
  • load monitoring.md only if observability matters
  • load DataMapper guides only for relevant format pairs

This controls token usage and keeps the assistant focused.
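
A sketch of the trigger logic, assuming simple keyword matching (the real skills phrase these conditions in prose, and the guide file names follow the examples above):

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Progressive-disclosure sketch: load optional guides only when the
// requirements mention the matching concern, keeping the prompt small.
class ProgressiveDisclosureSketch {

    static Set<String> guidesToLoad(String requirements) {
        Set<String> guides = new LinkedHashSet<>();
        String text = requirements.toLowerCase();
        if (text.contains("throughput") || text.contains("latency")) {
            guides.add("performance.md");
        }
        if (text.contains("compliance") || text.contains("security")) {
            guides.add("security.md");
        }
        if (text.contains("observability")) {
            guides.add("monitoring.md");
        }
        return guides;
    }

    public static void main(String[] args) {
        System.out.println(guidesToLoad("low latency, strict compliance"));
    }
}
```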

10. Prompt-enforced validation loops

camel-implement does not stop at generation. It instructs the assistant to:

  • generate YAML
  • validate it
  • fix errors
  • re-query official docs when needed
  • retry until valid

That is an agentic "generate -> verify -> repair" loop embedded directly in the prompt architecture.
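
The loop can be sketched as bounded generate/verify/repair; the predicate and repair function below are stand-ins for camel_validate_route and the model's fix step:

```java
import java.util.function.Function;
import java.util.function.Predicate;

// Generate -> verify -> repair sketch, with a bounded attempt count so a
// persistent failure cannot loop forever.
class RepairLoopSketch {

    static String generateUntilValid(String draft,
                                     Predicate<String> isValid,
                                     Function<String, String> repair,
                                     int maxAttempts) {
        String candidate = draft;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (isValid.test(candidate)) {
                return candidate;                    // validation passed
            }
            candidate = repair.apply(candidate);     // fix errors, retry
        }
        throw new IllegalStateException("still invalid after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        // Toy example: "valid" means the route ends with a marker.
        String result = generateUntilValid("route",
                s -> s.endsWith("!"), s -> s + "!", 3);
        System.out.println(result); // route!
    }
}
```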

11. Canonicalization before generation

The DataMapper flow is the clearest example.

Instead of asking the model to jump straight from semantic mapping to XSLT, Camel-Kit inserts an intermediate canonicalization step:

  • collect semantic mappings
  • compute canonical Source XPath values
  • compute canonical Target Element values
  • store them in the TDD
  • then generate XSLT from that canonical form

This is a high-quality agent pattern:

  • convert fuzzy user intent into a stable intermediate representation
  • generate code from the representation, not directly from prose

That is essentially IR-driven code generation for prompt systems.
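
The canonical form can be sketched as a tiny record-based IR; the rendering below is illustrative only, not Camel-Kit's real XSLT generation, and the field names simply echo the canonical Source XPath / Target Element idea above:

```java
import java.util.List;

// IR-driven generation sketch: code is derived from stable structured
// fields, never directly from conversational prose.
class CanonicalMappingSketch {

    record Mapping(String sourceXPath, String targetElement) {}

    // Render an XSLT-style fragment purely from the IR rows.
    static String render(List<Mapping> mappings) {
        StringBuilder sb = new StringBuilder();
        for (Mapping m : mappings) {
            sb.append("<xsl:element name=\"").append(m.targetElement())
              .append("\"><xsl:value-of select=\"").append(m.sourceXPath())
              .append("\"/></xsl:element>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(List.of(new Mapping("/Order/Id", "orderId"))));
    }
}
```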

12. Specialized sub-workflows for difficult transformations

The DataMapper path is a mini pipeline:

  1. datamapper-interview.md or datamapper-migrate.md
  2. datamapper-canonicalize.md
  3. datamapper-implement.md

This breaks a hard problem into manageable phases:

  • elicitation
  • normalization
  • code generation
  • self-validation

That is more sophisticated than typical prompt kits and is one of the strongest agentic design patterns in the repo.

13. Convergent workflow design

Greenfield and migration both converge on the same BRD/TDD artifacts.

That means:

  • fewer downstream branches
  • shared implementation logic
  • shared validation logic
  • shared testing logic

This is not just good product design. It is good agent design, because it limits prompt divergence.


DataMapper: The Most Specialized Agentic Subsystem

The DataMapper subsystem is the most engineered prompt workflow in the repo.

Relevant files:

  • datamapper-interview.md
  • datamapper-migrate.md
  • datamapper-canonicalize.md (shared)
  • datamapper-implement.md

Why it matters

Data transformation is where LLMs tend to become unreliable:

  • path semantics drift
  • type assumptions drift
  • JSON/XML conversions are error-prone
  • generated XSLT is easy to get subtly wrong

Camel-Kit counters that by forcing structure.

Pattern used

  1. Gather schema or schema-like field information
  2. Infer semantic mappings
  3. Confirm mappings with the user
  4. Canonicalize into machine-usable structural fields
  5. Write canonical mapping section into TDD
  6. Generate XSLT from canonical data
  7. Self-validate generated XSLT against the TDD
  8. Inject YAML step and .kaoto metadata

Engineering takeaway

If you need to enhance any transformation-heavy feature, follow this same pattern:

  • do not generate final code directly from conversational requirements
  • first create a constrained, explicit intermediate representation

Packaging and Distribution Architecture

Maven

The root pom.xml is a standard multi-module aggregator:

  • camel-kit-main
  • camel-kit-core
  • camel-kit-plugins
  • camel-jbang-plugin-kit

JBang

The JBang alias points to:

  • camel-kit-main/src/main/jbang/main/CamelKit.java

Camel JBang plugin

The plugin module depends on:

  • camel-jbang-core with provided scope
  • camel-kit-core

That keeps feature logic centralized while exposing it inside Camel's CLI ecosystem.


Where To Make Changes

If you want to change the user workflow

Edit the skill files under camel-kit-core/src/main/resources/skills/ first.

That is usually more important than changing Java.

If you want to add a new AI assistant

Start with the agent abstraction (AgentRegistry) in camel-kit-core.

You will need to define:

  • assistant command folder
  • file format for command wrappers
  • MCP config file shape/location
  • expected command invocation convention

If you want to add a new migration vendor

Pattern to follow:

  1. extend camel-migrate/SKILL.md
  2. create a new internal vendor sub-skill
  3. add vendor-specific mapping guides
  4. keep outputs identical to BRD/TDD produced by greenfield flow

That last rule is critical. The downstream pipeline depends on convergence.

If you want to harden generated code quality

Primary hotspots:

  • camel-implement/SKILL.md and its YAML generation guide
  • camel-validate/SKILL.md and its validation guide

The quality system is prompt-enforced more than code-enforced.

If you want to improve bootstrap behavior

Primary hotspots:

  • InitCommand.java in camel-kit-core
  • the bundled templates and resources it copies into the workspace


What Is Strong About the Current Design

  1. Clear separation between bootstrap code and AI workflow content.
  2. Strong bias toward version-aware, authoritative MCP lookup.
  3. Good use of intermediate artifacts to reduce prompt ambiguity.
  4. Convergent greenfield/migration architecture.
  5. DataMapper pipeline shows disciplined prompt engineering, not ad hoc prompting.
  6. Thin Java surface area makes the system easy to reason about operationally.

Current Gaps and Repo Realities

These are important for a new engineer because the repo contains some documentation drift.

1. Docs describe more bundled component skills than the repo currently contains

The architecture docs describe hundreds of pre-generated component skills, but the checked-in repo currently contains:

  • seven workflow SKILL.md files
  • one shared DataMapper guide
  • guide documents under those workflow folders

I did not find any bundled camel-component-* skill directories in the current checkout.

That means one of these is true:

  • the docs are ahead of the repo
  • the component skills are generated elsewhere and not checked in
  • the architecture changed and the docs were not fully updated

Treat the current codebase as the source of truth unless maintainers clarify otherwise.

2. The CLI surface is narrower than the documentation implies

Today, the Java CLI primarily implements init.

Commands like camel-flow and camel-implement are not Java subcommands. They are installed assistant commands that forward into packaged prompts.

That distinction matters when debugging "why a command behaved this way."

3. Gemini MCP docs and implementation appear inconsistent

The docs commonly refer to .gemini/mcp.json, but InitCommand writes Gemini config to:

  • .gemini/settings.json

That should be reviewed and normalized.

4. Some contributor docs are outdated

CONTRIBUTING.md still references paths and modules that do not match the current repo state, including:

  • old package paths
  • non-existent plugin/module structure
  • non-existent Python utilities

Do not treat it as fully current without verification.

5. There are no automated tests in the repo right now

I found no src/test files in the current checkout.

Quality is currently enforced through:

  • build success
  • prompt constraints
  • generated validation steps
  • runtime validation instructions

That is workable, but it raises the importance of careful regression checking whenever skill text changes.


Recommended Reading Order for a New Engineer

  1. README.md
  2. camel-kit-core/src/main/java/io/github/luigidemasi/camelkit/command/InitCommand.java
  3. camel-kit-core/src/main/resources/skills/camel-flow/SKILL.md
  4. camel-kit-core/src/main/resources/skills/camel-implement/SKILL.md
  5. camel-kit-core/src/main/resources/skills/camel-validate/SKILL.md
  6. The DataMapper guides
  7. camel-jbang-plugin-kit/src/main/java/io/github/luigidemasi/camelkit/jbang/KitInitCommand.java

That order moves from:

  • product concept
  • to bootstrap implementation
  • to prompt architecture
  • to specialized generation logic
  • to packaging adapters

Practical Advice for Enhancing the Product

  1. Treat skill text changes like code changes. Small wording edits can materially change behavior.
  2. Preserve the staged artifact model. It is the main defense against prompt drift.
  3. Prefer new intermediate representations over larger prompts when adding complexity.
  4. Keep using MCP as the truth source for version-sensitive Camel knowledge.
  5. Tighten documentation drift early. In this repo, stale docs can mislead contributors faster than stale code.
  6. Add regression fixtures if you extend the workflow. Even simple golden-file examples would materially improve confidence.

Build Status During This Review

I verified the current checkout builds successfully with:

./mvnw -q -DskipTests package

That confirms the multi-module build is healthy at a packaging level, but it does not compensate for the current lack of automated test coverage.
