# Agent Zero to Telegram Bridge: How-To (AI-Coder Focus)
This guide explains how to wire a Telegram bot to Agent Zero (`/api_message`) with:

- secure auth (no hardcoded LLM keys in the bridge)
- per-chat context continuity
- voice/audio relay to local Whisper
- an optional extension pattern for non-audio file attachments

Everything below is based on the bridge implementation in:

- `/a0/usr/telegram/telegram_bridge.py`
- `/a0/usr/telegram/send_message.sh`
## 1) High-level architecture

```text
Telegram user
  -> python-telegram-bot bridge (long polling)
  -> POST http://<agent-zero-host>/api_message
       headers: X-API-KEY: <computed token>
       body: { message, context_id? }
  <- { response, context_id }
  <- Telegram reply
```

Voice/audio path:

```text
Telegram voice/audio attachment
  -> bot.get_file(file_id).download_as_bytearray()
  -> POST multipart to local Whisper: files={"audio": (filename, bytes)}
  <- { "text": "...transcript..." }
  -> send transcript into Agent Zero /api_message
```
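The request half of this diagram can be sketched as follows. This is a minimal illustration using only the stdlib; the base URL and function name are placeholders, and only `X-API-KEY`, `message`, and `context_id` come from the guide:

```python
import json
import urllib.request


def build_api_message_request(base_url, api_key, message, context_id=None):
    """Build the POST /api_message request described in the diagram above."""
    body = {"message": message}
    if context_id:
        # Omit context_id entirely for a fresh conversation
        body["context_id"] = context_id
    return urllib.request.Request(
        f"{base_url}/api_message",
        data=json.dumps(body).encode(),
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# resp = urllib.request.urlopen(build_api_message_request("http://localhost:80", key, "hi"))
# reply = json.loads(resp.read())  # expected shape: { "response": ..., "context_id": ... }
```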
## 2) Security model (important)

The bridge should not call OpenAI (or other LLM providers) directly; let Agent Zero handle model routing.

Do this:

1. Store the bot token and other secrets in `/a0/usr/secrets.env`.
2. In `/a0/usr/telegram/telegram-agent0.env`, reference secrets as `SECRET(NAME)`.
3. Compute the Agent Zero API key at runtime from `/a0/usr/.env` (`A0_PERSISTENT_RUNTIME_ID`, `AUTH_LOGIN`, `AUTH_PASSWORD`):
   - `sha256(f"{runtime_id}:{username}:{password}")`
   - base64url-encode and strip the `=` padding
   - take the first 16 characters

This matches the bridge behavior and avoids persisting the API key in plaintext.
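The derivation steps above can be sketched as a single function (the function name is illustrative; only the hashing steps come from the guide):

```python
import base64
import hashlib


def compute_a0_api_key(runtime_id: str, username: str, password: str) -> str:
    """Derive the Agent Zero API key: sha256 -> base64url (padding stripped) -> first 16 chars."""
    digest = hashlib.sha256(f"{runtime_id}:{username}:{password}".encode()).digest()
    token = base64.urlsafe_b64encode(digest).decode().rstrip("=")
    return token[:16]
```

Because the key is recomputed from `/a0/usr/.env` on startup, it never needs to be written to disk on its own.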
## 3) Config files

`/a0/usr/telegram/telegram-agent0.env` example:

```env
TELEGRAM_BOT_TOKEN="SECRET(TELEGRAM_BOT_TOKEN)"
TELEGRAM_OWNER_USER_ID="SECRET(TELEGRAM_OWNER_USER_ID)"
A0_API_URL="http://localhost:80"
WHISPER_URL="http://<your-whisper-host>:8765/transcribe"
```

`/a0/usr/secrets.env` example:

```env
TELEGRAM_BOT_TOKEN="<bot-token>"
TELEGRAM_OWNER_USER_ID="<numeric-user-id>"
# Optional:
# RESEND_API_KEY="..."
```

Notes:

- DM access is owner-only; group chats are open to group participants (in this implementation).
- If you want strict allow-list behavior for groups, add a group/user ACL gate in `owner_gate()`.
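The `SECRET(NAME)` indirection can be resolved with a small helper like this (a sketch, assuming simple `KEY="value"` env files; the bridge's actual loader may differ):

```python
import re

# Placeholder format from the guide: SECRET(NAME) -> look up NAME in secrets.env
SECRET_RE = re.compile(r"^SECRET\((\w+)\)$")


def parse_env(text: str) -> dict:
    """Parse KEY="value" lines from a simple .env file, skipping comments and blanks."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        values[key.strip()] = raw.strip().strip('"')
    return values


def resolve_secrets(config: dict, secrets: dict) -> dict:
    """Replace SECRET(NAME) placeholders with values loaded from secrets.env."""
    resolved = {}
    for key, value in config.items():
        m = SECRET_RE.match(value)
        resolved[key] = secrets[m.group(1)] if m else value
    return resolved
```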
## 4) Bridge runtime behavior

### 4.1 Text messages

- Handler: `filters.TEXT & ~filters.COMMAND`
- For each chat, the bridge loads the `context_id` from `state.json` and sends it to `/api_message`.
- If Agent Zero returns a new `context_id`, the bridge saves it for continuity.
- On a stale-context 404, the bridge retries once without `context_id`.
- Long replies are chunked to Telegram's maximum message length, with a markdown fallback.

### 4.2 Voice/audio messages

- Handler: `filters.VOICE | filters.AUDIO`
- The file is downloaded to memory only (a bytearray), never written to disk.
- Transcription goes to local Whisper (`multipart/form-data`, field name: `audio`).
- Prefix strategy before sending to Agent Zero:
  - DM default: `[Voice]: <transcript>`
  - Group default: `[Name via voice]: <transcript>`
  - If a caption is present, it overrides any pending instruction.
  - A pending voice instruction (TTL 300 s) is used when there is no caption.
- Transcription failures and empty transcripts produce explicit user-visible errors.
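The chunking step can be sketched as below; 4096 is Telegram's documented maximum message length, and the helper name plus the break-at-newline preference are illustrative, not necessarily what the bridge does:

```python
TELEGRAM_MAX_LEN = 4096  # Telegram's maximum message length in characters


def chunk_reply(text: str, limit: int = TELEGRAM_MAX_LEN) -> list[str]:
    """Split a long reply into chunks of at most `limit` chars, preferring newline breaks."""
    chunks = []
    while len(text) > limit:
        cut = text.rfind("\n", 0, limit)  # break at the last newline that fits
        if cut <= 0:
            cut = limit  # no newline available: hard cut
        chunks.append(text[:cut])
        text = text[cut:].lstrip("\n")
    if text:
        chunks.append(text)
    return chunks
```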
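The prefix strategy above can be sketched as a pure function. This is a hypothetical helper: the name, the argument shapes, and especially how an instruction is combined with the transcript (here: prepended on its own line) are assumptions, not the bridge's exact behavior:

```python
import time


def build_voice_text(transcript, caption, pending, is_group, sender_name, now=None):
    """Apply the prefix strategy: caption > unexpired pending instruction > default prefix."""
    now = time.time() if now is None else now
    if caption:
        instruction = caption  # a caption overrides any pending instruction
    elif pending and pending["expires_at"] > now:
        instruction = pending["text"]  # pending voice instruction within its TTL
    else:
        instruction = None
    prefix = f"[{sender_name} via voice]" if is_group else "[Voice]"
    text = f"{prefix}: {transcript}"
    return f"{instruction}\n{text}" if instruction else text
```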
### 4.3 State file schema

`/a0/usr/telegram/state.json`:

```json
{
  "contexts": {
    "<chat_id>": "<agent_zero_context_id>"
  },
  "pending_voice": {
    "<chat_id>": {
      "text": "<instruction>",
      "expires_at": 1735689600.0
    }
  }
}
```

Use atomic writes (`tempfile` + `os.replace`) to avoid corruption.
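The atomic-write pattern looks like this (a standard sketch of `tempfile` + `os.replace`; the function name is illustrative):

```python
import json
import os
import tempfile


def save_state(path: str, state: dict) -> None:
    """Write state.json atomically: write a temp file in the same directory, then os.replace()."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before the rename
        os.replace(tmp_path, path)  # atomic on POSIX: readers never see a partial file
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file on failure
        raise
```

The temp file must live in the same directory as the target, because `os.replace` is only atomic within one filesystem.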
## 5) Attachment handling status

Current bridge capability:

- Supported: text, voice notes, audio files
- Not supported by default: relaying photos/documents/video into Agent Zero content

If someone asks about "file attachments," be explicit: this bridge currently treats voice/audio as the supported attachment types; other file types require an extension.
## 6) Extension pattern for photos/documents

If you want generic file attachments, add handlers for `filters.PHOTO | filters.Document.ALL` and convert files into text the LLM can consume.

Recommended pattern:

1. Download the file bytes in memory.
2. Enforce a max size and a MIME allow-list.
3. Route by MIME type:
   - `image/*`: OCR or a vision endpoint -> text summary
   - `application/pdf`: OCR/text extraction -> text summary
   - plain text/code files: decode and truncate
4. Build an Agent Zero payload string from:
   - metadata (`filename`, `mime`, `size`)
   - the extracted text/summary
   - an optional user caption/instruction
5. Send it through the existing `/api_message` flow so memory/tools still work.

Minimal skeleton:

```python
app.add_handler(MessageHandler(filters.PHOTO | filters.Document.ALL, handle_file_message))


async def handle_file_message(update, context):
    msg = update.effective_message
    # Telegram photos arrive as a list of sizes; take the largest
    doc = msg.document or (msg.photo[-1] if msg.photo else None)
    if not doc:
        return
    tg_file = await context.bot.get_file(doc.file_id)
    data = await tg_file.download_as_bytearray()
    # TODO: validate size/mime, extract text, summarize
    extracted = extract_to_text(data, mime_type=getattr(doc, "mime_type", "application/octet-stream"))
    agent_text = (
        f"[Attachment: {getattr(doc, 'file_name', 'photo')}"
        f", mime={getattr(doc, 'mime_type', 'unknown')}]\n{extracted}"
    )
    result = await call_agent_zero(agent_text, get_context_id(msg.chat_id))
    # persist context + reply (reuse existing helpers)
```
## 7) Reliability and ops

Use supervised process restarts (not ad-hoc shell sessions):

- `supervisorctl restart telegram_bridge`
- Logs: `/a0/usr/telegram/bridge.log`

Health checks to run after deploy:

1. `docker exec agent-zero supervisorctl status telegram_bridge`
2. Send a DM text -> verify the Agent Zero response.
3. Send a voice note -> verify Whisper transcription + Agent Zero response.
4. Send a long message -> verify split-reply behavior.
5. Run the tests: `python3 -m pytest /a0/usr/telegram/tests -v`
## 8) Practical hardening checklist

- Never commit real secrets (`secrets.env` must stay private).
- Keep the API and Whisper timeouts finite (the bridge uses 120 s and 60 s respectively).
- Retry only on transient upstream errors (already done for 404/502/503 and connect errors).
- Sanitize logs: never log raw tokens or secret values.
- Keep all customization under `/a0/usr/` so upgrades don't wipe it.
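One way to enforce the log-sanitization item is a logging filter that masks secrets before they reach `bridge.log`. This is a sketch, not part of the bridge: the class name is illustrative, and the token regexes (Telegram's `digits:35-chars` bot-token shape, anything following an `X-API-KEY:` header) are assumptions you should adjust to your own secret formats:

```python
import logging
import re


class RedactSecrets(logging.Filter):
    """Logging filter that masks bot tokens and API-key header values in log messages."""

    # Assumed patterns: Telegram bot tokens look like "<8-10 digits>:<35 chars>";
    # the second pattern keeps the "X-API-KEY:" label but hides the value.
    PATTERNS = [
        re.compile(r"\b\d{8,10}:[A-Za-z0-9_-]{35}\b"),
        re.compile(r"(X-API-KEY:\s*)\S+"),
    ]

    def filter(self, record: logging.LogRecord) -> bool:
        msg = record.getMessage()
        for pat in self.PATTERNS:
            msg = pat.sub(lambda m: (m.group(1) if m.lastindex else "") + "[REDACTED]", msg)
        record.msg, record.args = msg, None  # replace the formatted message in place
        return True  # never drop the record, only rewrite it
```

Attach it with `logger.addFilter(RedactSecrets())` on the bridge's logger so every handler sees only redacted text.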