# Agent Zero to Telegram Bridge: How-To (AI-Coder Focus)
This guide explains how to wire a Telegram bot to Agent Zero (`/api_message`) with:

- secure auth (no hardcoded LLM keys in the bridge)
- per-chat context continuity
- voice/audio relay to local Whisper
- an optional extension pattern for non-audio file attachments

Everything below is based on the bridge implementation in:

- `/a0/usr/telegram/telegram_bridge.py`
- `/a0/usr/telegram/send_message.sh`
## 1) High-level architecture

```text
Telegram user
  -> python-telegram-bot bridge (long polling)
  -> POST http://<agent-zero-host>/api_message
       headers: X-API-KEY: <computed token>
       body: { message, context_id? }
  <- { response, context_id }
  <- Telegram reply
```

Voice/audio path:

```text
Telegram voice/audio attachment
  -> bot.get_file(file_id).download_as_bytearray()
  -> POST multipart to local Whisper: files={"audio": (filename, bytes)}
  <- { "text": "...transcript..." }
  -> send transcript into Agent Zero /api_message
```
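The request half of this diagram can be sketched as follows. This is a minimal illustration using only the stdlib; the base URL and function name are placeholders, and only `X-API-KEY`, `message`, and `context_id` come from the guide:

```python
import json
import urllib.request


def build_api_message_request(base_url, api_key, message, context_id=None):
    """Build the POST /api_message request described in the diagram above."""
    body = {"message": message}
    if context_id:
        # Omit context_id entirely for a fresh conversation
        body["context_id"] = context_id
    return urllib.request.Request(
        f"{base_url}/api_message",
        data=json.dumps(body).encode(),
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# resp = urllib.request.urlopen(build_api_message_request("http://localhost:80", key, "hi"))
# reply = json.loads(resp.read())  # expected shape: { "response": ..., "context_id": ... }
```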
## 2) Security model (important)

The bridge should not call OpenAI (or other LLM providers) directly; let Agent Zero handle model routing.

Do this:

1. Store the bot token and other secrets in `/a0/usr/secrets.env`.
2. In `/a0/usr/telegram/telegram-agent0.env`, reference secrets as `SECRET(NAME)`.
3. Compute the Agent Zero API key at runtime from `/a0/usr/.env` (`A0_PERSISTENT_RUNTIME_ID`, `AUTH_LOGIN`, `AUTH_PASSWORD`):
   - `sha256(f"{runtime_id}:{username}:{password}")`
   - base64url-encode and strip the `=` padding
   - take the first 16 characters

This matches the bridge behavior and avoids persisting the API key in plaintext.
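The derivation steps above can be sketched as a single function (the function name is illustrative; only the hashing steps come from the guide):

```python
import base64
import hashlib


def compute_a0_api_key(runtime_id: str, username: str, password: str) -> str:
    """Derive the Agent Zero API key: sha256 -> base64url (padding stripped) -> first 16 chars."""
    digest = hashlib.sha256(f"{runtime_id}:{username}:{password}".encode()).digest()
    token = base64.urlsafe_b64encode(digest).decode().rstrip("=")
    return token[:16]
```

Because the key is recomputed from `/a0/usr/.env` on startup, it never needs to be written to disk on its own.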
## 3) Config files

`/a0/usr/telegram/telegram-agent0.env` example:

```env
TELEGRAM_BOT_TOKEN="SECRET(TELEGRAM_BOT_TOKEN)"
TELEGRAM_OWNER_USER_ID="SECRET(TELEGRAM_OWNER_USER_ID)"
A0_API_URL="http://localhost:80"
WHISPER_URL="http://<your-whisper-host>:8765/transcribe"
```

`/a0/usr/secrets.env` example:

```env
TELEGRAM_BOT_TOKEN="<bot-token>"
TELEGRAM_OWNER_USER_ID="<numeric-user-id>"
# Optional:
# RESEND_API_KEY="..."
```

Notes:

- DM access is owner-only; group chats are open to group participants (in this implementation).
- If you want strict allow-list behavior for groups, add a group/user ACL gate in `owner_gate()`.
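The `SECRET(NAME)` indirection can be resolved with a small helper like this (a sketch, assuming simple `KEY="value"` env files; the bridge's actual loader may differ):

```python
import re

# Placeholder format from the guide: SECRET(NAME) -> look up NAME in secrets.env
SECRET_RE = re.compile(r"^SECRET\((\w+)\)$")


def parse_env(text: str) -> dict:
    """Parse KEY="value" lines from a simple .env file, skipping comments and blanks."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        values[key.strip()] = raw.strip().strip('"')
    return values


def resolve_secrets(config: dict, secrets: dict) -> dict:
    """Replace SECRET(NAME) placeholders with values loaded from secrets.env."""
    resolved = {}
    for key, value in config.items():
        m = SECRET_RE.match(value)
        resolved[key] = secrets[m.group(1)] if m else value
    return resolved
```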
## 4) Bridge runtime behavior

### 4.1 Text messages

- Handler: `filters.TEXT & ~filters.COMMAND`
- For each chat, the bridge loads the `context_id` from `state.json` and sends it to `/api_message`.
- If Agent Zero returns a new `context_id`, the bridge saves it for continuity.
- On a stale-context 404, the bridge retries once without `context_id`.
- Long replies are chunked to Telegram's maximum message length, with a markdown fallback.

### 4.2 Voice/audio messages

- Handler: `filters.VOICE | filters.AUDIO`
- The file is downloaded to memory only (a bytearray), never written to disk.
- Transcription goes to local Whisper (`multipart/form-data`, field name: `audio`).
- Prefix strategy before sending to Agent Zero:
  - DM default: `[Voice]: <transcript>`
  - Group default: `[Name via voice]: <transcript>`
  - If a caption is present, it overrides any pending instruction.
  - A pending voice instruction (TTL 300 s) is used when there is no caption.
- Transcription failures and empty transcripts produce explicit user-visible errors.
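The chunking step can be sketched as below; 4096 is Telegram's documented maximum message length, and the helper name plus the break-at-newline preference are illustrative, not necessarily what the bridge does:

```python
TELEGRAM_MAX_LEN = 4096  # Telegram's maximum message length in characters


def chunk_reply(text: str, limit: int = TELEGRAM_MAX_LEN) -> list[str]:
    """Split a long reply into chunks of at most `limit` chars, preferring newline breaks."""
    chunks = []
    while len(text) > limit:
        cut = text.rfind("\n", 0, limit)  # break at the last newline that fits
        if cut <= 0:
            cut = limit  # no newline available: hard cut
        chunks.append(text[:cut])
        text = text[cut:].lstrip("\n")
    if text:
        chunks.append(text)
    return chunks
```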
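The prefix strategy above can be sketched as a pure function. This is a hypothetical helper: the name, the argument shapes, and especially how an instruction is combined with the transcript (here: prepended on its own line) are assumptions, not the bridge's exact behavior:

```python
import time


def build_voice_text(transcript, caption, pending, is_group, sender_name, now=None):
    """Apply the prefix strategy: caption > unexpired pending instruction > default prefix."""
    now = time.time() if now is None else now
    if caption:
        instruction = caption  # a caption overrides any pending instruction
    elif pending and pending["expires_at"] > now:
        instruction = pending["text"]  # pending voice instruction within its TTL
    else:
        instruction = None
    prefix = f"[{sender_name} via voice]" if is_group else "[Voice]"
    text = f"{prefix}: {transcript}"
    return f"{instruction}\n{text}" if instruction else text
```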
### 4.3 State file schema

`/a0/usr/telegram/state.json`:

```json
{
  "contexts": {
    "<chat_id>": "<agent_zero_context_id>"
  },
  "pending_voice": {
    "<chat_id>": {
      "text": "<instruction>",
      "expires_at": 1735689600.0
    }
  }
}
```

Use atomic writes (`tempfile` + `os.replace`) to avoid corruption.
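The atomic-write pattern looks like this (a standard sketch of `tempfile` + `os.replace`; the function name is illustrative):

```python
import json
import os
import tempfile


def save_state(path: str, state: dict) -> None:
    """Write state.json atomically: write a temp file in the same directory, then os.replace()."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before the rename
        os.replace(tmp_path, path)  # atomic on POSIX: readers never see a partial file
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file on failure
        raise
```

The temp file must live in the same directory as the target, because `os.replace` is only atomic within one filesystem.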
## 5) Attachment handling status

Current bridge capability:

- Supported: text, voice notes, audio files
- Not supported by default: relaying photos/documents/video into Agent Zero content

If someone asks about "file attachments," be explicit: this bridge currently treats voice/audio as the supported attachment types; other file types require an extension.
## 6) Extension pattern for photos/documents

If you want generic file attachments, add handlers for `filters.PHOTO | filters.Document.ALL` and convert files into text the LLM can consume.

Recommended pattern:

1. Download the file bytes in memory.
2. Enforce a max size and a MIME allow-list.
3. Route by MIME type:
   - `image/*`: OCR or a vision endpoint -> text summary
   - `application/pdf`: OCR/text extraction -> text summary
   - plain text/code files: decode and truncate
4. Build an Agent Zero payload string from:
   - metadata (`filename`, `mime`, `size`)
   - the extracted text/summary
   - an optional user caption/instruction
5. Send it through the existing `/api_message` flow so memory/tools still work.

Minimal skeleton:

```python
app.add_handler(MessageHandler(filters.PHOTO | filters.Document.ALL, handle_file_message))


async def handle_file_message(update, context):
    msg = update.effective_message
    # Telegram photos arrive as a list of sizes; take the largest
    doc = msg.document or (msg.photo[-1] if msg.photo else None)
    if not doc:
        return
    tg_file = await context.bot.get_file(doc.file_id)
    data = await tg_file.download_as_bytearray()
    # TODO: validate size/mime, extract text, summarize
    extracted = extract_to_text(data, mime_type=getattr(doc, "mime_type", "application/octet-stream"))
    agent_text = (
        f"[Attachment: {getattr(doc, 'file_name', 'photo')}"
        f", mime={getattr(doc, 'mime_type', 'unknown')}]\n{extracted}"
    )
    result = await call_agent_zero(agent_text, get_context_id(msg.chat_id))
    # persist context + reply (reuse existing helpers)
```
## 7) Reliability and ops

Use supervised process restarts (not ad-hoc shell sessions):

- `supervisorctl restart telegram_bridge`
- Logs: `/a0/usr/telegram/bridge.log`

Health checks to run after deploy:

1. `docker exec agent-zero supervisorctl status telegram_bridge`
2. Send a DM text -> verify the Agent Zero response.
3. Send a voice note -> verify Whisper transcription + Agent Zero response.
4. Send a long message -> verify split-reply behavior.
5. Run the tests: `python3 -m pytest /a0/usr/telegram/tests -v`
## 8) Practical hardening checklist

- Never commit real secrets (`secrets.env` must stay private).
- Keep the API and Whisper timeouts finite (the bridge uses 120 s and 60 s respectively).
- Retry only on transient upstream errors (already done for 404/502/503 and connect errors).
- Sanitize logs: never log raw tokens or secret values.
- Keep all customization under `/a0/usr/` so upgrades don't wipe it.
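One way to enforce the log-sanitization item is a logging filter that masks secrets before they reach `bridge.log`. This is a sketch, not part of the bridge: the class name is illustrative, and the token regexes (Telegram's `digits:35-chars` bot-token shape, anything following an `X-API-KEY:` header) are assumptions you should adjust to your own secret formats:

```python
import logging
import re


class RedactSecrets(logging.Filter):
    """Logging filter that masks bot tokens and API-key header values in log messages."""

    # Assumed patterns: Telegram bot tokens look like "<8-10 digits>:<35 chars>";
    # the second pattern keeps the "X-API-KEY:" label but hides the value.
    PATTERNS = [
        re.compile(r"\b\d{8,10}:[A-Za-z0-9_-]{35}\b"),
        re.compile(r"(X-API-KEY:\s*)\S+"),
    ]

    def filter(self, record: logging.LogRecord) -> bool:
        msg = record.getMessage()
        for pat in self.PATTERNS:
            msg = pat.sub(lambda m: (m.group(1) if m.lastindex else "") + "[REDACTED]", msg)
        record.msg, record.args = msg, None  # replace the formatted message in place
        return True  # never drop the record, only rewrite it
```

Attach it with `logger.addFilter(RedactSecrets())` on the bridge's logger so every handler sees only redacted text.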