@mavdol
Created January 2, 2026 11:22
Notes on sandboxing untrusted code - why Python can't be sandboxed, comparing Firecracker/gVisor/WASM approaches

Sandboxing Untrusted Python

Python doesn't have a built-in way to run untrusted code safely. Multiple attempts have been made, but none really succeeded.

Why? Because Python is a highly introspective object-oriented language with a mutable runtime. Core elements of the interpreter can be accessed through the object graph, frames and tracebacks, making runtime isolation difficult. This means that even aggressive restrictions can be bypassed:

# Attempt: Remove dangerous built-ins
del __builtins__.eval
del __builtins__.__import__

# 1. Bypass via introspection: climb from any object up to `object`,
#    then enumerate every class loaded in the interpreter
().__class__.__bases__[0].__subclasses__()

# 2. Bypass via exceptions and frames: recover the real builtins
#    from the globals of the raising frame
try:
    raise Exception
except Exception as e:
    e.__traceback__.tb_frame.f_globals['__builtins__']

Note

Older alternatives like sandbox-2 exist, but they provide isolation near the OS level, not the language level. At that point we might as well use Docker or VMs.

So people concluded it's safer to run Python in a sandbox rather than sandbox Python itself.

The thing is, Python dominates AI/ML, especially the AI agents space. We're moving from deterministic systems to probabilistic ones, where executing untrusted code is becoming common.

Why has sandboxing become important now?

2025 was marked by great progress, but it also showed us that isolation for AI agents goes beyond resource control or retry strategies: it has become a security issue.

LLMs have architectural flaws. The most notorious one is prompt injection, which exploits the fact that LLMs can't tell the difference between the system prompt, legitimate user instructions, and malicious instructions injected from external sources. For example, people have demonstrated how a hidden instruction on a web page can flow through a coding agent and exfiltrate sensitive data from your .env file.

It's a pretty common pattern; we've found similar flaws across many AI tools in recent months. Take the Model Context Protocol (MCP), for example: it shows how improper implementation extends the attack surface. The SQLite MCP server was forked thousands of times despite containing SQL injection vulnerabilities.

Developers using coding agents or MCP servers are at least likely to know about, or be informed of, the risks. But for non-technical users accessing AI through third-party services, it's a different story, as the incidents we keep seeing make clear: private data leaks, browser-based AI issues, and so on.

For me, the most important thing is to make sure that unaware users remain safe when using the AI agents we implement.


The solution isn’t better prompts. It’s isolation.

As we said, focusing on the prompt is missing the point. You can't filter or prompt-engineer your way out of injections or architectural flaws. The solution has to be at the infrastructure level, through isolation and least privilege.

But what does isolation look like in practice? If your agent needs to read a specific configuration file, it should only have access to that file, not your entire filesystem. If it needs to query a customer database, it should connect with read-only credentials scoped to specific tables, not root access.
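
To make that concrete, here's a minimal sketch in plain Python, assuming a hypothetical agent that needs one scratch directory and read-only access to a Postgres customer database (the host, database, and role names below are made up for illustration):

import os
import tempfile

import psycopg2  # assumption: the customer database is Postgres

# Filesystem: the agent only gets a dedicated scratch directory,
# never the caller's home directory or project tree.
SANDBOX_DIR = tempfile.mkdtemp(prefix="agent_sandbox_")

def read_agent_file(filename: str) -> str:
    """Resolve a path inside the sandbox and refuse anything that escapes it."""
    path = os.path.realpath(os.path.join(SANDBOX_DIR, filename))
    if not path.startswith(SANDBOX_DIR + os.sep):
        raise PermissionError(f"{filename} is outside the agent sandbox")
    with open(path) as f:
        return f.read()

# Credentials: connect as a role that only has SELECT on the tables the
# agent actually needs (the role and its grants are created out of band),
# and additionally mark the session itself read-only.
conn = psycopg2.connect(
    host="db.internal",                       # hypothetical host
    dbname="crm",                             # hypothetical database
    user="agent_readonly",                    # hypothetical least-privilege role
    password=os.environ["AGENT_DB_PASSWORD"],
)
conn.set_session(readonly=True)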

We can think of this in terms of levels of isolation:

flowchart TB
    direction TB
    Code["AI Agent"]

    subgraph Isolation["🛡️ Isolation Layers"]
        direction TB
        FS["<b>Filesystem Isolation</b> <br/><i>Only /tmp/agent_sandbox</i>"]
        NET["<b>Network Isolation</b> <br/><i>Allowlisted APIs only</i>"]
        CRED["<b>Credential Scoping</b> <br/><i>Least privilege tokens</i>"]
        RT["<b>Runtime Isolation</b> <br/><i>Sandboxed environment</i>"]
    end

    subgraph Protected["🔒 Protected Resources"]
        direction TB
        Home["/home, /etc, .env"]
        ExtAPI["External APIs"]
        DB["Databases & CRMs"]
        Infra["Core Infrastructure"]
    end

    Code --> FS
    Code --> NET
    Code --> CRED
    Code --> RT

    FS -.->|❌ Blocked| Home
    NET -.->|❌ Blocked| ExtAPI
    CRED -.->|❌ Blocked| DB
    RT -.->|❌ Blocked| Infra

    style Isolation fill:#2ecc71,stroke:#27ae60,color:#fff
    style Protected fill:#e74c3c,stroke:#c0392b,color:#fff
    style FS fill:#9b59b6,stroke:#8e44ad,color:#fff
    style NET fill:#9b59b6,stroke:#8e44ad,color:#fff
    style CRED fill:#9b59b6,stroke:#8e44ad,color:#fff
    style RT fill:#9b59b6,stroke:#8e44ad,color:#fff

In a perfect world, we would apply these isolation layers to all AI agents, whether they're part of large enterprise platforms or small frameworks.

I believe this is the only way to prevent systemic issues. However, I'm also aware that many agents do need more context or access to specific resources to function properly.

So, the real challenge is finding the right balance between security and functionality.


What the industry does

There are several ways to sandbox AI agents, most of them operating at the infrastructure level, outside the Python code itself. From what I see, two main paradigms stand out: one is to sandbox the entire agent, and the other is to sandbox each individual task separately.

For sandboxing the entire agent, we have many solutions:

  • Firecracker: A minimal virtual machine monitor that runs lightweight microVMs, giving untrusted code a sandboxed environment with its own guest kernel. AWS originally built it for Lambda, which makes it the closest option to "secure by default". It requires KVM, so it's Linux-only, but it's still a solid solution for agent-level isolation. The downside is that for granular task isolation it introduces more overhead, higher resource consumption, and added complexity (a minimal API sketch follows after this list).

  • Docker: Everyone uses it, but it's not the most secure option, since containers share the host kernel. Security teams recommend Firecracker or gVisor (which we'll cover below) for agent-level isolation instead. And like Firecracker, it's a bit heavy for granular, per-task isolation.
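
To make the Firecracker option concrete, here's a rough sketch of driving its REST API over its Unix socket from Python, using the third-party requests-unixsocket package. The kernel and rootfs paths are placeholders, and it assumes a firecracker process is already listening on the socket (e.g. firecracker --api-sock /tmp/firecracker.sock):

import requests_unixsocket  # assumption: pip install requests-unixsocket

# URL-encoded path to the socket the firecracker process listens on.
BASE = "http+unix://%2Ftmp%2Ffirecracker.sock"
session = requests_unixsocket.Session()

# Size the microVM: a small, single-purpose sandbox for one agent.
session.put(f"{BASE}/machine-config", json={"vcpu_count": 1, "mem_size_mib": 256})

# Guest kernel and root filesystem (placeholder paths).
session.put(f"{BASE}/boot-source", json={
    "kernel_image_path": "/opt/sandbox/vmlinux",
    "boot_args": "console=ttyS0 reboot=k panic=1",
})
session.put(f"{BASE}/drives/rootfs", json={
    "drive_id": "rootfs",
    "path_on_host": "/opt/sandbox/python-rootfs.ext4",
    "is_root_device": True,
    "is_read_only": False,
})

# Boot the microVM; the untrusted Python runs inside the guest.
session.put(f"{BASE}/actions", json={"action_type": "InstanceStart"})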

On the other hand, for task-level isolation, we have only one popular option: gVisor.

gVisor sits between containers and VMs and provides strong isolation. Not as strong as Firecracker, but still a solid choice. In my opinion, if you're already using Kubernetes, gVisor is the natural fit, even though it's flexible enough to work with any container runtime.

The only downside is that it's also Linux-only, since it was designed to secure Linux containers by intercepting and reimplementing Linux system calls. On top of that, it adds non-trivial overhead, since system calls pass through its user-space kernel. That's something to keep in mind when you're sandboxing at the task level.
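
For task-level isolation, the switch to gVisor is mostly a runtime flag. Here's a sketch with the Docker Python SDK, assuming runsc is already installed and registered as a Docker runtime; the image and inline script are placeholders:

import docker  # assumption: pip install docker, runsc registered in /etc/docker/daemon.json

client = docker.from_env()

# Run one agent task in a gVisor-backed container: the task talks to
# runsc's reimplemented kernel surface instead of the host kernel.
output = client.containers.run(
    "python:3.12-slim",                              # placeholder image
    ["python", "-c", "print('task ran under gVisor')"],
    runtime="runsc",                                 # this is the gVisor part
    network_disabled=True,                           # network isolation
    mem_limit="512m",                                # resource cap
    read_only=True,                                  # no writes to the image filesystem
    remove=True,
)
print(output.decode())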

An emerging alternative: WebAssembly (WASM)

I remember reading an NVIDIA article about using WebAssembly to sandbox AI agents and run them directly in the browser. This is a very interesting approach, though WASM can be used for sandboxing in other contexts too.

I'll admit I might be a bit biased, since I've started a few projects around WASM recently. But I wouldn't mention it if I weren't convinced by its technical strengths: no elevated privileges by default, and no filesystem, network, or environment variable access unless explicitly granted, which is a great advantage. It can definitely work alongside, or even compete with, solutions like Firecracker or gVisor for low-overhead, task-level isolation.
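
Here's a small sketch of that capability model with the wasmtime Python bindings: the guest gets exactly the capabilities you hand it and nothing else. The task.wasm module and the pre-opened directory are placeholders, and the exact API may differ slightly between wasmtime versions:

from wasmtime import Engine, Linker, Module, Store, WasiConfig

engine = Engine()
linker = Linker(engine)
linker.define_wasi()

store = Store(engine)

# Capabilities are opt-in: the guest gets stdout and ONE pre-opened
# directory, and nothing else (no network, no env vars, no other paths).
wasi = WasiConfig()
wasi.inherit_stdout()
wasi.preopen_dir("/tmp/agent_sandbox", "/data")  # host path -> guest path
store.set_wasi(wasi)

# Placeholder: a WASI command module (e.g. a compiled task or interpreter).
module = Module.from_file(engine, "task.wasm")
instance = linker.instantiate(store, module)

# WASI command modules expose _start as their entry point.
instance.exports(store)["_start"](store)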

Obviously, the ecosystem is still young, so there are some limits. It supports pure Python well, but support for C extensions is still evolving and not fully there yet, which impacts ML libraries like NumPy, Pandas, and TensorFlow.

Even with these constraints, I believe this could be promising in the future. That's why I'm working on an open-source solution that uses it to isolate individual agent tasks. The goal is to keep it simple: you can sandbox a task just by adding a decorator:

from capsule import task

@task(name="analyze_data", compute="MEDIUM", ram="512MB", timeout="30s", max_retries=1)
def analyze_data(dataset: list) -> dict:
    # Your code runs safely in a Wasm sandbox
    return {"processed": len(dataset), "status": "complete"}

If you're curious, the project is on my GitHub.

Where do we go from here?

Firecracker and gVisor have shown us that strong isolation is possible. And now we're seeing newer players like WebAssembly come in, which can help us isolate things at a much more granular, per-task level.

So if you're designing agent systems now, I would recommend planning for failure from the start. Assume that an agent will, at some point, execute untrusted code, process malicious instructions, or simply consume excessive resources. Your architecture must be ready to contain all of these scenarios.
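
Even before you adopt a full sandbox, you can at least contain the "consume excessive resources" scenario with plain POSIX limits. A rough sketch, assuming Linux and a hypothetical untrusted_task.py:

import resource
import subprocess

def limit_resources():
    # Runs in the child just before exec: cap CPU seconds and address space.
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))               # 5 s of CPU time
    resource.setrlimit(resource.RLIMIT_AS, (512 * 1024**2,) * 2)  # 512 MB of memory

result = subprocess.run(
    ["python", "untrusted_task.py"],   # hypothetical task script
    preexec_fn=limit_resources,        # POSIX-only
    timeout=30,                        # wall-clock cap on top of the CPU cap
    capture_output=True,
)
print(result.returncode, result.stdout[:200])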

Thank you for reading. I'm always interested in discussing these approaches, so feel free to reach out with any thoughts or feedback.

@SerJaimeLannister

Very interesting. Wasm is usually a good solution.

There are also libriscv and, more recently, libloong and tinykvm (GPL / their custom dual license), written by fwsgonozo, which are really nice.

Also, as you have mentioned, Docker isn't the most secure solution. If you are interested in the most secure option with Docker, gVisor is one choice, as you mention, though I have heard it has some issues with complete kernel compatibility or something. Still, I agree it's a good decision.

I'd also like to point out Incus containers, https://linuxcontainers.org/incus/; they recently created their own OS as well, which can streamline things.

Another aspect of Incus is that you can run Docker containers as VMs, more or less, to get that isolation easily.

Also, regarding Firecracker: I think it's very bare-bones and is usually used in multi-cloud solutions. Developing and managing it is tough.

It's usually used when you need a very high number of sandboxes in a short amount of time. Wasm is another good pick, but if you're picking something more persistent, other solutions are good as well.

@mavdol
Author

mavdol commented Jan 5, 2026

@SerJaimeLannister Thanks for the insights! I wasn't familiar with Incus; I'll definitely take a look.

Totally agree on Firecracker too, it's often complex to manage for this kind of use case. E2B helps leverage it, though.

@corv89

corv89 commented Jan 6, 2026

Your sandbox-2 mention is interesting. I've been building on it with a different philosophy: rather than trying to contain/block agents, capture what they want to do and show it to the user before executing. The syscall interception becomes a visibility layer, not just an isolation layer. https://github.com/corv89/shannot

@awesomebytes

@corv89 your link is broken for me; it should point to: https://github.com/corv89/shannot

@alexellis

We have been working with Firecracker since ~2021, first in actuated (CI/CD for GitHub, GitLab, and Jenkins in microVMs), then with SlicerVM.com as either a YAML/config- or REST API/SDK-driven approach. Boot times are fairly quick with a full systemd setup and Ubuntu LTS, and can be made even quicker for very specialised tasks like this, such as running Python.

I won't deep link as I don't want to hijack the thread.

Feel free to check it out if you're interested in Firecracker but want to make it actually usable / production-ready pretty much out of the box. The SlicerVM.com website also has getting-started videos on Firecracker/microVMs if you want to do your own thing.
