Types tell one story: what success looks like.
Traits tell another: what capabilities exist and how they compose.
Errors are the missing half of the map: how those capabilities actually fail in the real world.
If types are the coordinates of valid states, errors are the geography of refusal:
- where the world said “no,”
- what state we ended up in instead,
- and what moves are still possible from there.
This is not a framework. There’s no global error type you have to funnel everything through. It’s a way of thinking plus a handful of concrete patterns so good error designs fall out of you by reflex.
Baselines:
- Use `Result<T, E>` for operations that can fail. `E` is a structured domain error, not a `String` or `anyhow::Error` (except at the very outer edges).
- Panics are for bugs. If the user, network, disk, clock, or another process can cause it, it’s not a panic.
- In Rust, we use:
  - `thiserror` for enums that implement `std::error::Error`.
  - An internal error-set type (currently `terrors::OneOf`) to compose sets of leaf errors inside a module.
Everything else is detail.
Keep these three in your head; they’re the “snap-to-grid” for everything else:
- Design for the caller’s decision surface.
  Every error exists to change what the caller does next.
  If two failures never induce different behavior, they are the same variant.
  If callers can’t tell failures apart but should, your error type is lying.
- Preserve the confession until a conscious boundary.
  Inside the system, errors are structured evidence: what we tried, on what, what stopped us.
  Don’t crush that into a string, HTTP status, or log line until you’re at a boundary where humans or other systems consume it.
- Push what you can into types, what you must into errors, and what remains into panics.
  Preconditions that must always hold → types.
  Failures that can happen in a valid world → `Result`.
  Anything left is a bug; crash loudly.
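A minimal sketch of that three-way split, using a hypothetical `Account` with hypothetical `Amount` and `InsufficientFunds` types (none of these come from a real codebase): the non-zero precondition lives in a type, an overdraft is an ordinary `Result`, and the only panic guards a line that would be a bug in this function itself.

```rust
use std::fmt;
use std::num::NonZeroU64;

/// Types: an amount that is non-zero by construction (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Amount(NonZeroU64);

impl Amount {
    /// Returns `None` for zero, so every `Amount` that exists is valid.
    pub fn new(value: u64) -> Option<Amount> {
        NonZeroU64::new(value).map(Amount)
    }
}

/// Errors: a refusal that can happen in a perfectly valid world.
#[derive(Debug, PartialEq, Eq)]
pub struct InsufficientFunds {
    pub requested: u64,
    pub available: u64,
}

impl fmt::Display for InsufficientFunds {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "requested {} but only {} is available", self.requested, self.available)
    }
}

impl std::error::Error for InsufficientFunds {}

pub struct Account {
    balance: u64,
}

impl Account {
    pub fn new(balance: u64) -> Self {
        Account { balance }
    }

    pub fn balance(&self) -> u64 {
        self.balance
    }

    /// Returns the new balance, or describes why the withdrawal was refused.
    pub fn withdraw(&mut self, amount: Amount) -> Result<u64, InsufficientFunds> {
        let requested = amount.0.get();
        if requested > self.balance {
            return Err(InsufficientFunds { requested, available: self.balance });
        }
        // Panics: underflow is impossible after the check above, so reaching
        // this `expect` would mean a bug in this function, not a user error.
        self.balance = self
            .balance
            .checked_sub(requested)
            .expect("balance was checked above");
        Ok(self.balance)
    }
}
```

Callers never need to handle “zero amount”; they only ever see the refusal the world can actually produce.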
Not “something went wrong.”
An error is a structured, expected refusal.
“Given those inputs and this world, this operation declined, and here’s the state we ended up in instead.”
- Success: we kept the promise, here’s the result.
- Error: the request was valid, but the world said no, and we can describe how.
- Bug: a precondition or invariant we promised ourselves is false.
Informally, read

```rust
pub fn op(input: Input) -> Result<Output, Error>
```

as: for every `input` that satisfies our preconditions, either:
- we return a valid `Output`, or
- we return an `Error` describing a real state of the world from which a caller can make a meaningful decision.
If there is no meaningful decision to make (“we’re corrupt, crash”), that’s not an error; that’s a bug.
From the trait doc: “Traits are promises made to strangers.” Error types are the same thing, but for the negative space.
Erase all code; keep only:

```rust
pub fn create_user(...) -> Result<UserId, CreateUserError>;
```

If the `CreateUserError` docs and variants don’t let a stranger answer:
- Can I safely retry?
- Is this my fault, the domain’s rules, or the environment?
- Should I show a message, change input, or page someone?
…then the error type isn’t done yet.
A good error type is one where, knowing only that enum + docs, you can still write sane handling code.
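One way to test a draft is to write the caller’s handling code against nothing but the variants. A sketch, assuming a simplified, dependency-free `CreateUserError` (its `Storage` source is an `io::Error` here rather than a driver error) and a hypothetical `NextStep` type standing in for the caller’s decision:

```rust
/// Simplified, hypothetical `CreateUserError`. In real code it would derive
/// `thiserror::Error`, and `Storage` would carry the driver's error as source.
#[derive(Debug)]
pub enum CreateUserError {
    /// The caller's input breaks our rules; retrying won't help.
    InvalidUsername { username: String, reason: String },
    /// A domain rule blocks it: the name is already taken.
    AlreadyExists { username: String },
    /// The environment failed while persisting; a later retry may succeed.
    Storage { username: String, source: std::io::Error },
}

/// What the caller does next (hypothetical). One variant per distinct behavior.
#[derive(Debug, PartialEq, Eq)]
pub enum NextStep {
    /// Show this message and let the user change their input.
    AskForDifferentInput(String),
    /// Nothing is wrong with the request itself; try again later.
    RetryLater,
}

/// Written against the enum alone: no message parsing, no knowledge of internals.
pub fn next_step(err: &CreateUserError) -> NextStep {
    match err {
        CreateUserError::InvalidUsername { reason, .. } => {
            NextStep::AskForDifferentInput(format!("invalid username: {reason}"))
        }
        CreateUserError::AlreadyExists { username } => {
            NextStep::AskForDifferentInput(format!("`{username}` is already taken"))
        }
        CreateUserError::Storage { .. } => NextStep::RetryLater,
    }
}
```

Every arm is filled in from the variant and its fields alone, which is the property the questions above are probing for.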
We use three distinct layers:
- Leaf error types – small structs/enums for specific failure modes.
- Capability error enums (canonical) – one per capability, used in traits and at boundaries.
- Internal error sets – `OneOf<(…)>`-style sets used inside a module for precision and composition.
Leaf error types are the “atoms” of failure inside a module.
Example:

```rust
#[derive(Debug, thiserror::Error)]
pub enum InvalidKey {
    #[error("key is empty")]
    Empty,
    #[error("key `{raw}` contains forbidden characters")]
    ForbiddenChars { raw: String },
}

#[derive(Debug, thiserror::Error)]
#[error("backend {backend} is unavailable: {reason}")]
pub struct BackendUnavailable {
    pub backend: &'static str,
    pub reason: String,
    #[source]
    pub source: std::io::Error,
}

#[derive(Debug, thiserror::Error)]
#[error("write for `{key}` may have partially completed")]
pub struct UnknownWriteStatus {
    pub key: String,
    #[source]
    pub source: std::io::Error,
}
```

Guidelines:
- Make them honest, specific stories: what we were doing, on what, what stopped us.
- Keep them local to a module or crate unless they’re genuinely shared concepts.
- You don’t have to expose them publicly; they can be `pub(crate)`.
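For reference, here is roughly what `#[derive(thiserror::Error)]` produces for a leaf like `BackendUnavailable`, written by hand so the sketch runs without dependencies. The `connect` function is hypothetical and simulates a refused connection to show the capture step: the raw `io::Error` is wrapped right where it occurs, together with the context only that call site knows.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hand-written equivalent of the `BackendUnavailable` leaf; the thiserror
/// derive generates impls along these lines.
#[derive(Debug)]
pub struct BackendUnavailable {
    pub backend: &'static str,
    pub reason: String,
    pub source: io::Error,
}

impl fmt::Display for BackendUnavailable {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Only this layer's story; the cause is exposed via `source()`.
        write!(f, "backend {} is unavailable: {}", self.backend, self.reason)
    }
}

impl Error for BackendUnavailable {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Hypothetical capture site; the I/O failure is simulated here.
pub fn connect(backend: &'static str) -> Result<(), BackendUnavailable> {
    let attempt: io::Result<()> = Err(io::Error::new(
        io::ErrorKind::ConnectionRefused,
        "connection refused",
    ));
    // Wrap immediately, adding what only this call site knows.
    attempt.map_err(|source| BackendUnavailable {
        backend,
        reason: "connect failed".to_string(),
        source,
    })
}
```

Keeping the cause out of the `Display` text and behind `source()` means boundary code can walk the chain without printing the same cause twice.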
For each real capability (usually a trait), there is one canonical error enum.
This is what we expose across module boundaries, what we log, and what we classify.
```rust
use thiserror::Error;

#[derive(Debug, Error)]
pub enum StorageError {
    #[error(transparent)]
    InvalidKey(#[from] InvalidKey),
    #[error(transparent)]
    BackendUnavailable(#[from] BackendUnavailable),
    #[error(transparent)]
    UnknownWriteStatus(#[from] UnknownWriteStatus),
}
```

This is the thing you put in trait signatures:
```rust
pub trait Storage {
    fn get(&self, key: &Key) -> Result<Option<Value>, StorageError>;
    fn put(&self, key: &Key, value: &Value) -> Result<(), StorageError>;
}
```

Rules:
- Variants are behaviors; fields are details. If callers should behave differently, they get different variants.
- No library names in variant names: `InvalidKey`, not `SerdeError`; `BackendUnavailable`, not `ReqwestError`. Underlying crates go in `#[source]`.
- This is the type you document: “Here is what storage can refuse to do, and why.”
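The `#[from]` attributes are what let `?` lift a leaf error into the canonical enum. A dependency-free sketch with trimmed-down leaves; the `From` impls are roughly what `#[from]` generates, and `validate`, `fetch`, and the `backend_up` simulation flag are hypothetical:

```rust
// Trimmed-down leaves (hypothetical; most payloads omitted).
#[derive(Debug)]
pub enum InvalidKey {
    Empty,
}

#[derive(Debug)]
pub struct BackendUnavailable {
    pub backend: &'static str,
}

/// The canonical capability error, stripped to two variants.
#[derive(Debug)]
pub enum StorageError {
    InvalidKey(InvalidKey),
    BackendUnavailable(BackendUnavailable),
}

// Roughly what `#[from]` generates: one `From` impl per wrapped leaf.
impl From<InvalidKey> for StorageError {
    fn from(e: InvalidKey) -> Self {
        StorageError::InvalidKey(e)
    }
}

impl From<BackendUnavailable> for StorageError {
    fn from(e: BackendUnavailable) -> Self {
        StorageError::BackendUnavailable(e)
    }
}

/// Hypothetical internal helpers that return leaf errors.
fn validate(raw: &str) -> Result<&str, InvalidKey> {
    if raw.is_empty() {
        Err(InvalidKey::Empty)
    } else {
        Ok(raw)
    }
}

fn fetch(_key: &str, backend_up: bool) -> Result<Option<u64>, BackendUnavailable> {
    if backend_up {
        Ok(Some(42))
    } else {
        Err(BackendUnavailable { backend: "primary" })
    }
}

/// The public entry point speaks only `StorageError`; `?` calls `From::from`.
pub fn get(raw: &str, backend_up: bool) -> Result<Option<u64>, StorageError> {
    let key = validate(raw)?;
    Ok(fetch(key, backend_up)?)
}
```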
Inside a module, we sometimes want per‑function precision or to compose several capabilities without inventing a new enum every time.
We use an internal error-set type for that (currently `terrors::OneOf`):

```rust
use terrors::OneOf;

type StorageGetError = OneOf<(InvalidKey, BackendUnavailable)>;
type StoragePutError = OneOf<(BackendUnavailable, UnknownWriteStatus)>;
```

Internal implementation can use these:
```rust
fn inner_get(...) -> Result<Option<Value>, StorageGetError> { ... }
fn inner_put(...) -> Result<(), StoragePutError> { ... }
```

At the boundary of the capability, we collapse back to `StorageError`:
```rust
impl Storage for MyStorage {
    fn get(&self, ...) -> Result<Option<Value>, StorageError> {
        // `to_enum()` consumes the set so each leaf can be moved into the
        // canonical enum; `map_err` keeps the result wrapped in `Err`.
        inner_get(...).map_err(|e| match e.to_enum() {
            terrors::E2::A(invalid) => StorageError::InvalidKey(invalid),
            terrors::E2::B(backend) => StorageError::BackendUnavailable(backend),
        })
    }

    fn put(&self, ...) -> Result<(), StorageError> {
        inner_put(...).map_err(|e| match e.to_enum() {
            terrors::E2::A(backend) => StorageError::BackendUnavailable(backend),
            terrors::E2::B(unknown) => StorageError::UnknownWriteStatus(unknown),
        })
    }
}
```

Conventions:
- `OneOf` stays internal. Public APIs and traits talk in terms of `StorageError`, `UserRepoError`, `EvalError`, etc., not `OneOf`.
- Use `OneOf` when it saves a bunch of glue enums or gives you useful per-function error sets.
- Don’t turn the whole codebase into a `OneOf` showcase. It’s a plumbing tool, not the user-facing story.
The key rule:
Distinct behavior ⇒ distinct variant. Same behavior ⇒ one variant, more fields.
Bad:
```rust
pub enum FetchError {
    Io(std::io::Error),
    Http(reqwest::Error),
    Serde(serde_json::Error),
}
```

Callers can’t tell what to do with this. “Serde error” and “Io error” aren’t world states.
Better:
```rust
#[derive(Debug, Error)]
pub enum FetchUserError {
    #[error("could not reach user service at {endpoint}")]
    Transport {
        endpoint: Url,
        #[source]
        source: std::io::Error,
    },
    #[error("user {id} was not found")]
    NotFound { id: UserId },
    #[error("invalid response from user service at {endpoint}")]
    BadResponse {
        endpoint: Url,
        status: u16,
        body_snippet: String,
    },
    #[error("caller is not authorized to fetch user {id}")]
    Unauthorized { id: UserId },
}
```

- `Transport` → maybe retry.
- `NotFound` → show “user not found.”
- `BadResponse` → alert / internal error.
- `Unauthorized` → auth/UX response.
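Because the decision lives in the variant, retry policy can be written without inspecting messages. A sketch, assuming a trimmed-down `FetchUserError` (plain `String` endpoints and `u64` ids instead of `Url`/`UserId`) and a hypothetical `with_retries` helper:

```rust
use std::io;
use std::thread;
use std::time::Duration;

/// Trimmed-down `FetchUserError` (hypothetical field types).
#[derive(Debug)]
pub enum FetchUserError {
    Transport { endpoint: String, source: io::Error },
    NotFound { id: u64 },
    BadResponse { endpoint: String, status: u16 },
    Unauthorized { id: u64 },
}

impl FetchUserError {
    /// Only "couldn't reach the service" is worth retrying.
    pub fn is_retryable(&self) -> bool {
        match self {
            FetchUserError::Transport { .. } => true,
            FetchUserError::NotFound { .. }
            | FetchUserError::BadResponse { .. }
            | FetchUserError::Unauthorized { .. } => false,
        }
    }
}

/// Calls `op` up to `max_attempts` times, sleeping `backoff` between
/// attempts, and retries only errors whose variant says retrying may help.
pub fn with_retries<T>(
    max_attempts: u32,
    backoff: Duration,
    mut op: impl FnMut() -> Result<T, FetchUserError>,
) -> Result<T, FetchUserError> {
    let mut attempt = 1;
    loop {
        match op() {
            Err(e) if e.is_retryable() && attempt < max_attempts => {
                attempt += 1;
                thread::sleep(backoff);
            }
            other => return other,
        }
    }
}
```

Non-retryable variants return on the first attempt. Listing every variant in `is_retryable` instead of using `_` means a newly added variant won’t silently become non-retryable.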
“Serde error” is not a world state. “IO error” is not a world state.
World states are:
- “We couldn’t reach the host.”
- “We couldn’t parse the response body.”
- “We tried to write to disk; we don’t know if it committed.”
Every variant should answer:
- What were we trying to do?
- On what?
- What stopped us?
- What do we know (or not know) about side effects?
If you can’t answer those, you don’t have a variant yet; you have a log line.
Public APIs should not force callers to care about which HTTP client or DB driver you picked.
Bad:
```rust
pub enum CreateUserError {
    Sql(sqlx::Error),
    Redis(redis::Error),
    Http(reqwest::Error),
}
```

Better:
```rust
#[derive(Debug, Error)]
pub enum CreateUserError {
    #[error("username `{username}` is invalid: {reason}")]
    InvalidUsername { username: String, reason: String },
    #[error("user `{username}` already exists")]
    AlreadyExists { username: String },
    #[error("could not persist new user `{username}`")]
    Storage {
        username: String,
        #[source]
        source: sqlx::Error,
    },
}
```

Variants talk in domain terms. Crate errors live in `#[source]`.
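Keeping crate errors in `#[source]` doesn’t mean never looking at them: the capability boundary is where the few driver failures callers must react to get translated into domain variants. A sketch with a hypothetical `DriverError` standing in for a real database driver’s error (this is not sqlx’s API) and an assumed unique-constraint name:

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for a database driver's error type (hypothetical; not sqlx's API).
#[derive(Debug)]
pub enum DriverError {
    UniqueViolation { constraint: String },
    ConnectionLost,
}

impl fmt::Display for DriverError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DriverError::UniqueViolation { constraint } => {
                write!(f, "unique constraint `{constraint}` violated")
            }
            DriverError::ConnectionLost => write!(f, "connection lost"),
        }
    }
}

impl Error for DriverError {}

/// Domain-level error; driver details survive only as `source`.
#[derive(Debug)]
pub enum CreateUserError {
    AlreadyExists { username: String },
    Storage { username: String, source: DriverError },
}

/// Hypothetical constraint name for the unique index on usernames.
const USERNAME_UNIQUE: &str = "users_username_key";

/// Translate at the capability boundary: the one driver failure callers must
/// react to becomes a domain variant; everything else is wrapped as-is.
pub fn map_insert_error(username: &str, err: DriverError) -> CreateUserError {
    match err {
        DriverError::UniqueViolation { ref constraint } if constraint == USERNAME_UNIQUE => {
            CreateUserError::AlreadyExists { username: username.to_string() }
        }
        other => CreateUserError::Storage { username: username.to_string(), source: other },
    }
}
```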
Think of error handling as a pipeline:
- Detect – an operation fails (syscall, DB query, parse, etc.).
- Capture – close to the failure, we wrap it in a leaf error that describes what we were doing.
- Enrich – as errors bubble up, callers wrap them in higher‑level errors, adding context.
- Decide – at a boundary, we choose policy: retry, fallback, degrade, crash, surface to user.
- Render – we log, emit metrics, or produce an HTTP/CLI response.
Good designs:
- Use small, honest error types internally.
- Use canonical capability enums at module and trait boundaries.
- Use coarse, stable categories only at the skin (HTTP, CLI, jobs).
- Log once, at the boundary where you decide. Interior layers preserve structure; they don’t decide policy.
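At the boundary that decides, the structured chain can be rendered exactly once. A sketch of a hypothetical `render_chain` helper that walks `std::error::Error::source()`, plus a hypothetical two-level `LoadConfigError` (an enriched error wrapping a captured I/O failure) to exercise it:

```rust
use std::error::Error;
use std::fmt;

/// Renders an error and its whole `source()` chain as one line, for the
/// single log entry written at the boundary (hypothetical helper).
pub fn render_chain(err: &(dyn Error + 'static)) -> String {
    let mut out = err.to_string();
    let mut current = err.source();
    while let Some(cause) = current {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        current = cause.source();
    }
    out
}

/// Enriched error: what we were doing (loading config) and on what (path).
#[derive(Debug)]
pub struct LoadConfigError {
    pub path: String,
    pub source: std::io::Error,
}

impl fmt::Display for LoadConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "could not load config from {}", self.path)
    }
}

impl Error for LoadConfigError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}
```

Interior layers only construct and return these values; the one log line comes from wherever the policy decision is made.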
Once you have good error enums, it’s useful to have a small derived view to drive policy and observability.
A minimal, practical version:
```rust
#[derive(Debug, Copy, Clone, Eq, PartialEq)]
pub enum ErrorClass {
    InvalidInput,
    NotFound,
    Forbidden,
    Temporary,
    Internal,
}
```

Then:
```rust
impl StorageError {
    pub fn class(&self) -> ErrorClass {
        match self {
            StorageError::InvalidKey(_) => ErrorClass::InvalidInput,
            StorageError::BackendUnavailable(_) => ErrorClass::Temporary,
            StorageError::UnknownWriteStatus(_) => ErrorClass::Internal,
        }
    }
}
```

At boundaries, this feeds:
- HTTP mapping (`ErrorClass::InvalidInput` → 400, `NotFound` → 404, `Forbidden` → 403, `Temporary` → 503, `Internal` → 500).
- Retry logic (retry only some classes).
- Metrics (group by `ErrorClass`).
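A sketch of those boundary mappings as plain functions. `http_status`, `should_retry`, and `metric_label` are hypothetical names and the retry set (only `Temporary`) is an assumption; the status codes follow their standard HTTP meanings.

```rust
#[derive(Debug, Copy, Clone, Eq, PartialEq)]
pub enum ErrorClass {
    InvalidInput,
    NotFound,
    Forbidden,
    Temporary,
    Internal,
}

/// HTTP status for each class.
pub fn http_status(class: ErrorClass) -> u16 {
    match class {
        ErrorClass::InvalidInput => 400,
        ErrorClass::Forbidden => 403,
        ErrorClass::NotFound => 404,
        ErrorClass::Internal => 500,
        ErrorClass::Temporary => 503,
    }
}

/// Retry policy (an assumption): only temporary failures are retried.
pub fn should_retry(class: ErrorClass) -> bool {
    match class {
        ErrorClass::Temporary => true,
        ErrorClass::InvalidInput
        | ErrorClass::NotFound
        | ErrorClass::Forbidden
        | ErrorClass::Internal => false,
    }
}

/// Stable, low-cardinality metric label per class.
pub fn metric_label(class: ErrorClass) -> &'static str {
    match class {
        ErrorClass::InvalidInput => "invalid_input",
        ErrorClass::NotFound => "not_found",
        ErrorClass::Forbidden => "forbidden",
        ErrorClass::Temporary => "temporary",
        ErrorClass::Internal => "internal",
    }
}
```

Each function matches exhaustively, so adding a class fails to compile until every boundary policy has an answer for it.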
If/when we need more nuance, we can split `ErrorClass` into three axes:

```rust
pub enum Blame {
    Caller,      // bad input, misuse of API
    Domain,      // valid input, but domain rules block it
    Environment, // network, disk, other services
    Bug,         // our invariant is false
}

pub enum Transience {
    Permanent, // retry won’t ever help
    Retryable, // retry may help
    Unknown,
}

pub enum Effect {
    None,    // definitely no side effects
    Some,    // definitely did something
    Unknown, // maybe / don’t know
}
```

We only add these where they drive real behavior (HTTP layer, job runner, etc.), and we let them live on canonical error enums:
```rust
impl StorageError {
    pub fn blame(&self) -> Blame { /* match self */ }
    pub fn transience(&self) -> Transience { /* match self */ }
    pub fn effect(&self) -> Effect { /* match self */ }
}
```

Rules of thumb:
- Don’t feel obligated to attach full geometry to every leaf type.
- Start with a single `ErrorClass` if that’s enough.
- Avoid `_ =>` in matches so adding a new variant forces you to decide how to classify it.
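A sketch of the three axes filled in for a variant-only stand-in of `StorageError` (leaf payloads omitted), plus a hypothetical `auto_retry` policy for a job runner. The classifications are assumptions; in particular, it assumes an unreachable backend means nothing was written (`Effect::None`).

```rust
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub enum Blame {
    Caller,
    Domain,
    Environment,
    Bug,
}

#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub enum Transience {
    Permanent,
    Retryable,
    Unknown,
}

#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub enum Effect {
    None,
    Some,
    Unknown,
}

/// Variant-only stand-in for `StorageError`.
#[derive(Debug)]
pub enum StorageError {
    InvalidKey,
    BackendUnavailable,
    UnknownWriteStatus,
}

impl StorageError {
    pub fn blame(&self) -> Blame {
        match self {
            StorageError::InvalidKey => Blame::Caller,
            StorageError::BackendUnavailable | StorageError::UnknownWriteStatus => Blame::Environment,
        }
    }

    pub fn transience(&self) -> Transience {
        match self {
            StorageError::InvalidKey => Transience::Permanent,
            StorageError::BackendUnavailable => Transience::Retryable,
            StorageError::UnknownWriteStatus => Transience::Unknown,
        }
    }

    pub fn effect(&self) -> Effect {
        match self {
            StorageError::InvalidKey | StorageError::BackendUnavailable => Effect::None,
            StorageError::UnknownWriteStatus => Effect::Unknown,
        }
    }
}

/// Hypothetical job-runner policy: retry automatically only when a retry may
/// help and repeating the attempt is safe (no side effects, or idempotent op).
pub fn auto_retry(err: &StorageError, idempotent: bool) -> bool {
    let may_help = err.transience() == Transience::Retryable;
    let safe = idempotent || err.effect() == Effect::None;
    may_help && safe
}
```

Under these assumptions `UnknownWriteStatus` is never retried automatically, even for idempotent writes, because its transience is `Unknown`; a real runner might reconcile state first instead.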
Quick things to watch for in reviews:
Global “god” error enums:

```rust
pub enum Error {
    Io(std::io::Error),
    Http(reqwest::Error),
    Db(sqlx::Error),
    // ...
}
```

Refactor:
- Split by capability: `StorageError`, `UserRepoError`, `MailerError`, `PaymentError`.
- Map those into a boundary type (`ApiError`, `JobError`) only at the edge.
Enums named after libraries instead of domain behaviors:

```rust
pub enum SendError {
    Reqwest(reqwest::Error),
    Smtp(smtp::Error),
}
```

Refactor:
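```rust
#[derive(Debug, thiserror::Error)]
pub enum SendError {
    #[error("could not reach mail transport at {endpoint}")]
    Transport {
        endpoint: Url,
        #[source]
        source: reqwest::Error,
    },
    #[error("mail server rejected the message: {detail}")]
    Protocol {
        detail: String,
        #[source]
        source: smtp::Error,
    },
}
```

Catch-all `Other(String)` variants. Use them sparingly and treat them like `unwrap`: allowed, but obvious debt.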
- For each place that constructs `Other`, ask: should this be a real variant?
- If everything drifts into `Other`, you’re not modelling your domain.
Logging from deep internals instead of at the boundary. Refactor toward:
- inner layers: no logging, just structured errors,
- outer layer: one log entry with full context + error chain.
Panics or `unwrap` reachable via user input or the environment. Refactor:
- “File missing”, “invalid JSON”, “host unreachable”, etc. → proper error types.
- Panic only on internal invariants and impossible states.
Ask:
- What capability is this part of? Is there a canonical error enum already (e.g. `StorageError`)? Should there be?
- Who is the first real caller of this error? HTTP handler, job, CLI, other component?
- What distinct behaviors do they need? Retry? Ask the user to change input? Show “not found”? Crash?
- Does each distinct behavior have a variant? If not, split or merge variants.
- Can I express some preconditions in types instead of errors?
- Am I leaking dependency details in the public surface? If yes, move those into `#[source]`.
- Do I need internal error sets here, or is the canonical enum enough? Only reach for `OneOf` if it simplifies real composition.
Skim for:
- Global “god” error enums.
- Enums named after libraries, not domain behaviors.
- `Other(String)` doing too much work.
- Logging from deep internals instead of boundaries.
- Panics/`unwrap` reachable via user input or environment.
- `Result<T, anyhow::Error>` in capability traits (fine at the very outer edges, not fine inside core).
Push toward:
- leaf error types where useful,
- one canonical error enum per real capability,
- internal error sets (`OneOf`) only when they simplify composition,
- classification only where it feeds real policy or observability.
- Types state what’s allowed.
- Traits carve capabilities at real joints.
- Errors admit where that story stops matching reality.
If we get them right:
- Behavior at boundaries (HTTP codes, retries, alerts) falls out of the types instead of being folklore.
- We stop saying “I/O error” and start saying “auth service at `auth-01` is down”.
- “We don’t know if we charged the card” becomes a named variant instead of a cursed log line.
We don’t need a grand unified error framework.
We need:
- one honest capability error enum per capability,
- leaf error types where they make the story clearer,
- `OneOf`-style sets internally when they actually help compose things,
- and, where it pays off, a small derived classification to drive policy and observability.
Design errors with the same care you put into types and traits.
They’re not leftovers from the happy path. They’re the other half of the map.