- leverage the type system
- move constraints upstream
- parse don't validate
- make invalid states unrepresentable
- type-driven dev
- hexagonal architecture of sorts
- DTOs / DAOs
This is a collection of ideas and concepts gathered over the last 8 years of practicing type-driven development in scala, haskell and rust. Of course, those languages have different feature sets and thus different strengths, but they share a core (sum types, no nulls, traits/typeclasses) that gives rise to patterns that work across language boundaries.
Most of this is common knowledge in haskell, which has embraced type-safety for whole-program semantics and has been around for quite a while now. This common knowledge / wisdom can be loosely gathered under the term "Type-driven development" or "Type-directed development". In addition to common patterns and concepts, I'll try to expand a bit more on concrete examples and use-cases.
Depending on your programming experience, you might have different mental models for type systems. In most cases, people who started with imperative programming see types as part of the compilation process. Types tell the compiler how things are stored in memory. They tell what things are.
This is an accurate definition, but it's not the whole story. Seeing things this way deprives you of the full power of types.
A type system is a logical system. It lets you define properties: a set of rules that are checked statically during compilation. Statically means here that they will hold for every execution of the program, no matter what.
For the mathematically inclined, type properties are universally quantified (they hold for every value of a given type), in contrast with properties covered by tests, which are existentially quantified (they only hold for the values that have been tested). If you are interested in the relation between logic and types, I suggest reading up on the [Curry-Howard isomorphism][curry-howard-isomorphism].
This is extremely useful because it allows us to eschew defensive programming: certain rules are guaranteed to hold for the duration of the program, no matter what. It does not let us remove error handling entirely, because the program will still act on dynamic data, but it lets us strategically place error-handling gates where we want them.
We can think of it as security checks in airports (without the racist bias, the fondling and the privacy invasion): once we have crossed security, we are free to go wherever we want in a convenient way.
Working with a type system is a bit like designing airport security, with a way bigger design space.
The gist of type-directed development is to find a way to tell the type-checker the properties we care about, so that it can help us ensure they hold everywhere. A good type-driven session usually starts with some thinking to come up with a way to explain things to the compiler, followed by a more relaxed phase where the compiler has enough information to guide us. This second phase often involves some creative work, but it happens within an already established context, and as such is way easier than the initial design.
Type-driven programming practitioners often say programming sessions are a conversation with the compiler: you start by telling the compiler about your problem space by defining types (function types or data types), and then it's a back-and-forth between programmer and compiler until a correct solution emerges. The compiler is your friend, not your enemy. Try to leave your old "the compiler is a hurdle" mindset behind you.
Another misconception about types is thinking that using them is about describing reality. Types are there to let you enforce properties on your codebase, not to be a thorough description of everything. Types should always be used as a tool to make systems safer and more convenient to work with. If you don't need a property, don't encode it in types just for the sake of it. "No, dynamic type systems are not inherently more open" covers this in more detail.
A good rule is to push constraints upwards when possible: instead of having a function return possible errors, try to restrict its arguments so that it can never fail. This is not always possible, because it is not always possible to faithfully encode preconditions in a type, but it is still something we should strive to do, because it minimises the amount of work required for error handling.
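A minimal sketch of pushing a constraint upwards, using the standard library's `NonZeroU32` (the function name is a hypothetical example): instead of returning an error when the divisor is zero, the function demands that the caller prove non-zeroness once, at the boundary.

```rust
use std::num::NonZeroU32;

// Hypothetical helper: because `chunks` is a NonZeroU32, division by
// zero is unrepresentable and the function can never fail.
fn mean_chunk_size(total: u32, chunks: NonZeroU32) -> u32 {
    total / chunks // cannot panic: zero is ruled out by the type
}

fn main() {
    // The fallible check happens exactly once, upstream.
    let chunks = NonZeroU32::new(4).expect("chunk count must be non-zero");
    println!("{}", mean_chunk_size(100, chunks)); // prints 25
}
```

The error-handling burden has moved from every call site to the single place where the value enters the system.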
A common phrasing in haskell is make invalid states unrepresentable: parse things once, and then produce a well-formed value that does not need further error handling. This is extremely well encapsulated in Parse, don't validate.
In rust, enums are a great way to pull that off: they let you precisely encode properties directly in the data type, removing the need for error handling.
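A small hypothetical example of the technique: modelling a connection with a `connected: bool` plus an `Option<SessionId>` would allow the invalid state "connected, but no session id". The enum makes that state unrepresentable, so no code ever needs to handle it.

```rust
// Hypothetical type: each variant carries exactly the data that is
// valid for that state, nothing more.
enum Connection {
    Disconnected,
    Connected { session_id: u64 },
}

fn describe(conn: &Connection) -> String {
    // Exhaustive match: the compiler forces us to cover every state,
    // and there is no "connected without a session" case to forget.
    match conn {
        Connection::Disconnected => "offline".to_string(),
        Connection::Connected { session_id } => format!("online, session {session_id}"),
    }
}

fn main() {
    println!("{}", describe(&Connection::Connected { session_id: 42 }));
}
```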
Another great tool for properties that can't be structurally encoded in the rust type system is single-field structs, sometimes called newtypes: they have no overhead in the generated code, since the in-memory representation is the same, but they introduce a new, distinct type as far as the type checker is concerned. For more info, read Embrace the newtype pattern.
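A sketch with two hypothetical newtypes: both wrap a `u64` and compile down to a bare `u64`, but the type checker treats them as distinct, so mixing them up becomes a compile-time error rather than a silent bug.

```rust
// Hypothetical newtypes: zero runtime overhead, full compile-time
// distinction between two kinds of identifiers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct UserId(u64);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct OrderId(u64);

fn order_url(user: UserId, order: OrderId) -> String {
    format!("/users/{}/orders/{}", user.0, order.0)
}

fn main() {
    let url = order_url(UserId(1), OrderId(99));
    // order_url(OrderId(99), UserId(1)) would not compile:
    // the arguments are swapped, and the type checker catches it.
    println!("{url}");
}
```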
What it means for day-to-day code:
Use static types to ensure static properties. Making code dynamic (losing static properties / being less type-safe) for reusability is usually a bad tradeoff. Strict DRYness is not a goal in itself, and boilerplate can be managed by the compiler (or at least it can help manage it). There are a number of ways the compiler can help deduplicate code without losing type-safety (generics, traits, macros).
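The simplest of those deduplication tools can be sketched with a plain generic function: one definition reused across types, with all the static guarantees intact (no downcasts, no runtime type checks).

```rust
// Hypothetical sketch: one generic definition replaces several
// near-identical monomorphic copies, without going dynamic.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    items.iter().copied().fold(None, |best, x| match best {
        Some(b) if b >= x => Some(b),
        _ => Some(x),
    })
}

fn main() {
    // The same definition works for any ordered Copy type.
    println!("{:?}", largest(&[3, 7, 2]));        // Some(7)
    println!("{:?}", largest(&["ant", "zebra"])); // Some("zebra")
}
```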
For the specific case of data access, not only should static data access patterns be encoded in types, but dynamic data access patterns should be discouraged:
- data access patterns (how many DB requests are emitted, how intensive these requests are) are the most important contributor to performance characteristics and even architecture viability. Predictable data access patterns are paramount to a well-behaving system. As pythonistas say, "explicit is better than implicit"; this is especially true when talking to the database (and more generally to external systems). Latency numbers there are orders of magnitude higher and trump any internal performance metrics.
- working with inconsistently hydrated data objects requires a lot of error handling. Gracefully handling these situations is usually too costly, so most of the time only coarse-grained error handling is done, typically crashing the whole request when an unhydrated field is hit.
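The hydration point can be sketched with hypothetical types: instead of one struct with an optionally-loaded field (`orders: Option<Vec<Order>>`), the hydration level becomes part of the type, so code that needs orders can never receive a user whose orders were not loaded.

```rust
struct Order { total_cents: u64 }
struct User { name: String }

// Hypothetical type: a user whose orders are guaranteed to be loaded.
// Every consumer of this type is freed from handling the "orders not
// fetched" case.
struct UserWithOrders { user: User, orders: Vec<Order> }

fn total_spent(u: &UserWithOrders) -> u64 {
    // No error handling: the type guarantees hydration happened.
    u.orders.iter().map(|o| o.total_cents).sum()
}

fn main() {
    let u = UserWithOrders {
        user: User { name: "Ada".to_string() },
        orders: vec![Order { total_cents: 500 }, Order { total_cents: 250 }],
    };
    println!("{} spent {} cents", u.user.name, total_spent(&u));
}
```

The single place that builds a `UserWithOrders` is also the single place that issues the extra DB request, which keeps the data access pattern explicit.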
When dealing with untyped boundaries, you're at the edge of your system, where the type system can no longer help you. Something that works well here is to find a way to describe the expected properties of this untyped boundary to the type checker. A common way to do that is through the use of DAOs or DTOs. DAOs encode the structure of database tuples to the type-checker. DTOs encode the structure of JSON payloads to the type-checker. By introducing types that mirror the boundary constraints, you are bringing the type-checker as close as possible to the untyped boundary, thus improving its reach.

Concretely, this means that the hard part is to design data types that match this boundary, and to make sure they stay in sync. This is why simple DxOs that mirror things exactly are important. They should be easy to audit and understand (possibly using automated derivation mechanisms). Once this is done, converting from your model to the DxOs falls under the all-seeing eye of the type-checker. The alternative is to write serialization code manually, with way more limited assistance from the compiler. Manual serialization code is not too hard to manage in scala or haskell, but in rust with serde it's extremely boilerplate-y.
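A minimal sketch with hypothetical types (the serde derives that would do the actual serialization are elided so the snippet stays dependency-free): the DTO mirrors the wire format field-for-field, and the model-to-DTO conversion is ordinary code the type checker fully sees.

```rust
// Domain model: shaped for the program's needs.
struct User { name: String, age_years: u32 }

// Hypothetical DTO: mirrors the JSON payload exactly
// ({"name": ..., "age": ...}) and nothing else. In real code it would
// carry e.g. #[derive(Serialize)] and live in its own module.
struct UserDto { name: String, age: u32 }

impl From<&User> for UserDto {
    fn from(u: &User) -> Self {
        // This mapping is fully type-checked: add a field to UserDto
        // and forgetting it here becomes a compile error.
        UserDto { name: u.name.clone(), age: u.age_years }
    }
}

fn main() {
    let user = User { name: "Ada".to_string(), age_years: 36 };
    let dto = UserDto::from(&user);
    println!("{} {}", dto.name, dto.age);
}
```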
In those modules, it is of paramount importance to ensure there are no dependencies on other parts of the codebase: these dao / dto modules are used as a proxy to declare the shape of the boundary objects. The correspondence between those types and the actual boundary cannot be checked by the compiler and has to be maintained manually, so it has to be as obvious as possible.