Created
September 24, 2025 19:15
Why ensure schema as part of system startup?
I realize this is going to seem heretical, but here goes: there is value in ensuring schema as part of instance startup. At Sun Tribe, we've been doing it for years.
The biggest advantage is the ease of binding the schema to the code. When combined with a testing strategy that stands up an in-memory database from scratch (many times over, in fact, while running the test suite), it's easy to guarantee that the schema and code under test are exactly what will eventually be ensured and run in production. It's also easy to travel back in time locally (schema and code) by starting the system on a previous commit, which sometimes makes forensics a little easier.
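
To make the idea concrete, here is a minimal sketch of "ensure schema as part of startup" against a throwaway in-memory database. It uses Python's `sqlite3` purely as a stand-in (the systems discussed here run on Datomic), and the migration names, the `applied_migrations` table, and the `ensure_schema` / `start_system` functions are all hypothetical.

```python
import sqlite3

# Hypothetical migrations, in order. Each one only adds to the schema (accretive).
MIGRATIONS = [
    ("001-create-users",
     "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"),
    ("002-add-user-name",
     "ALTER TABLE users ADD COLUMN name TEXT"),
]


def ensure_schema(conn, migrations=MIGRATIONS):
    """Apply every migration not yet recorded; return the names applied now."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS applied_migrations (name TEXT PRIMARY KEY)"
    )
    done = {row[0] for row in conn.execute("SELECT name FROM applied_migrations")}
    newly_applied = []
    for name, ddl in migrations:
        if name in done:
            continue  # already ensured on an earlier startup
        conn.execute(ddl)
        conn.execute("INSERT INTO applied_migrations (name) VALUES (?)", (name,))
        newly_applied.append(name)
    conn.commit()
    return newly_applied


def start_system(db_path=":memory:"):
    """Startup path: the schema is ensured before the connection is handed out."""
    conn = sqlite3.connect(db_path)
    ensure_schema(conn)
    return conn
```

Because `start_system` defaults to `:memory:`, a test suite can call it as often as it likes and always gets a database built from scratch by the same code path that would run at startup elsewhere. Calling `ensure_schema` again on the same connection applies nothing.
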
I realize there are limitations with this approach. For example, long-running data corrections need to be carefully considered lest they block system startup for "too long". It's also important to have lifecycle hooks that can declare a startup failure and roll back a deploy (cough, cough... Datomic Ions). Otherwise a schema mismatch (really only possible if someone runs a migration outside the CD pipeline or modifies a historical migration somehow) causes a deploy failure and system downtime. I'm happy trading the slight increase in risk of system downtime (*only* because I don't have good lifecycle hooks!!) for the ease of guaranteeing code/schema compatibility* regardless of environment (local, CI/CD, production).
In either approach (ensure schema then startup or ensure schema as part of startup) it's non-negotiable that the system must never _operate_ with a schema/code incompatibility*. One approach uses dev ops to ensure that "ensure schema" succeeds before instance startup and the other uses Clojure code inside the system itself to do the same thing. With good lifecycle hooks, the space between these two approaches gets even smaller.
*In this context, incompatibility means schema is older than the code. In either approach, accretive schema is critical to allowing rolling deploys. Otherwise, scheduled downtime is the only solution.
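
The "schema is older than the code" definition can be sketched as a simple check. This is an illustration in Python rather than the Clojure the real system uses; `StartupFailure`, `missing_migrations`, and `check_compatible` are hypothetical names, and how a failure actually triggers a rollback depends on whatever lifecycle hooks the platform offers.

```python
class StartupFailure(Exception):
    """Raised at startup so a deploy hook (hypothetical) can abort the rollout."""


def missing_migrations(required, applied):
    """Migrations the running code needs that the database does not have yet."""
    applied_set = set(applied)
    return [name for name in required if name not in applied_set]


def check_compatible(required, applied):
    """Refuse to operate when the schema is older than the code.

    A schema that is *ahead* of the code is fine as long as migrations are
    accretive: older instances simply do not use the newer attributes.
    """
    missing = missing_migrations(required, applied)
    if missing:
        raise StartupFailure(f"schema is older than the code; missing: {missing}")
```

During a rolling deploy, an old instance that requires only `["001"]` passes against a database that already has `["001", "002"]`, while a new instance that requires `["001", "002"]` raises `StartupFailure` against a database that has only `["001"]`.
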
It's also vital in both approaches that migrations are verifiably immutable once deployed to production. Using a tool like caribou gives you that guarantee (by comparing the cryptographic hash of the tree of applied migrations and their tx-data to the same-named migrations "on disk"). Otherwise, what are you really ensuring?
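
For readers who want a feel for the hash comparison, here is a rough sketch in Python. It is not caribou's algorithm or API: it flattens the "tree" into an ordered list of `(name, tx_data)` string pairs, and `migrations_digest` and `verify_immutable` are hypothetical names chosen for illustration.

```python
import hashlib


def migrations_digest(migrations):
    """SHA-256 over an ordered list of (name, tx_data) string pairs."""
    h = hashlib.sha256()
    for name, tx_data in migrations:
        for part in (name, tx_data):
            encoded = part.encode("utf-8")
            # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
            h.update(len(encoded).to_bytes(8, "big"))
            h.update(encoded)
    return h.hexdigest()


def verify_immutable(applied, on_disk):
    """True when every applied migration matches its on-disk counterpart.

    Only the leading part of on_disk (as long as applied) is compared, so
    migrations that exist on disk but are not applied yet are not flagged.
    """
    if len(applied) > len(on_disk):
        return False
    return migrations_digest(applied) == migrations_digest(on_disk[:len(applied)])
```

With this shape, editing the tx-data of an already-applied migration changes the digest and the check fails, while appending a brand-new migration on disk does not.
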
Sorry if this went on too long. I've been pilloried for publicly espousing this approach before and I'm probably being too pre-defensive.