@corygabrielsen
Created September 9, 2021 22:37

Ethereum ETL

Goal

We aim to design a system for storing and querying Ethereum data via ETL techniques.

Tenets

The system should be:

  • containerized
  • SQL-based
  • idempotent
  • portable, so data can move between hosts

Tools / Tech

  • Docker
  • docker-compose
  • PostgreSQL

Docker / Entrypoint

Docker can rebuild a raw db.

PostgreSQL data can be stored in a directory that is volume-mounted into the container. The directory and the code repo can then move between hosts together.

We will then need a second, sister container to run the scripts that load the db.

Applications will live in other containers, possibly on other hosts.

Data Operations

Create

It is easiest to build the database as a linear sequence of steps, so we can rewind or rebuild as needed.

We will need at least one main metadata table that holds only meta information about the state of loading data into the db.
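
A minimal sketch of the idea, not a finished design: an ordered list of build steps is applied one at a time, and each completed step is recorded in a metadata table so that a second run skips it. The table name `etl_steps`, the step names, and the `blocks` / `transactions` schemas are hypothetical. The standard-library `sqlite3` module stands in for PostgreSQL only so the sketch runs without a server.

```python
import sqlite3

# Hypothetical build steps (names and schemas are illustrative only).
# Each step is plain SQL and is applied in list order.
STEPS = [
    ("001_create_blocks",
     "CREATE TABLE blocks (number INTEGER PRIMARY KEY, hash TEXT NOT NULL)"),
    ("002_create_transactions",
     "CREATE TABLE transactions (hash TEXT PRIMARY KEY, "
     "block_number INTEGER NOT NULL REFERENCES blocks(number))"),
]


def build(conn, steps=STEPS):
    """Apply every step not yet recorded in etl_steps; return the names applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS etl_steps ("
        "position INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
    )
    done = {row[0] for row in conn.execute("SELECT name FROM etl_steps")}
    applied = []
    for position, (name, sql) in enumerate(steps, start=1):
        if name in done:
            continue  # already applied on an earlier run
        with conn:
            conn.execute(sql)
            # Record the step only after its SQL has succeeded.
            conn.execute(
                "INSERT INTO etl_steps (position, name) VALUES (?, ?)",
                (position, name),
            )
        applied.append(name)
    return applied
```

Calling `build` twice on the same connection applies both steps the first time and nothing the second time, which is the idempotent behavior the tenets ask for.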

Load

Because of its size, we can only deal with a portion of the Ethereum data at a time.
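
One way to work on a portion at a time is to split the chain into fixed-size block ranges and load them one by one. The sketch below assumes data is partitioned by block number; the function name `block_ranges` and the chunk size are hypothetical.

```python
from typing import Iterator, Tuple


def block_ranges(first: int, last: int, size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) block ranges that cover first..last."""
    if size <= 0:
        raise ValueError("size must be positive")
    start = first
    while start <= last:
        # The final range may be shorter than `size`.
        end = min(start + size - 1, last)
        yield start, end
        start = end + 1
```

A loader could iterate over `block_ranges(0, head, 10_000)` and record each finished range in the metadata table, so an interrupted load can resume from the last completed range.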

Online

Another application will be loading data live into the db as it tracks the Ethereum node.

This operation needs to be able to gracefully work with the metadata table to wait for its turn, start at the right place, catch up, and stay caught up.
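
The decision the online loader makes on each poll could look roughly like the sketch below. It assumes the metadata table exposes two values: a flag saying whether the batch load has finished and the highest block already loaded. `LoadState`, `next_action`, and `max_batch` are hypothetical names, and fetching the node's head block is left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LoadState:
    backfill_done: bool               # hypothetical flag set by the batch loader
    last_loaded_block: Optional[int]  # highest block already in the db, if any


def next_action(
    state: LoadState, head: int, max_batch: int = 100
) -> Tuple[str, Optional[Tuple[int, int]]]:
    """Decide what the online loader should do for the current chain head."""
    if not state.backfill_done:
        return ("wait", None)  # not our turn yet
    # Start right after the last loaded block, or at block 0 if the db is empty.
    start = 0 if state.last_loaded_block is None else state.last_loaded_block + 1
    if start > head:
        return ("idle", None)  # caught up; poll again later
    # Catch up in bounded batches so no single load is too large.
    end = min(start + max_batch - 1, head)
    return ("load", (start, end))
```

Run in a loop, this covers the four behaviors listed above: `wait` while the batch load is running, `load` from the right starting block until caught up, and `idle` between new blocks.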
