@sanmiguel
Last active October 21, 2016 09:39

The path to one-ness with Universe

In order to gain access to Universe, we must fulfil the requirements laid out in the currently open PR: d2iq-archive/universe#703

To achieve this, we must make some breaking changes to the Riak-Mesos Framework. Those changes, and the reasoning behind them, are laid out below.

For a long time we have discussed moving to a split-package world, in which we maintain separate DC/OS packaging for KV and TS. That has seemed like the appropriate way to continue for some time but, upon reflection, I feel this is not how we should proceed.

In the light of the revelation that we may not now, nor ever, delete a package from Universe, the path forward is much clearer: we must maintain a single package - the one which already exists.

One thing that drove us to want separate packages for KV and TS was the difficulty (impossibility) of running simultaneous KV and TS clusters under a single framework instance. The changes below outline how we rid ourselves of this restriction.

Official relocatable packages: a caveat

It's a noteworthy caveat that our use of relocatable packages is restricted to:

  • TS 1.4.0+
  • KV 2.2.0+

They have not been produced for any versions prior to those (there are still no released packages for OSS KV - only EE RC packages).

Configuration

Currently we only configure a single Riak version in an RMF instance. We need to upgrade this to include a map of configured Riak versions: by default we will include the latest supported version of TS and KV. The configuration for these versions will be an optional field in the config.json file, at .riak.node.uris. This same value can then be copied into the resources.json file in the DCOS Universe package definition.

This requires a slightly different approach between DCOS and pure-Mesos configuration:

In DCOS resources.json, we will specify .assets.uris with the same value as one would use in config.json at .riak.node.uris.

With this in place, in DCOS marathon.json.mustache we will need to (ab)use templating to build the list of uris in the appropriate format to tell marathon what to download. This might prove difficult.

We will also need to build the appropriate env var map of (version -> uri) to give to the Scheduler.

Example:

DCOS resources.json:

{
    "assets": {
        "uris": {
            "scheduler": "https://github.com/basho-labs/riak-mesos-scheduler/releases/download/1.8.1/riak_mesos_scheduler-1.8.1-mesos-1.0.0-ubuntu-14.04.tar.gz",
            "director": "https://github.com/basho-labs/riak-mesos-director/releases/download/1.0.1/riak_mesos_director-1.0.1-ubuntu-14.04.tar.gz",
            "executor": "https://github.com/basho-labs/riak-mesos-executor/releases/download/1.7.0/riak_mesos_executor-1.7.0-mesos-1.0.0-ubuntu-14.04.tar.gz",
            "explorer": "https://github.com/basho-labs/riak_explorer/releases/download/1.2.1/riak_explorer-1.2.1.patch-ubuntu-14.04.tar.gz",
            "patches": "https://github.com/basho-labs/riak-mesos-executor/releases/download/1.7.0/riak_erlpmd_patches-1.7.0-mesos-1.0.0-ubuntu-14.04.tar.gz",
            "riak-kv-2.1.4": "https://github.com/basho-labs/riak-mesos/releases/download/1.1.0/riak-2.1.4-ubuntu-14.04.tar.gz",
            "riak-ts-1.3.1": "https://github.com/basho-labs/riak-mesos/releases/download/1.1.0/riak_ts-1.3.1-ubuntu-14.04.tar.gz"
        }
    },
    ...
}

DCOS marathon.json.mustache:

{
    ...
    "fetch": [
        {"uri": "{{resource.assets.uris.scheduler}}"},
        {"uri": "{{resource.assets.uris.director}}", "extract": false},
        {"uri": "{{resource.assets.uris.executor}}", "extract": false},
        {"uri": "{{resource.assets.uris.explorer}}", "extract": false},
        {"uri": "{{resource.assets.uris.patches}}", "extract": false},
        {"uri": "{{resource.assets.uris.riak-ts-1.3.1}}", "extract": false},
        {"uri": "{{resource.assets.uris.riak-kv-2.1.4}}", "extract": false}
    ],
    ...
    "env": {
        ...
        "RIAK_MESOS_ASSETS": "{{resource.assets.uris}}",
        ...
    },
    ...
}

TODO: Perhaps we should consider changing config.json to expect a .resources.assets section too? That might reduce some complexity...

DCOS config.json:

TODO: How to define a jsonschema that allows ANY key in the "uris" object?
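
One possible answer to the TODO above, offered as a sketch rather than a tested Universe schema: in JSON Schema, `additionalProperties` may itself be a schema, in which case the object accepts any key so long as each value matches it. The snippet below writes such a fragment as a Python dict and hand-checks candidate `uris` objects against the same rule. The `is_valid_uris` helper is hypothetical and mirrors only this one fragment; how the DC/OS tooling treats `additionalProperties` has not been verified here.

```python
import json

# Schema fragment for the "uris" object: any key is allowed,
# but every value must be a string (the download URI).
URIS_SCHEMA = {
    "type": "object",
    "additionalProperties": {"type": "string"},
}


def is_valid_uris(candidate):
    """Hand-rolled check mirroring URIS_SCHEMA (hypothetical helper)."""
    if not isinstance(candidate, dict):
        return False
    return all(isinstance(value, str) for value in candidate.values())


if __name__ == "__main__":
    # Print the fragment as it might appear inside config.json.
    print(json.dumps(URIS_SCHEMA, indent=4))
```

If the keys ever need constraining (for example to names starting with `riak-`), `patternProperties` would be the analogous keyword.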

Tools config.json:

{
    "riak": {
        ...
        "node": {
            "uris": {
                "riak-kv-2.1.4": "https://github.com/basho-labs/riak-mesos/releases/download/1.1.0/riak-2.1.4-ubuntu-14.04.tar.gz",
                "riak-ts-1.3.1": "https://github.com/basho-labs/riak-mesos/releases/download/1.1.0/riak_ts-1.3.1-ubuntu-14.04.tar.gz"
            },
            ...
        }
    }
}

The CLI tools will need to be changed to pass through these sensibly. TODO Since the Scheduler must know how to deal with the blunt-force methods of DCOS, perhaps we should just do the same thing for consistency?
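
As a rough illustration of what "passing these through" could mean on the tools side, the sketch below turns the `.riak.node.uris` map into Marathon `fetch` entries plus a single env var. The function name `build_marathon_fragment` is hypothetical, and encoding the map as a JSON string in `RIAK_MESOS_ASSETS` is an assumption: the format handed to the Scheduler is still an open TODO in this document.

```python
import json


def build_marathon_fragment(riak_node_uris):
    """Build Marathon "fetch" entries and the env var for the Riak archives.

    riak_node_uris is the (version -> uri) map from .riak.node.uris.
    """
    # Mirror the mustache example: Riak archives are fetched but not extracted.
    fetch = [
        {"uri": uri, "extract": False}
        for _version, uri in sorted(riak_node_uris.items())
    ]
    # Assumption: the map is handed to the Scheduler as a JSON string.
    env = {"RIAK_MESOS_ASSETS": json.dumps(riak_node_uris, sort_keys=True)}
    return {"fetch": fetch, "env": env}
```

Generating the fragment in code sidesteps the templating problem for pure-Mesos installs, but it does not help with DCOS, where only the mustache template is available.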

Scheduler

Caveat 1: never use outputFile in marathon.json: if we do, we'll need to change the config in a way that is probably incompatible with DCOS resources.json

rms_config:artifacts() should return the full URI of every configured Riak version; the artifact filename is then taken from the end of each URI.

Environment variables:

Currently we set a bunch of env vars e.g. RIAK_MESOS_EXECUTOR_PKG which contain just the name of the package. The scheduler then uses this filename to serve the artifacts to executors via HTTP.

We need to change this to give us the map of (version -> uri) from the configuration.
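
The Scheduler itself is Erlang, so the following Python is only a language-neutral sketch of the logic: read the (version -> uri) map from the environment and derive each artifact filename from the last path segment of its URI. It assumes the map arrives as a JSON object in `RIAK_MESOS_ASSETS`, which is not yet decided, and `artifacts_from_env` is a hypothetical name.

```python
import json
import posixpath
from urllib.parse import urlparse


def artifacts_from_env(environ):
    """Return a (version -> filename) map derived from RIAK_MESOS_ASSETS."""
    # Assumption: the env var holds a JSON object of (version -> uri).
    uris = json.loads(environ["RIAK_MESOS_ASSETS"])
    # The filename is the last path segment of each URI.
    return {
        version: posixpath.basename(urlparse(uri).path)
        for version, uri in uris.items()
    }
```

Called with `os.environ`, this would replace the per-package `RIAK_MESOS_*_PKG` lookups for the Riak archives.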

Tooling extensions

To support this, we'll need to change how we define clusters and interact with them via the CLI.

We'll need a new command, and changes to a couple of existing ones:

$ riak-mesos list-versions
riak-kv-2.1.4
riak-ts-1.3.1

$ riak-mesos list-versions --json
# Prints the full version/URI configured, as a valid JSON blob
{
    "riak-kv-2.1.4": "https://github.com/basho-labs/riak-mesos/releases/download/1.1.0/riak-2.1.4-ubuntu-14.04.tar.gz",
    "riak-ts-1.3.1": "https://github.com/basho-labs/riak-mesos/releases/download/1.1.0/riak_ts-1.3.1-ubuntu-14.04.tar.gz"
}

$ riak-mesos cluster create my-kv riak-kv-2.1.4
{"success": "true"}

$ riak-mesos cluster destroy my-kv
{"success": "true"}

$ riak-mesos cluster add-node my-kv
{"success": "true"}

$ riak-mesos cluster add-node my-kv --nodes 3
{"success": "true"}
{"success": "true"}
{"success": "true"}
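
A minimal sketch of how the two list-versions output modes shown above might be rendered, given the configured (version -> uri) map. The function name `format_versions` is hypothetical, and sorting the names is an assumption drawn from the example output.

```python
import json


def format_versions(uris, as_json=False):
    """Render the configured versions the way list-versions might."""
    if as_json:
        # Full (version -> uri) map as a valid JSON blob.
        return json.dumps(uris, indent=4, sort_keys=True)
    # Plain mode: one version name per line.
    return "\n".join(sorted(uris))
```

Note that a plain string sort is not version-aware, so a name such as riak-kv-2.10.0 would be listed before riak-kv-2.2.0.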

The dirty secret

There's one thing that we need to address before we can go down this road: patches and explorer artifacts, and the paths they extract to.

We've suffered in trying to support both our format of custom-built riak and the relocatable rel packages now coming into the official buildchain. To summarise:

We inject 2 sets of patches into the Riak dir when creating an Executor:

  • riak_explorer
  • erlpmd patches (to allow using our customised EPMD setup)

In order to correctly inject these into Riak's env, we rely on Mesos to extract them to the correct path, but it does not support (even in v1.0.1) customising this path at runtime - the archive must contain the correct path beforehand.

Solution: we stop Mesos extracting these, and have the Executor do so - this way we can control exactly where they go. We can allow Mesos to extract Riak - that gives us an immediate way to determine the appropriate path to extract the archives to. To achieve this, we'll change our explorer and patches artifacts to contain simply riak/lib paths - then we can extract them directly into the Riak dir inside the Executor. Simples!

[NB: We already have functionality in the Scheduler to derive the path to Riak inside the supplied Riak archive that gets sent to the Executor for this purpose.]
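
To make the extraction step concrete, here is a sketch in Python (the real Executor is Erlang, so this shows only the shape of the logic). It assumes the explorer and patches archives contain members rooted at riak/lib, and that extracting "into the Riak dir" means unpacking in the directory that contains the riak/ tree Mesos already extracted. The name `inject_archive` is hypothetical.

```python
import os
import posixpath
import tarfile


def inject_archive(archive_path, riak_root):
    """Extract an archive of riak/lib/... members over an extracted Riak.

    riak_root is the directory Mesos extracted Riak into (ending in "riak").
    """
    # Members are rooted at "riak/", so extract into the parent directory.
    target = os.path.dirname(os.path.abspath(riak_root))
    with tarfile.open(archive_path) as archive:
        for member in archive.getmembers():
            name = posixpath.normpath(member.name)
            # Refuse anything that would land outside riak/lib.
            if name not in ("riak", "riak/lib") and not name.startswith("riak/lib/"):
                raise ValueError("unexpected member: " + member.name)
        archive.extractall(target)
    return target
```

Validating member names before extracting keeps a malformed archive from writing elsewhere in the sandbox, which matters once Mesos is no longer doing the extraction for us.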

Summary of Changes

Artifacts

  • Stop using our own special snowflake Riak archives
  • Use only official relocatable builds (which have ./riak/... paths)

Tools

  • Deprecate .riak.*.url and move all to a single .riak.uris entry
  • Add multiple versions of riak to .riak.uris (to start, by default, KV 2.2.0 and TS 1.4.0)
  • Pass the .riak.uris values both to Marathon and to the Scheduler (via env var)
  • TODO Find an appropriate format to give this map to the Scheduler in - has to be achievable from the DCOS marathon.json.mustache file
  • Add list-versions command to print the configured Riak versions
  • Alter cluster commands to require cluster name, Riak version

A note about deprecation: we should, for a while, support the old configuration but print a brief-yet-informative blurb with a link for more detail on the matter, whenever it is detected.
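
A sketch of what that detection might look like in the tools. It reads `.riak.*.url` as "a `url` key one level below `riak`", which is an assumption about the old layout; the function names and the link placeholder are hypothetical, since the location of the detailed notes is not given here.

```python
import sys

# Placeholder: the source does not say where the detailed notes will live.
DEPRECATION_DOC_LINK = "<link to the deprecation notes>"


def find_deprecated_urls(config):
    """Return dotted paths of any .riak.*.url entries found in config."""
    riak = config.get("riak", {})
    return sorted(
        "riak.%s.url" % section
        for section, value in riak.items()
        if isinstance(value, dict) and "url" in value
    )


def warn_if_deprecated(config, stream=sys.stderr):
    """Print a short deprecation blurb when old-style entries are present."""
    found = find_deprecated_urls(config)
    if found:
        stream.write(
            "Deprecated config entries: %s. Use .riak.uris instead; see %s\n"
            % (", ".join(found), DEPRECATION_DOC_LINK)
        )
    return found
```

Writing to stderr keeps the blurb out of any JSON the CLI prints on stdout.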

Scheduler

  • Pick up the URIs from the Env as per Tools above
  • No framework-wide Riak version: move to per-cluster
  • Don't expect RIAK_MESOS_*_PKG vars to tell us package names: figure it out from the URIs provided in env

Executor

  • No longer expect explorer or patches archives to be extracted by Mesos by default: add steps for manual extraction prior to starting Riak node
@mdigan

mdigan commented Oct 20, 2016

@sanmiguel what would the output of riak-mesos list-versions and the tools config.json look like as more and more versions of KV and TS are published? Would they include every published version? For example, I assume after KV 2.2 and TS 1.5 are released and RMF is updated appropriately to include all available versions, riak-mesos list-versions would return:

riak-kv-2.1.4
riak-kv-2.2.0
riak-ts-1.3.1
riak-ts-1.4.0
riak-ts-1.5.0

And I assume the tools config.json would include lines for the same.

cc @ph07
