The automated test suite for Martini comprises four components:
- a collection of tests with metadata;
- a collection of input files shared among the tests;
- a Python and Bash library of functions to ease the writing of tests;
- a main program to list, select, and run the tests on various target computers, and to compile a global report from the individual reports of each test.
A user must be able to run all or part of the test suite on a local or a remote computer. They must also be able to override some parameters and to easily add new tests.
A user should also be able to run the tests in several batches. For instance, a user may run a first batch of tests and, if the results are good enough, run another batch of tests using the same inputs. In that case, the report should be updated to include the new results.
The tests are both validation and regression tests. Their outputs are compared with experimental values, or with values calculated from atomistic simulations, to assess their validity; they can also be compared with the outputs of previously run tests to identify changes in behavior.
The tests are Bash or Python scripts that describe how to simulate and analyze a system of interest. Each test is contained in its own directory.
In addition to the scripts, a test directory should contain the input and reference files specific to that test, as well as a metadata file.
The metadata file is used by the main program to select the tests to run based on user criteria, and to format the report. It must therefore contain a name for the test, as well as a description and a series of tags. Tags are especially important as they are used to select the tests. The metadata file also identifies which scripts to call to run and to analyze the test, which makes it possible to run and analyze the tests on different computers.
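As an illustration, the metadata of a test could look like the sketch below. The YAML format and the exact field names are assumptions for the sake of the example, not a fixed specification:

```yaml
# metadata.yaml (hypothetical format and field names)
name: bilayer-self-assembly
description: >
  Self-assembly of a small lipid bilayer, followed by a comparison of
  basic structural properties with reference values.
tags: [lipids, bilayer, validation, regression]
scripts:
  run: run.sh          # script that sets up and runs the simulation
  analyze: analyze.py  # script that extracts and compares the observables
```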
The scripts can be written in Python or in Bash. They should use the shared inputs as much as possible, but make as few assumptions about them as possible: the shared input files are what the user will be able to override to adapt the tests to their use case. The provided library allows some standard manipulations of these input files.
The scripts should also be as independent as possible of the computer they run on: a script should run on a local computer or on a cluster without modification. The library provides abstraction layers to achieve that goal.
Input files that are not highly specific to a given test are grouped in the shared collection of input files. Among these files are the force field definition, the topologies for most molecules, a template MDP file, and so on. These files are the main way a user can adapt the tests to their use case. For instance, when testing a new version of the force field definition, the user will override the corresponding file; the same applies when testing a new topology for a molecule or a new set of MDP parameters.
Which version of a file should be used is determined with a cascading scheme. For a given file, the version to use is, in decreasing order of priority:
- User provided version
- Scenario specific version
- Default version
The scenario-specific versions of a file can be, for instance, versions of the MDP template for the various versions of GROMACS.
A function in the library returns the correct version of a file according to this priority scheme.
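A minimal sketch of such a function, in the spirit of the `get_shared_file` helper listed further below, assuming the user, scenario, and default files live in three known directories (the directory names here are placeholders):

```python
import os
import shutil

def get_shared_file(fname, destination='.',
                    user_dir=None, scenario_dir=None, default_dir='shared'):
    """Copy `fname` to `destination`, honoring the cascading priority:
    user-provided version > scenario-specific version > default version."""
    for directory in (user_dir, scenario_dir, default_dir):
        if directory is None:
            continue
        candidate = os.path.join(directory, fname)
        if os.path.isfile(candidate):
            shutil.copy(candidate, destination)
            return os.path.join(destination, os.path.basename(fname))
    raise IOError('{0} not found in any input directory'.format(fname))
```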
The library makes available a collection of helper functions and abstractions.
The library serves three purposes:
- give easy access to the underlying test architecture (e.g. retrieve files from the shared input collection);
- factor out common tasks (build a box of solvent, calculate solvation free energies, calculate translocation and dimerization PMFs, ...);
- abstract the differences between computing environments (mostly how to run GROMACS).
Here are some functions the library should provide:
- get_shared_file(fname, [destination]): copy a file from the shared files directory, or from the user input directory
- set_mdp_parameter(fname, key, value): modify an MDP file to set a parameter to a given value; the function should check whether the parameter is already present in the file, and should be aware of the GROMACS name normalization rules (case insensitive, dash == underscore, ...)
- solvation_free_energy(solute, solvent)
- partition_free_energy(solute, solvent1, solvent2)
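A possible sketch of `set_mdp_parameter`, normalizing parameter names as described above (case insensitive, dash equivalent to underscore); the handling of comments and special lines is deliberately kept minimal:

```python
def _normalize(name):
    # Case insensitive, dash == underscore.
    return name.strip().lower().replace('-', '_')

def set_mdp_parameter(fname, key, value):
    """Set `key = value` in the MDP file `fname`, replacing the parameter
    if it is already present, appending it otherwise."""
    target = _normalize(key)
    lines = []
    found = False
    with open(fname) as infile:
        for line in infile:
            is_comment = line.lstrip().startswith(';')
            name = line.split('=')[0]
            if '=' in line and not is_comment and _normalize(name) == target:
                lines.append('{0} = {1}\n'.format(key, value))
                found = True
            else:
                lines.append(line)
    if not found:
        lines.append('{0} = {1}\n'.format(key, value))
    with open(fname, 'w') as outfile:
        outfile.writelines(lines)
```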
Here are some environment variables the library should provide to hide the variability in how GROMACS is run among GROMACS versions (gmx mdrun vs mdrun) and among computers (prun vs mpirun vs nothing):
- MDRUN: how to run mdrun
- GMX: how to run single-core GROMACS tools (all but mdrun)
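For example, a pair of wrappers in the library could read these variables and fall back to sensible defaults; the wrapper names below are illustrative, not part of a fixed interface:

```python
import os
import shlex
import subprocess

def run_mdrun(args):
    """Run mdrun with the command given by the MDRUN environment variable,
    e.g. 'gmx mdrun', 'mdrun_mpi', or 'mpirun -np 8 mdrun'."""
    command = shlex.split(os.environ.get('MDRUN', 'gmx mdrun')) + list(args)
    subprocess.check_call(command)

def run_gmx(tool, args):
    """Run a single-core GROMACS tool (grompp, editconf, ...) with the
    command prefix given by the GMX environment variable."""
    command = shlex.split(os.environ.get('GMX', 'gmx')) + [tool] + list(args)
    subprocess.check_call(command)
```

With such wrappers, a test script calls `run_gmx('grompp', [...])` or `run_mdrun([...])` without knowing which GROMACS version or which machine it runs on.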
Obviously, the library should be unit-tested.
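As an example of such a unit test, the MDP helper sketched above could be covered along these lines (using pytest and its `tmpdir` fixture; the module name is hypothetical):

```python
# Assuming: from martini_test_lib import set_mdp_parameter  (hypothetical module)

def test_set_mdp_parameter_normalizes_names(tmpdir):
    """Setting 'Ref-T' must replace an existing 'ref_t' entry
    rather than appending a duplicate."""
    mdp = tmpdir.join('test.mdp')
    mdp.write('integrator = md\nref_t = 300\n')
    set_mdp_parameter(str(mdp), 'Ref-T', 320)
    content = mdp.read()
    assert 'Ref-T = 320' in content
    assert '300' not in content
```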
The main program is what most users will interact with. It allows the user to run a single test or a selection of tests.
The inputs are:
- The path to a directory containing the user-defined input files (default: none)
- The path to a directory in which to store the test results; if the directory already exists, the existing results are updated (default: the current working directory)
- Instructions about which computer to run on (default: local)
- Tags to select the tests
- The branch to consider in the test file tree
- The path to a directory containing previous test outputs to compare with
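For illustration only, an invocation of the main program could look as follows; the program name and option names are placeholders rather than a fixed interface:

```bash
# Run all lipid-related tests on a cluster, overriding some input files,
# and compare the outcome with a previous run (hypothetical interface).
run_martini_tests --input-dir ./my_inputs \
                  --output-dir ./results \
                  --target cluster \
                  --tags lipids,bilayer \
                  --compare-to ./previous_results
```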
The test suite dependencies should be kept under control. The dependencies are:
- GROMACS
- the Martini tools (insane, martinize, martinate, backward, ...)
- Python (v2, and ideally soon v3)
- MDAnalysis
- NumPy/SciPy/Matplotlib
The suite could be distributed:
- as a tarball, including a script that tests for the dependencies;
- as a Docker image with all the dependencies included.
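A minimal sketch of such an image, assuming a Debian/Ubuntu base where GROMACS and the Python stack are available as packages (package names and versions would need to be pinned in practice):

```dockerfile
# Hypothetical Dockerfile providing the test suite dependencies.
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y \
        gromacs \
        python python-pip \
        python-numpy python-scipy python-matplotlib \
    && pip install MDAnalysis
# The Martini tools (insane, martinize, backward, ...) and the test suite
# itself would be added here, e.g. copied in or installed from their
# repositories.
COPY . /opt/martini-tests
```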
This is what I have in mind:
ALLATOM: A Library Leveraging Automated Testing Of Martini :p
This document provides a first (second) sketch of a test suite for the Martini Coarse Grain Force Field and the Martini Tools for setting up and running simulations. The suite is to provide a modular, comprehensive, and flexible approach to managing the testing of changes in any part of the force field, the simulation setup, or the tools. In particular, the suite aims to
To this end, the suite lists a number of tests, which can be divided into four categories:
Each test consists of a protocol, a system, and a reference. The test runs the protocol on the system and compares the outcome to the reference. Protocols can be defined for general tasks, like running a simulation, or for specific tasks, like PMF calculations, partitioning, molecular sorting, and/or dynamics. In principle, a protocol consists of
Each stage of the protocol is monitored and deviations or errors are caught and presented to the user during run time. An error will cause the protocol to terminate.
The systems can be existing structures with associated topologies, or be generated according to set rules. They are classified according to the components they contain:
Each test is further associated with a level of physics that is tested, where a higher level typically means that the lower-level physics are also tested (or testable). The levels of physics are:
Typically, these different levels relate to the simulation time needed to get the output required for the analysis and the comparison to reference data.
A test can have one or more references, which are categorized according to the source of the data:
For each (type of) reference, the analysis needs to be specified: how to extract the corresponding variables from the coarse-grained simulation, which statistical operations to perform on the outcomes and on the reference, and the criteria for evaluation, in particular the rules for classifying the results as equal or different.
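As a minimal illustration of such a classification rule, one could compare the means within their combined statistical uncertainty; the two-sigma threshold below is an arbitrary example, not a prescribed criterion:

```python
def results_are_equal(test_mean, test_sem, ref_mean, ref_sem, n_sigma=2.0):
    """Classify a test result and a reference as 'equal' when their means
    differ by less than `n_sigma` combined standard errors."""
    combined_sem = (test_sem ** 2 + ref_sem ** 2) ** 0.5
    return abs(test_mean - ref_mean) < n_sigma * combined_sem
```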