The automated test suite for Martini comprises four components:
- a collection of tests with metadata;
- a collection of input files shared among the tests;
- a python and bash library of functions to ease the writing of tests;
- a main program that lists, selects, and runs the tests on various target computers, and that compiles a global report from the individual reports of each test.
A user must be able to run all or part of the test suite on a local or a remote computer, to override some parameters, and to easily add new tests.
A user should also be able to run the tests in several stages. For instance, a user should be able to run a first batch of tests and, if the results are good enough, run another batch with some other tests using the same inputs. The report should then be updated accordingly.
The tests are both validation and regression tests. Their outputs are compared with experimental values, or with values calculated from atomistic simulations, to assess their validity. The outputs can also be compared with those of previously run tests to identify changes in behavior.
The tests are bash or python scripts that describe how to simulate and analyze a system of interest. Each test is contained in its own directory.
In addition to the scripts, a test directory should contain the input and reference files specific for the test. The directory should also contain a metadata file.
The metadata file is used by the main program to select the tests to run based on user criteria, and to format the report. The file must therefore contain a name for the test, as well as a description and a series of tags. Tags are especially important as they are used to select the tests. The metadata file also identifies which scripts to call to run and analyze the test, which makes it possible to run and analyze the tests on different computers.
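As an illustration, a metadata file could look like the following. The exact format and field names here are hypothetical, not prescribed by this design:

```yaml
name: popc_bilayer
description: Self-assembly and equilibrium properties of a POPC bilayer
tags: [membrane, lipid, regression]
run: run.sh          # script that runs the simulation
analyze: analyze.py  # script that analyzes the outputs
```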
The scripts can be written in python or in bash. They should use the shared inputs as much as possible, but make as few assumptions as possible about them. Indeed, the shared input files are what the user can override to adapt the tests to their use case. The provided library allows some standard manipulations of these input files.
The scripts should also be as independent as possible from the computer they run on: they should run on a local computer or on a cluster without modification. The library provides abstraction layers to achieve that goal.
Input files that are not highly specific to a single test are grouped in a shared collection of input files. Among these files are the force field definition, the topologies for most molecules, a template MDP file, and so on. These files are the main way a user can adapt the tests to their use case. For instance, when testing a new version of the force field definition, the user will override the corresponding file. The same applies when testing a new topology for a molecule, or a new set of MDP parameters.
Which version of a file is used is determined by a cascading scheme. For a given file, the candidate versions, from highest to lowest priority, are:
- User provided version
- Scenario specific version
- Default version
The scenario specific version of files can be, for instance, versions of the MDP template for various versions of GROMACS.
A function in the library allows to get the correct version of a file based on that priority scheme.
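A minimal sketch of such a resolution function, assuming the three versions live in separate directories (the function and directory names here are illustrative, not a fixed API):

```python
import os

def get_input_file(fname, user_dir=None, scenario_dir=None, default_dir="shared"):
    """Return the path to the highest-priority version of *fname*.

    Priority: user-provided > scenario-specific > default.
    Directory names are assumptions for this sketch.
    """
    for directory in (user_dir, scenario_dir, default_dir):
        if directory is None:
            continue
        candidate = os.path.join(directory, fname)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(fname)
```

A test script would then call this function instead of hard-coding a path, so that a user-provided file transparently shadows the scenario and default versions.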
The library makes available a collection of helper functions and abstractions.
The library serves 3 purposes:
- Give easy access to the underlying test architecture (get files from the shared input files)
- Factor some common tasks (build a box of solvent, calculate solvation free energies, calculate translocation and dimerization PMFs, ...)
- Abstract the difference between the computing environment (mostly how to run gromacs)
Here are some functions the library should provide:
- get_shared_file(fname, [destination]): copy a file from the shared files directory, or from the user input directory
- set_mdp_parameter(fname, key, value): modify an MDP file to set a parameter to a given value; the function should check whether the parameter is already present in the file, and should be aware of gromacs naming normalization rules (case insensitive, dash == underscore, ...)
- solvation_free_energy(solute, solvent)
- partition_free_energy(solute, solvent1, solvent2)
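The normalization rule matters because GROMACS treats "ref-t" and "Ref_t" as the same parameter. A minimal sketch of set_mdp_parameter under that assumption (this is not the library's actual implementation):

```python
def _normalize(name):
    # GROMACS treats MDP parameter names case-insensitively and
    # considers dashes and underscores equivalent.
    return name.lower().replace("-", "_")

def set_mdp_parameter(fname, key, value):
    """Set *key* to *value* in an MDP file, replacing an existing
    entry if one is found under any equivalent spelling, and
    appending the parameter otherwise."""
    target = _normalize(key)
    with open(fname) as infile:
        lines = infile.readlines()
    written = False
    with open(fname, "w") as outfile:
        for line in lines:
            content = line.split(";", 1)[0]  # ignore MDP comments
            if "=" in content and _normalize(content.split("=", 1)[0].strip()) == target:
                outfile.write("{} = {}\n".format(key, value))
                written = True
            else:
                outfile.write(line)
        if not written:
            outfile.write("{} = {}\n".format(key, value))
```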
Here are some environment variables the library should provide to hide the variability of how to run gromacs among gromacs versions (gmx mdrun vs mdrun) and among computers (prun vs mpirun vs nothing):
- MDRUN: how to run mdrun
- GMX: how to run single core gromacs tools (all but mdrun)
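As a sketch, a library helper could build the command to execute from these variables, falling back to a modern "gmx" binary when they are unset (the function name and fallbacks are assumptions of this sketch):

```python
import os
import shlex

def gromacs_command(tool, *args):
    """Build a gromacs command line from the MDRUN/GMX environment
    variables. On a cluster one might export, for instance,
    MDRUN="mpirun -np 16 mdrun_mpi"; the defaults below assume a
    post-5.0 'gmx' wrapper binary."""
    if tool == "mdrun":
        prefix = os.environ.get("MDRUN", "gmx mdrun")
    else:
        prefix = "{} {}".format(os.environ.get("GMX", "gmx"), tool)
    # Return an argument list suitable for subprocess.call().
    return shlex.split(prefix) + list(args)
```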
Obviously, the library should be unit-tested.
The main program is what most users will interact with. It allows running a single test or a selection of them.
The inputs are:
- The path to a directory containing the user defined input files (default none)
- The path to a directory in which to store the results of the tests; if the directory already exists, the tests will be updated (default pwd)
- Instructions about what computer to run on (default local)
- Tags to select tests
- Branch to consider in the test file tree
- Path to a directory containing test output to compare to
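These inputs could map onto a command line along the following lines; the option names are illustrative, not a fixed interface:

```python
import argparse

def build_parser():
    """Sketch of the main program's command line."""
    parser = argparse.ArgumentParser(description="Run the Martini test suite")
    parser.add_argument("--user-inputs", default=None,
                        help="directory of user-provided input files")
    parser.add_argument("--output", default=".",
                        help="directory for results; an existing run is updated")
    parser.add_argument("--computer", default="local",
                        help="which computer to run the tests on")
    parser.add_argument("--tags", nargs="*", default=[],
                        help="only run tests matching these tags")
    parser.add_argument("--branch", default=None,
                        help="branch of the test file tree to use")
    parser.add_argument("--compare-to", default=None,
                        help="directory of previous test outputs to compare against")
    return parser
```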
The dependencies of the test suite should be controlled. The suite depends on:
- Gromacs
- martini tools (insane, martinize, martinate, backward, ...)
- Python (v2, and ideally soon v3)
- MDAnalysis
- Numpy/Scipy/Matplotlib
The test suite should be distributed in two forms:
- as a tarball, including a script that tests for the dependencies;
- as a docker image with all the dependencies included.