@jbarnoud
Last active November 25, 2016 08:04

Automatic Test Of Martini (title in progress)

The automatic test of Martini comprises four components:

  • a collection of tests with metadata;
  • a collection of input files shared among the tests;
  • a python and bash library of functions to ease the writing of tests;
  • a main program that lists, selects, and runs the tests on various target computers, and compiles a global report from the individual reports of each test.

A user must be able to run all or part of the test suite on a local or a remote computer. They must be able to overwrite some parameters, and to easily add new tests.

A user should also be able to run the tests in several passes. For instance, a user should be able to run a first batch of tests and, if the results are good enough, run another batch with some other tests using the same inputs. In that case, the report should be updated rather than rewritten.

The tests are both validation and regression tests. Their outputs are compared with experimental values or values calculated from atomistic simulations to assess their validity. But the test outputs can also be compared to previously run tests to identify changes in behavior.

A collection of tests

The tests are bash or python scripts that describe how to simulate and analyze a system of interest. Each test is contained in its own directory.

In addition to the scripts, a test directory should contain the input and reference files specific to the test. The directory should also contain a metadata file.

The metadata file is used by the main program to select the tests to run based on some user criteria, and to format the report. Therefore, the file must contain a name for the test, as well as a description and a series of tags. Tags are especially important as they will be used to select the tests. The metadata file is also used to identify which scripts to call to run and analyze the test. This makes it possible to run and analyze the tests on different computers.

The scripts can be written in python or in bash. They should use the shared inputs as much as possible, but make as few assumptions as possible about them. Indeed, the shared input files are what the user will be able to overwrite to adapt the tests to their use case. The provided library allows some standard manipulation of these input files.

The scripts should also be as independent as possible from the computer they will run on. Indeed, the scripts should be able to run on a local computer or on a cluster without modification. The library provides some abstraction layers to achieve that goal.

A collection of input files

Input files that are not ultra-specific to a given test are grouped in that collection of input files. Among these files are the force field definition, the topology for most molecules, a template MDP file... These files are the main way a user can adapt the tests to their use case. For instance, when testing a new version of the force field definition, the corresponding file will be overwritten by the user. The same would happen when testing a new topology for a molecule, or a new set of MDP parameters.

What version of a file should be used is determined using a cascading scheme. For a given file, the versions to use, sorted from highest to lowest priority, are:

  • User provided version
  • Scenario specific version
  • Default version

The scenario specific version of files can be, for instance, versions of the MDP template for various versions of GROMACS.

A function in the library retrieves the correct version of a file based on that priority scheme; a sketch of such a function is given below.
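A minimal Python sketch of this lookup, assuming a directory layout with a user directory, a shared `scenarios/<name>/` area, and a `defaults/` area; the layout and the function signature are illustrations, not a decided interface.

```python
import os
import shutil

def get_shared_file(fname, destination='.', user_dir=None, scenario=None,
                    shared_dir='shared'):
    """Copy the highest-priority version of a shared input file.

    Priority: user-provided version > scenario-specific version > default.
    The directory layout assumed here is only an illustration.
    """
    candidates = []
    if user_dir is not None:
        candidates.append(os.path.join(user_dir, fname))
    if scenario is not None:
        candidates.append(os.path.join(shared_dir, 'scenarios', scenario, fname))
    candidates.append(os.path.join(shared_dir, 'defaults', fname))

    for candidate in candidates:
        if os.path.isfile(candidate):
            shutil.copy(candidate, destination)
            return os.path.join(destination, os.path.basename(fname))
    raise IOError('No version of {0} found among {1}'.format(fname, candidates))
```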

A library of functions

The library makes available a collection of helper functions and abstractions.

The library serves 3 purposes:

  • Give easy access to the underlying test architecture (get files from the shared input files)
  • Factor some common tasks (build a box of solvent, calculate solvation free energies, calculate translocation and dimerization PMFs, ...)
  • Abstract the differences between computing environments (mostly how to run GROMACS)

Here are some functions the library should provide:

  • get_shared_file(fname, [destination]): copy a file from the shared files directory, or from the user input directory
  • set_mdp_parameter(fname, key, value): modify an MDP file to set a parameter to a given value; the function should check whether the parameter is already present in the file, and should be aware of the GROMACS naming normalization rules (case insensitive, dash == underscore, ...); see the sketch after this list
  • solvation_free_energy(solute, solvent)
  • partition_free_energy(solute, solvent1, solvent2)
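A sketch of what set_mdp_parameter could look like, assuming the normalization rules mentioned above (case insensitivity, dashes equivalent to underscores); rewriting the file in place is one possible strategy among others.

```python
def _normalize(key):
    """Normalize an MDP parameter name: lower case, dashes become underscores."""
    return key.lower().replace('-', '_')

def set_mdp_parameter(fname, key, value):
    """Set `key = value` in an MDP file, replacing the parameter if present."""
    target = _normalize(key)
    lines = []
    found = False
    with open(fname) as infile:
        for line in infile:
            # Ignore comments (everything after ';') when looking for the key.
            content = line.split(';', 1)[0]
            if '=' in content:
                current_key = content.split('=', 1)[0].strip()
                if _normalize(current_key) == target:
                    # Note: any inline comment on the matched line is dropped.
                    line = '{0} = {1}\n'.format(key, value)
                    found = True
            lines.append(line)
    if not found:
        lines.append('{0} = {1}\n'.format(key, value))
    with open(fname, 'w') as outfile:
        outfile.writelines(lines)
```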

Here are some environment variables the library should provide to hide the variability in how to run GROMACS across GROMACS versions (gmx mdrun vs mdrun) and across computers (prun vs mpirun vs nothing); a sketch of how a test script could use them follows the list:

  • MDRUN: how to run mdrun
  • GMX: how to run single core gromacs tools (all but mdrun)
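For illustration, a test script could consume these variables as sketched below; the fallback values and the helper names are assumptions.

```python
import os
import shlex
import subprocess

# Fall back to a plain local GROMACS 5+ installation if the master program did
# not set the variables; these fallbacks are an assumption for illustration.
MDRUN = os.environ.get('MDRUN', 'gmx mdrun')
GMX = os.environ.get('GMX', 'gmx')

def run_mdrun(deffnm):
    """Run mdrun on a prepared TPR, whatever MDRUN expands to on this computer."""
    subprocess.check_call(shlex.split(MDRUN) + ['-deffnm', deffnm])

def run_gmx(tool, *arguments):
    """Run a single-core GROMACS tool, e.g. run_gmx('grompp', '-f', 'run.mdp')."""
    subprocess.check_call(shlex.split(GMX) + [tool] + list(arguments))
```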

Obviously, the library should be unit-tested.

A master program

The master program is what most users will interact with. It runs a single test or a selection of them.

The inputs are as follows (a sketch of a matching command-line interface is given after the list):

  • The path to a directory containing the user defined input files (default none)
  • The path to a directory where the test results are stored; if the directory already exists, the results will be updated (default: pwd)
  • Instructions about what computer to run on (default local)
  • Tags to select tests
  • Branch to consider in the test file tree
  • Path to a directory containing test output to compare to
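A possible mapping of these inputs onto command-line options, using Python's argparse; every option name and default shown here is a hypothetical illustration rather than a decided interface.

```python
import argparse
import os

def parse_arguments(argv=None):
    """Hypothetical command-line interface for the master program."""
    parser = argparse.ArgumentParser(description='Run the Martini test suite.')
    parser.add_argument('--user-inputs', default=None,
                        help='Directory with user-defined input files.')
    parser.add_argument('--results', default=os.getcwd(),
                        help='Directory where test results are stored or updated.')
    parser.add_argument('--target', default='local',
                        help='Computer or profile to run the tests on.')
    parser.add_argument('--tags', nargs='*', default=[],
                        help='Tags used to select tests.')
    parser.add_argument('--branch', default=None,
                        help='Branch of the test file tree to consider.')
    parser.add_argument('--reference', default=None,
                        help='Directory with previous test output to compare to.')
    return parser.parse_args(argv)
```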

Distributing the tests

Dependencies

The test suite dependencies should be kept under control. The expected dependencies are:

  • Gromacs
  • martini tools (insane, martinize, martinate, backward, ...)
  • Python (v2, and ideally soon v3)
  • MDAnalysis
  • Numpy/Scipy/Matplotlib

Distribution

  • As a tarball. It should include a script that tests for the dependencies.
  • As a docker image with all the dependencies included.
Tsjerk commented Nov 23, 2016

This is what I have in mind:


ALLATOM: A Library Leveraging Automated Testing Of Martini :p

This document provides a first (second) sketch of a test suite for the Martini Coarse Grain Force Field and the Martini Tools for setting up and running simulations. The suite is to provide a modular, comprehensive and flexible approach to manage testing of changes in any part of the force field, simulation setup or tools. In particular, the suite aims to

  • assert integrity of results upon changes in parameters or protocols (regression/validation)
  • assess robustness of results with respect to variations in parameters, libraries and architectures
  • assess the effects of changes in parameters to allow mapping of parameter spaces for optimization
  • identify improvements or deteriorations to facilitate force field development

To this end, the suite lists a number of tests, which can be divided into four categories:

  • Beads: Tests to assess the effects of changes on the bead level with respect to interactions, partitioning, etc.
  • Blocks: Tests to assess the effects of changes on the bead, mapping or topology level on the behaviour of molecules or building blocks.
  • Simple assemblies: Tests to assess the effects of changes on bead, mapping, topology or simulation level on the behaviour of well-defined model systems, such as one-, two- or three-component membranes.
  • Complex assemblies: Tests to assess the effects of changes on bead, mapping, topology or simulation level on the behaviour of complex systems, e.g. macroscopic and/or biological systems.

Each test consists of a protocol, a system, and a reference. The test runs the protocol on the system and compares the outcome to the reference. Protocols can be defined for general tasks, like running a simulation, or for specific tasks, like PMF calculations, partitioning, molecular sorting, and/or dynamics. In principle, a protocol consists of

  1. Force field generation
  2. Topology assembly
  3. Simulation setup
  4. Run parameters
  5. Simulation
  6. First line analysis
  7. Comparison to reference data

Each stage of the protocol is monitored and deviations or errors are caught and presented to the user during run time. An error will cause the protocol to terminate.

The systems can be existing structures with associated topologies, or be generated according to set rules. They are classified according to the components they contain:

  • Solvents
  • Ions
  • Solutes
  • Lipids
  • Protein
  • Nucleic acids
  • Carbohydrate
  • Other polymers
  • Other

Each test is further associated with a level of physics that is tested, where a higher level typically means that the lower level physics are also tested (or testable). The levels of physics are:

  1. Stability
  2. Structural properties
  3. Dynamical properties
  4. Thermodynamic properties

Typically, these different levels relate to the simulation time needed to get the output required for the analysis and the comparison to reference data.

A test can have one or more references, which are categorized according to the source of the data:

  • Coarse grained simulations
  • Atomistic simulations
  • Experimental data

For each (type of) reference, the analysis needs to be specified: how to extract the corresponding variables from the coarse grain simulation, which statistical operations to perform on the outcomes and the reference, and the criteria for evaluation, in particular the rules for classifying the results as equal or different.

@jbarnoud (Author) commented

Your comment addresses a different level of concern, and I see nothing clashing between our two views. The main question I see is how to fit your categories into my scheme. I define two ways to classify tests: through the file tree, and through tags. You define 3 levels of tests, but there isn't really a hierarchy between them.

All your levels can be specified by tags. A test involving the dimerization free energy of amino acid side chains could then be tagged "blocks, protein, thermodynamics", in addition to more ad hoc tags like "side chain analogues" and "PMF".

Your levels could also be defined from the file tree. Then the test above would sit in "blocks/protein/thermodynamics". Though, what should we do with mixed systems?

A last option is to mix the tags and the file tree. I would have the test in "protein/side-chains", and tagged "block, thermodynamics, PMF". I would go for that last option, especially as we can consider the subdirectory names as tags for selection purposes.

Regarding the components of a test, I would not have the force field generation as part of the test, but rather the force field definition. Indeed, requiring the generation procedure means that the user needs to adapt martini.py to test changes in martini.itp. My experience is that you have a tweaked version of the ITP before you have a tweaked version of the script: it is how Paulo did it, and it is also what I suggested to Liu Yang when I asked him to use custom bead types for a test. Of course, we could generate martini.itp using martini.py if the ITP is not already available; doing so would require a helper function so we do not have to think about it when writing tests.

Finally, something I barely addressed in my previous "comment": tests should say something about how they ended up. A test should be able to tell whether the user has to look at it or whether everything is fine. This would spare the user from going through all the test results if nothing changed. I see several possible "exit codes" (sketched after the list):

  • failed with an error (the test crashed)
  • match reference atomistic/experimental value (do not worry about the test)
  • does not match reference atomistic/experimental value (hey your parameters suck!)
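A minimal sketch of how such statuses could be recorded for the main program to collect; the status names and the STATUS file convention are hypothetical, not part of the design yet.

```python
# Hypothetical status values a test reports for the final report.
STATUS_ERROR = 'error'   # the test crashed
STATUS_PASS = 'pass'     # matches the reference atomistic/experimental value
STATUS_FAIL = 'fail'     # does not match the reference value

def write_status(status, message, fname='STATUS'):
    """Record the test outcome where the main program can collect it."""
    with open(fname, 'w') as outfile:
        outfile.write('{0}\n{1}\n'.format(status, message))
```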

Comparing with how the reference force field performed on the same test would be the job of the main program.

@jbarnoud (Author) commented

ALLATOM: A Library Leveraging Automated Testing Of Martini

The suite is to provide a modular, comprehensive and flexible approach to manage testing of changes in any part of the force field, simulation setup or tools. In particular, the suite aims to

  • assert integrity of results upon changes in parameters or protocols (regression/validation)
  • assess robustness of results with respect to variations in parameters, libraries and architectures
  • assess the effects of changes in parameters to allow mapping of parameter spaces for optimisation
  • identify improvements or deteriorations to facilitate force field development

To this end, the suite lists a number of tests, which can be divided into four categories:

  • Beads: Tests to assess the effects of changes on the bead level with respect to interactions, partitioning, etc.
  • Blocks: Tests to assess the effects of changes on the bead, mapping or topology level on the behaviour of molecules or building blocks.
  • Simple assemblies: Tests to assess the effects of changes on bead, mapping, topology or simulation level on the behaviour of well-defined model systems, such as one-, two- or three-component membranes.
  • Complex assemblies: Tests to assess the effects of changes on bead, mapping, topology or simulation level on the behaviour of complex systems, e.g. macroscopic and/or biological systems.

Each test consists of a protocol, a system, and a reference. The test runs the protocol on the system and compares the outcome to the reference. Protocols can be defined for general tasks, like running a simulation, or for specific tasks, like PMF calculations, partitioning, molecular sorting, and/or dynamics. In principle, a protocol consists of

  1. Force field definition
  2. Topology assembly
  3. Simulation setup
  4. Run parameters
  5. Simulation
  6. First line analysis
  7. Comparison to reference data

Each stage of the protocol is monitored and deviations or errors are caught and presented to the user during run time. An error will cause the protocol to terminate.
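As an illustration, a protocol could be expressed as an ordered list of stage callables run under a small driver like the one sketched below; the `ProtocolError` exception and the driver itself are assumptions, not a decided design.

```python
class ProtocolError(Exception):
    """Raised by a stage to terminate the protocol with an explanation."""

def run_protocol(stages, context):
    """Run the stages of a protocol in order, stopping at the first error.

    `stages` is an ordered list of callables, one per step listed above
    (force field definition, topology assembly, simulation setup, run
    parameters, simulation, first-line analysis, comparison to reference);
    `context` is a dictionary they share and update.
    """
    for stage in stages:
        print('Running stage: {0}'.format(stage.__name__))
        try:
            stage(context)
        except ProtocolError as error:
            # Deviations and errors are reported to the user at run time,
            # and an error terminates the protocol.
            print('Stage {0} failed: {1}'.format(stage.__name__, error))
            raise
    return context
```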

The systems can be existing structures with associated topologies, or be generated according to set rules. They are classified according to the components they contain:

  • Solvents
  • Ions
  • Solutes
  • Lipids
  • Protein
  • Nucleic acids
  • Carbohydrate
  • Other polymers
  • Other

Each test is further associated with a level of physics that is tested, where a higher level typically means that the lower level physics are also tested (or testable). The levels of physics are:

  1. Stability
  2. Structural properties
  3. Dynamical properties
  4. Thermodynamic properties

Typically, these different levels relate to the simulation time needed to get the output required for the analysis and the comparison to reference data.

A test can have one or more references, which are categorised according to the source of the data:

  • Experimental data
  • Atomistic simulations
  • Coarse grained simulations

For each (type of) reference, the analysis needs to be specified: how to extract the corresponding variables from the coarse grain simulation, which statistical operations to perform on the outcomes and the reference, and the criteria for evaluation, in particular the rules for classifying the results as equal or different.
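As a minimal illustration of such a classification rule, a result could be called equal to its reference when it lies within a few combined uncertainties; the function below is a sketch under that assumption, and the actual statistical treatment may well be richer.

```python
def compare_to_reference(observable, reference, uncertainty, n_sigma=2.0):
    """Classify a coarse-grained result against a reference value.

    The result is considered equal to the reference when it lies within
    `n_sigma` times the combined uncertainty of the two values; the rule
    and the default threshold are illustrative assumptions.
    """
    if abs(observable - reference) <= n_sigma * uncertainty:
        return 'equal'
    return 'different'
```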

Practical matters

The automatic test of Martini comprises four components:

  • a collection of tests with metadata;
  • a collection of input files shared among the tests;
  • a python and bash library of functions to ease the writing of tests;
  • a main program that lists, selects, and runs the tests on various target computers, and compiles a global report from the individual reports of each test.

A user must be able to run all or part of the test suite on a local or a remote computer. They must be able to overwrite some parameters, and to easily add new tests.

A user should also be able to run the tests in several passes. For instance, a user should be able to run a first batch of tests and, if the results are good enough, run another batch with some other tests using the same inputs. In that case, the report should be updated rather than rewritten.

The tests are both validation and regression tests. Their outputs are compared with experimental values or values calculated from atomistic simulations to assess their validity. But the test outputs can also be compared to previously run tests to identify changes in behaviour.

Finally, a user should be able to overwrite any part of the test suite, and to add their own tests, without altering any file we provide.

A collection of tests

The tests are bash or python scripts that describe how to simulate and analyse a system of interest. Each test is contained in its own directory.

In addition to the scripts, a test directory should contain the input and reference files specific to the test. The directory should also contain a metadata file.

The metadata file is used by the main program to select the tests to run based on some user criteria, and to format the report. Therefore, the file must contain a name for the test, as well as a description and a series of tags. Tags are especially important as they will be used to select the tests. The metadata file is also used to identify which scripts to call to run and analyse the test.

The scripts can be written in python or in bash. They should use the shared inputs as much as possible, but make as few assumptions as possible about them. Indeed, the shared input files are what the user will be able to overwrite to adapt the tests to their use case. The provided library allows some standard manipulation of these input files.

The scripts should also be as independent as possible from the computer they will run on. Indeed, the scripts should be able to run on a local computer or on a cluster without modification. The library provides some abstraction layers to achieve that goal.

Each test should output a summary of its results in the most pertinent human-readable way (table, plot, value). The tests should also output a status. The status indicates whether the test went through or crashed, and how its results compare to the reference (good enough or not). That status will be displayed in the final report, allowing the user to focus on the tests that require the most attention.

Tests should be organised based on what component of the force field they are testing, and the level of organisation involved (beads, blocks, simple assemblies, complex assemblies). Special entries in the metadata file will allow specifying the level of physics tested and the expected duration of the test. Tags can be provided to give more selection options; a sketch of such a metadata entry is given below.
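What exactly goes into the metadata file, and in which format (YAML, JSON, INI, ...), is still open. As an illustration only, here is the kind of information it could carry, written as a Python dictionary; all field names are assumptions, and the example test is the side chain analogue case discussed above.

```python
# Illustrative content of a test metadata file, expressed as a Python dict.
metadata = {
    'name': 'Side chain analogue dimerization PMF',
    'description': 'Dimerization free energy of amino acid side chain '
                   'analogues, compared to reference data.',
    'tags': ['blocks', 'protein', 'thermodynamics', 'side chain analogues', 'PMF'],
    'level_of_physics': 'thermodynamic properties',
    'expected_duration': '12h',        # rough wall-clock estimate
    'run_script': 'run.sh',            # script called to run the test
    'analysis_script': 'analysis.py',  # script called to analyse the output
}
```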

A collection of input files

Input files that are not ultra-specific to a given test are grouped in that collection of input files. Among these files are the force field definition, the topology for most molecules, a template MDP file... These files are the main way a user can adapt the tests to their use case. For instance, when testing a new version of the force field definition, the corresponding file will be overwritten by the user. The same would happen when testing a new topology for a molecule, or a new set of MDP parameters.

What version of a file should be used is determined using a cascading scheme. For a given file, the versions to use, sorted from highest to lowest priority, are:

  • User provided version
  • Scenario specific version
  • Default version

The scenario specific version of files can be, for instance, versions of the MDP template for various versions of GROMACS.

A function in the library retrieves the correct version of a file based on that priority scheme.

Among the input files, the force field definition has a peculiar status, as it can be provided in various ways (a sketch of how these options could be resolved follows the list):

  • changes in the interaction table can be provided as a table that will be converted into an ITP file
  • the force field definition can also be provided directly as an ITP file that will be used as is
  • the smallest changes can be provided as a patch to the ITP file, or to the file describing the interaction table
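As an illustration of how these three routes could be reconciled, here is a possible resolution scheme; the file names (`martini.itp`, `interactions.dat`, `martini.itp.patch`), the `table_to_itp` helper, and the use of the `patch` command are all hypothetical choices.

```python
import os
import shutil
import subprocess

def resolve_force_field(user_dir, workdir, table_to_itp, default_itp):
    """Return the path of the force field ITP to use for a test.

    Priority: a user-provided ITP is used as is; otherwise a user-provided
    interaction table is converted into an ITP; otherwise a user-provided
    patch is applied to the default ITP; otherwise the default ITP is used.
    All file names below are hypothetical.
    """
    user_itp = os.path.join(user_dir, 'martini.itp')
    if os.path.isfile(user_itp):
        return user_itp
    table = os.path.join(user_dir, 'interactions.dat')
    if os.path.isfile(table):
        return table_to_itp(table, os.path.join(workdir, 'martini.itp'))
    patch = os.path.join(user_dir, 'martini.itp.patch')
    if os.path.isfile(patch):
        patched = os.path.join(workdir, 'martini.itp')
        shutil.copy(default_itp, patched)
        subprocess.check_call(['patch', patched, patch])
        return patched
    return default_itp
```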

A library of functions

The library makes available a collection of helper functions and abstractions.

The library serves 3 purposes:

  • Give easy access to the underlying test architecture (get files from the shared input files)
  • Factor some common tasks (build a box of solvent, calculate solvation free energies, calculate translocation and dimerisation PMFs, ...)
  • Abstract the differences between computing environments (mostly how to run GROMACS)

Here are some functions the library should provide:

  • get_shared_file(fname, [destination]): copy a file from the shared files directory, or from the user input directory
  • set_mdp_parameter(fname, key, value): modify an MDP file to set a parameter to a given value; the function should check whether the parameter is already present in the file, and should be aware of the GROMACS naming normalisation rules (case insensitive, dash == underscore, ...)
  • solvation_free_energy(solute, solvent)
  • partition_free_energy(solute, solvent1, solvent2)

Here are some environment variables the library should provide to hide the variability in how to run GROMACS across GROMACS versions (gmx mdrun vs mdrun) and across computers (prun vs mpirun vs nothing):

  • MDRUN: how to run mdrun
  • GMX: how to run single core gromacs tools (all but mdrun)

Obviously, the library should be unit-tested.

A master program

The master program is what most users will interact with. It runs a single test or a selection of them.

The inputs are:

  • The path to a directory containing the user defined input files (default none)
  • The path to a directory where the test results are stored; if the directory already exists, the results will be updated (default: pwd)
  • The path to a directory with user defined tests (default none)
  • Instructions about what computer to run on (default local)
  • Tags to select tests
  • Branch to consider in the test file tree
  • Path to a directory containing test output to compare to (default none)

When called, the master program will go through the following steps; the discovery and local-run steps are sketched after the list:

  • make sure the environment is sane in order to fail as early as possible if files or dependencies are missing
  • discover from the collection of tests the ones corresponding to the user selection
  • wrap the tests in the most appropriate manner for the target computer
  • run the tests
  • write or update the report
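A sketch of the discovery and local-run steps under these assumptions; the metadata file name and format (`metadata.json`), the `run_script` entry, and the `STATUS` file convention are hypothetical, and wrapping for a cluster or compiling the report would build on the same pieces.

```python
import json
import os
import subprocess

def discover_tests(test_root, tags):
    """Yield (directory, metadata) for tests whose tags include all requested tags."""
    for dirpath, dirnames, filenames in os.walk(test_root):
        if 'metadata.json' in filenames:  # assumed metadata file name and format
            with open(os.path.join(dirpath, 'metadata.json')) as infile:
                metadata = json.load(infile)
            if set(tags) <= set(metadata.get('tags', [])):
                yield dirpath, metadata

def run_local(test_dir, metadata):
    """Run one test on the local computer and return its reported status."""
    script = os.path.join(test_dir, metadata.get('run_script', 'run.sh'))
    try:
        subprocess.check_call(['bash', script], cwd=test_dir)
    except subprocess.CalledProcessError:
        return 'error'
    status_file = os.path.join(test_dir, 'STATUS')
    if os.path.isfile(status_file):
        with open(status_file) as infile:
            return infile.readline().strip()
    return 'error'
```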

Distributing the tests

Dependencies

The test suite dependencies should be kept under control. The expected dependencies are:

  • Gromacs
  • martini tools (insane, martinize, martinate, backward, ...)
  • Python (v2, and ideally soon v3)
  • MDAnalysis
  • Numpy/Scipy/Matplotlib

Distribution

  • As a tarball. It should include a script that tests for the dependencies.
  • As a docker image with all the dependencies included.
