H5MD - proposal 103: Connectivity group

status: Accepted in H5MD 1.1

Objective

Define a root group that contains the connectivity of a system of particles. This proposal builds on proposal 101, "Particles and tuples lists", for the storage of tuples. The interpretation of the information is left to users and can be made more formal later, for instance with modules.

This proposal is intended to allow, among other uses, the storage of bonded interactions in force-field based simulations, or of distance constraints. Force fields are too complex for their details to be encoded generically here, but the storage elements defined in this proposal lay a common basis.

Motivations

http://article.gmane.org/gmane.science.simulation.h5md.user/604

The group I work with does a fair amount of analysis on bond lengths, angle
distributions and dihedral angle distributions as validation for force field
development. So our main use case for a 'topology' module would be to store
the pairs, 3- and 4-tuples of atoms that define the bonds, angles and
dihedrals.

That said, I have use cases for wanting to store lists of non-bonded pairs
of atoms such as opposing carbons on a ring or designated 'endpoint' atoms
that can be used to represent the overall orientation of a larger molecule,
and in this use case a specific 'bond' list would not really be appropriate.

(...)

I would definitely agree that a standard storage scheme would be useful,
even if the structure of a 'topology' section varies by module. That way,
even if the location of the lists varies, at least the same routines can be
used to read/write the data when it is located. As a possible example:

    <tuple-list>
      +-- type:  String
      +-- dimension:  Integer[]
      \--  values: Integer[n-tuples][D]

Where a pair list would be dimension(D) = 2, and the values in list would be
the particle IDs. Given what Konrad pointed out about flexibility
vs. specificity, the "type" string could have a specific list of acceptable
values, much like the boundary attribute in the 'box' item. For my
particular use cases, I can see the following types of atom-tuple lists
being useful as topology/connectivity information: bonds, angles, dihedrals,
chains, and 'other'.

http://article.gmane.org/gmane.science.simulation.h5md.user/637

A specific problem of 2) is that for non-trivial forcefields (proteins
etc.), a simple bond list is not of much use. What you want is *all*
forcefield terms. I can't think of any practical use for just the bonds.

Storage of connectivity information

The connectivity information is stored as tuples in the group /connectivity. The tuples are pairs, triples, etc. as needed.

A list of particles is an H5MD element:

/connectivity
    <name>: Integer[N]
      +-- particles_group: String[]

where N is the length of the list and particles_group is the name of a group within /particles. Several connectivity elements may be defined for any particles group, but a connectivity element may refer to only one particles group.
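
As an illustration only, the layout above could be written with h5py roughly as follows. The particles group name "solvent", the element names "bonds" and "angles", and the file name are hypothetical. The layout above gives only the list length N; this sketch additionally assumes that each row holds one tuple, so a pair list has shape (N, 2). The file is kept in memory so nothing is written to disk.

```python
import h5py
import numpy as np

# In-memory HDF5 file (core driver, no backing store): nothing is left on disk.
with h5py.File("connectivity_sketch.h5", "w",
               driver="core", backing_store=False) as f:
    # Hypothetical particles group; its contents are irrelevant for this sketch.
    f.create_group("particles/solvent")

    conn = f.create_group("connectivity")

    # Assumption: one tuple per row, so two bonds give shape (2, 2).
    bonds = conn.create_dataset(
        "bonds", data=np.array([[0, 1], [1, 2]], dtype=np.int32))
    # The attribute names the single particles group this element refers to.
    bonds.attrs["particles_group"] = "solvent"

    # A second element may refer to the same particles group.
    angles = conn.create_dataset(
        "angles", data=np.array([[0, 1, 2]], dtype=np.int32))
    angles.attrs["particles_group"] = "solvent"

    # Read back what a consumer of the file would look at.
    bonds_shape = bonds.shape
    group_name = bonds.attrs["particles_group"]
    group_exists = group_name in f["particles"]
    element_names = sorted(conn.keys())
```

A reader would typically check that the group named by particles_group exists under /particles before interpreting the values.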

If the corresponding particles_group does not possess the id element, the values in <name> are indices into the elements of particles_group. Otherwise, the values in <name> must be matched to the equal values in the id element.
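
The two cases of this rule can be sketched in plain Python. The function name resolve_tuples and the sample data are hypothetical and not part of H5MD; the sketch assumes the connectivity values and the id element have already been read into lists.

```python
def resolve_tuples(values, ids=None):
    """Map stored connectivity values to positions in the particles group.

    values: list of tuples read from a connectivity element
    ids:    contents of the id element, or None if the group has none
    """
    if ids is None:
        # No id element: the stored values already are indices.
        return [tuple(t) for t in values]
    # id element present: find the position whose id equals each stored value.
    index_of = {pid: i for i, pid in enumerate(ids)}
    return [tuple(index_of[v] for v in t) for t in values]


# Hypothetical data: particles stored in the order of ids 12, 10, 11.
ids = [12, 10, 11]
bonds = [(10, 11), (11, 12)]

resolved = resolve_tuples(bonds, ids)          # positions within the group
unchanged = resolve_tuples([(0, 1)], None)     # values used as indices directly
```

Building the lookup dictionary once keeps the resolution linear in the number of particles plus the number of stored values, which matters for large tuple lists.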