H5MD - proposal 100: Storage of time information
------------------------------------------------

**status:** Accepted in H5MD 1.1

### Objective

This proposal aims at enhancing the storage of time information. Some reasonable
use cases are not covered in H5MD 1.0, such as equal time steps.

### Motivations

A few excerpts from the h5md-user mailing list.

http://article.gmane.org/gmane.science.simulation.h5md.user/640

    Now that I have started to use H5MD seriously, I also start to notice problems
    with it. One of them is the obligatory presence of a "time" dataset. I want
    to store a Monte-Carlo trajectory which consists of a sequence of
    configurations, but without any associated time values.  If I want to
    respect the H5MD specification, I have to make up numbers, which is not a
    good habit to take.

    Is there any reason why "time" was made obligatory?

http://article.gmane.org/gmane.science.simulation.h5md.user/641

    ...and adding to that, can we also make the "step" optional? Weird as this
    may sound, we would also have to invent step numbers.

http://article.gmane.org/gmane.science.simulation.h5md.user/651

    > In a more general idea about step/time, I have an idea since a long
    > time. I didn't want it for H5MD 1.0 to avoid any confusion. But storing
    > step and time when step is simply step[i] = STEP_SIZE*i and time[i] =
    > STEP_SIZE*DT*i is a bit of a waste. We could define a proper setup for
    > regularly sampled data, for which step[0], STEP_SIZE, time[0] and DT
    > should be given.

    Good idea, and not just to avoid wasting space. It would also contain the
    message to the reader "this is regularly sampled data". For some analyses
    this makes a big difference. For example, computing time correlation
    functions of regularly sampled data is straightforward and efficient,
    whereas it is cumbersome, slow, and imprecise for irregular time series.

    Right now, the only way to check if a time series is regular is to check all
    the time labels. However, these are floats and thus subject to round-off
    error. I'll bet that in practice, analysis software will simply assume the
    time series to be equally spaced and not bother to check. I'll also bet that
    sooner or later this will lead to wrong results being published.

See also http://nongnu.org/h5md/discussion.html#extensions-storage-of-time-dependent-data

### Relax datatype of `time`

Whereas the `Integer` character of `step` plays a role in the identification of
time frames, `time` could be relaxed to "`Integer` or `Real`".

### Optional use of `time`

As, e.g., Monte-Carlo simulations may not possess a well-defined time, it is
proposed that only `step` is mandatory in a time-dependent H5MD element.

### Linearly spaced `step` and `time`

When the increments of `step` and/or `time` are constant, the interpretation
`step[i]=step0+i*delta_step` and `time[i]=time0+i*delta_time` holds.
This change would remove the need to store unneeded data but also facilitate the
analysis, as many algorithm work only with fixed-spacing of data.

The content of a time-dependent H5MD element needs an update to allow for the
absence of `step` and `time`.

#### Proposition to use scalar datasets.

The structure of a time-dependent H5MD element is

    <element>
     \-- step: Integer[]
     \-- (step0): Integer[]
     \-- (time): Float[]
     \-- (time0): Float[]
     \-- value: <type>[variable][...]

This structure matches closely the existing one. The use of scalar datasets
allows to (i) keep the status of `step` (etc.) a HDF5 dataset and not an
attribute (ii) to distinguish clearly from the current structure by using scalar
datasets.

While not a requirement, it would be encouraged to use compact datasets here.

#### Proposition to use attributes

The structure of a time-dependent H5MD element is

    <element>: <type>[variable][...]
     +-- step: Integer[]
     +-- (step0): Integer[]
     +-- (time): Float[]
     +-- (time0): Float[]

#### Proposition to mix scalar datasets and attributes

This structure matches closely the existing one. The use of scalar datasets
allows us to (i) keep the status of `step` and `time` as HDF5 datasets and (ii)
to distinguish clearly from the current structure by using scalar datasets,
i.e., the distinction is _after_ reading the shape of the dataset. (iii) Using
HDF5 attributes for the offset allows for a single generic identifier `offset`
and avoids cluttering of the HDF5 group forming the H5MD element.

    <element>:
     \-- step: Integer[]
         +-- (offset): Integer[]
     \-- (time): Float[]
         +-- (offset): Float[]
     \-- value: <type>[variable][...]