status: Accepted in H5MD 1.1
This proposal aims at enhancing the storage of time information. Some reasonable use cases are not covered in H5MD 1.0, such as equal time steps.
A few excerpts from the h5md-user mailing list.
http://article.gmane.org/gmane.science.simulation.h5md.user/640
Now that I have started to use H5MD seriously, I also start to notice problems
with it. One of them is the obligatory presence of a "time" dataset. I want
to store a Monte-Carlo trajectory which consists of a sequence of
configurations, but without any associated time values. If I want to
respect the H5MD specification, I have to make up numbers, which is not a
good habit to take.
Is there any reason why "time" was made obligatory?
http://article.gmane.org/gmane.science.simulation.h5md.user/641
...and adding to that, can we also make the "step" optional? Weird as this
may sound, we would also have to invent step numbers.
http://article.gmane.org/gmane.science.simulation.h5md.user/651
> In a more general idea about step/time, I have an idea since a long
> time. I didn't want it for H5MD 1.0 to avoid any confusion. But storing
> step and time when step is simply step[i] = STEP_SIZE*i and time[i] =
> STEP_SIZE*DT*i is a bit of a waste. We could define a proper setup for
> regularly sampled data, for which step[0], STEP_SIZE, time[0] and DT
> should be given.
Good idea, and not just to avoid wasting space. It would also contain the
message to the reader "this is regularly sampled data". For some analyses
this makes a big difference. For example, computing time correlation
functions of regularly sampled data is straightforward and efficient,
whereas it is cumbersome, slow, and imprecise for irregular time series.
Right now, the only way to check if a time series is regular is to check all
the time labels. However, these are floats and thus subject to round-off
error. I'll bet that in practice, analysis software will simply assume the
time series to be equally spaced and not bother to check. I'll also bet that
sooner or later this will lead to wrong results being published.
See also http://nongnu.org/h5md/discussion.html#extensions-storage-of-time-dependent-data
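The round-off concern raised in the excerpt above can be illustrated with a small sketch in plain Python. The function names and the tolerance are hypothetical choices made for this illustration and are not part of H5MD.

```python
import math

def is_regular_exact(times):
    """Check for constant spacing using exact float comparison."""
    diffs = [b - a for a, b in zip(times, times[1:])]
    return all(d == diffs[0] for d in diffs)

def is_regular_approx(times, rel_tol=1e-9):
    """Check for constant spacing up to a relative tolerance (assumed value)."""
    diffs = [b - a for a, b in zip(times, times[1:])]
    return all(math.isclose(d, diffs[0], rel_tol=rel_tol) for d in diffs)

# Nominally regular time labels with a spacing of 0.1
times = [i * 0.1 for i in range(10)]

print(is_regular_exact(times))   # False: round-off makes the differences unequal
print(is_regular_approx(times))  # True
```

A reader that has only the stored time labels must therefore pick a tolerance, whereas an explicit statement of regular sampling in the file would remove the guesswork.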
time
Whereas the Integer character of step plays a role in the identification of time frames, time could be relaxed to “Integer or Real”.
time
As, e.g., Monte-Carlo simulations may not possess a well-defined time, it is proposed that only step is mandatory in a time-dependent H5MD element.
step and time
When the increments of step and/or time are constant, the interpretation step[i]=step0+i*delta_step and time[i]=time0+i*delta_time holds. This change would remove the need to store redundant data and would also facilitate analysis, as many algorithms work only with evenly spaced data.
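As a minimal sketch of this interpretation, the per-frame values can be rebuilt from an offset and an increment. The function names and the n_frames argument are hypothetical; only step0, delta_step, time0 and delta_time come from the text above.

```python
def regular_steps(step0, delta_step, n_frames):
    """Rebuild step[i] = step0 + i*delta_step for i = 0 .. n_frames-1."""
    return [step0 + i * delta_step for i in range(n_frames)]

def regular_times(time0, delta_time, n_frames):
    """Rebuild time[i] = time0 + i*delta_time for i = 0 .. n_frames-1."""
    return [time0 + i * delta_time for i in range(n_frames)]

steps = regular_steps(step0=0, delta_step=100, n_frames=4)
times = regular_times(time0=0.0, delta_time=0.5, n_frames=4)

print(steps)  # [0, 100, 200, 300]
print(times)  # [0.0, 0.5, 1.0, 1.5]
```

Computing each value as offset plus i times the increment, rather than by repeated addition, keeps round-off from accumulating along the series.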
The content of a time-dependent H5MD element needs an update to allow for the absence of step and time.
The structure of a time-dependent H5MD element is
<element>
\-- step: Integer[]
\-- (step0): Integer[]
\-- (time): Float[]
\-- (time0): Float[]
\-- value: <type>[variable][...]
This structure closely matches the existing one. Using scalar datasets makes it possible (i) to keep step (etc.) as an HDF5 dataset rather than an attribute and (ii) to distinguish the new layout clearly from the current structure.
While not a requirement, the use of compact datasets would be encouraged here.
The structure of a time-dependent H5MD element is
<element>: <type>[variable][...]
+-- step: Integer[]
+-- (step0): Integer[]
+-- (time): Float[]
+-- (time0): Float[]
This structure closely matches the existing one. Using scalar datasets allows us (i) to keep step and time as HDF5 datasets and (ii) to distinguish the new layout clearly from the current structure, i.e., the distinction is made after reading the shape of the dataset. (iii) Using HDF5 attributes for the offset allows for a single generic identifier, offset, and avoids cluttering the HDF5 group that forms the H5MD element.
<element>:
\-- step: Integer[]
+-- (offset): Integer[]
\-- (time): Float[]
+-- (offset): Float[]
\-- value: <type>[variable][...]
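How a reader might tell the two layouts apart by shape can be sketched in plain Python, with no HDF5 library involved. The sketch assumes that a scalar step holds the constant increment and that offset holds the starting value. That reading of the layout, the function name expand_steps, and the default offset of 0 are assumptions made for illustration.

```python
def expand_steps(step, n_frames, offset=0):
    """Return one step number per frame.

    step is either an explicit sequence (one entry per frame, as in the
    current structure) or a scalar, taken here to be the constant increment.
    """
    if isinstance(step, (list, tuple)):
        # Explicit storage: the sequence must match the number of frames.
        if len(step) != n_frames:
            raise ValueError("step has %d entries, expected %d"
                             % (len(step), n_frames))
        return list(step)
    # Scalar storage: regular sampling starting at the assumed offset.
    return [offset + i * step for i in range(n_frames)]

print(expand_steps([0, 10, 25], n_frames=3))    # [0, 10, 25]
print(expand_steps(10, n_frames=3, offset=5))   # [5, 15, 25]
```

With a real HDF5 file, the same branch would be taken on the shape of the step dataset instead of the Python type.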