HDF5 Example

Retrieve a data file

$ wget http://mldata.org/repository/data/download/1 -O data.h5

List contents from the command line

hdf5-tools is a collection of command-line tools for working with HDF5 files, including h5ls and h5dump.

  • $ h5ls --full -r data.h5
    /                        Group
    /data                    Group
    /data/data               Dataset {11, 612}
    /data_descr              Group
    /data_descr/names        Dataset {11}
    /data_descr/ordering     Dataset {1}
    		
  • $ h5dump -d /data/data data.h5
    HDF5 "data.h5" {
    DATASET "/data/data" {
       DATATYPE  H5T_IEEE_F64LE
       DATASPACE  SIMPLE { ( 11, 612 ) / ( 11, 612 ) }
       DATA {
       (0,0): -1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1,
       (0,18): -1, -1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1,
       (0,36): -1, 1, -1, 1, -1, -1, 1, -1, 1, -1, 1, -1, 1, 1, -1, -1, 1, 1, -1,
       (0,55): -1, 1, -1, 1, -1, 1, -1, 1, 1, -1, -1, 1, -1, 1, -1, 1, -1, 1, -1,
       (0,74): 1, 1, -1, -1, -1, 1, -1, -1, 1, -1, -1, -1, -1, 1, 1, -1, -1, 1,
       (0,92): 1, 1, 1, -1, -1, 1, 1, -1, 2, 2, -2, 2, -2, -2, -2, 2, 2, -2, -2,
       (0,111): -2, -2, -2, 2, 2, 2, 2, -2, -2, 2, -2, -2, -2, -2, -2, -2, 2, -2,
    ...
    		

Access in MATLAB

info=hdf5info('data.h5');

x=hdf5read('data.h5','/data/data')

x =

  Columns 1 through 7

   -1.0000    0.2107    0.0044    0.0013    0.0001    0.0000    0.0000
   -1.0000    0.2152    0.0042    0.0014    0.0002    0.0000    0.0000
    1.0000    0.1972    0.0023    0.0015    0.0000    0.0000   -0.0000
...
	

Access in Python

The h5py package gives Python access to HDF5 files and returns datasets as NumPy arrays:

$ python
>>> import h5py
>>> f = h5py.File('data.h5','r')
>>> list(f.values())
[<HDF5 group "/data" (1 members)>, <HDF5 group "/data_descr" (2 members)>]
>>> f["/data/data"]
<HDF5 dataset "data": shape (11, 612), type "<f8">
>>> f["/data/data"][:,:]
array([[ -1.00000000e+00,  -1.00000000e+00,   1.00000000e+00, ...,
          3.00000000e+00,  -3.00000000e+00,   3.00000000e+00],
       [  2.10663000e-01,   2.15192000e-01,   1.97153000e-01, ...,
          3.15029000e-01,   2.96945000e-01,   4.08534000e-01],
       [  4.43414000e-03,   4.18483000e-03,   2.30872000e-03, ...,
          3.37745000e-02,   5.68704000e-02,   6.02136000e-02],
       ...,
       [  2.23000000e+00,   2.20000000e+00,   2.35000000e+00, ...,
          9.40000000e-01,   6.00000000e-01,   1.00000000e+00],
       [  1.27000000e+00,   1.28000000e+00,   1.28000000e+00, ...,
          1.24000000e+00,   1.31000000e+00,   1.30000000e+00],
       [  1.28000000e+00,   1.28000000e+00,   1.28000000e+00, ...,
          1.33000000e+00,   1.33000000e+00,   1.32000000e+00]])
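
The listing that h5ls -r prints can be approximated from Python with h5py. The following is a minimal sketch rather than part of the original example: the helper names list_contents and make_demo_file are hypothetical, and the demo file is a zero-filled stand-in that mimics only part of the group layout of data.h5, so the snippet runs without the download.

```python
import os
import tempfile

import h5py
import numpy as np


def list_contents(path):
    """Return (name, kind, shape) for every object below the root group."""
    entries = []

    def visit(name, obj):
        # h5py passes names relative to the root, without a leading slash.
        if isinstance(obj, h5py.Dataset):
            entries.append(("/" + name, "Dataset", obj.shape))
        else:
            entries.append(("/" + name, "Group", None))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return entries


def make_demo_file(path):
    """Write a stand-in file (assumption: zeros instead of the real data)."""
    with h5py.File(path, "w") as f:
        # Intermediate groups ("data", "data_descr") are created automatically.
        f.create_dataset("data/data", data=np.zeros((11, 612)))
        f.create_dataset("data_descr/ordering", data=np.zeros(1))


demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
make_demo_file(demo_path)
for name, kind, shape in list_contents(demo_path):
    print(name, kind, shape)
```

Calling list_contents('data.h5') on the downloaded file should report the same groups and datasets as the h5ls output shown earlier. Opening the file in a with block ensures it is closed even if the traversal raises an error.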
		

Access in Shogun

$ python
Python 2.5.5 (r255:77872, Apr 21 2010, 08:40:04)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from shogun.Features import *
>>> from shogun.Library import *
>>> f=HDF5File('data.h5','r', '/data/data')
>>> feats=RealFeatures()
>>> feats.load(f)
>>> feats.get_feature_matrix()
array([[-1.  ,  1.  ,  1.  , ...,  1.28,  1.28,  1.32],
       [-1.  , -1.  , -1.  , ...,  1.27,  1.79,  1.29],
       [ 1.  , -1.  , -1.  , ...,  1.28,  1.27,  1.4 ],
       ...,
       [ 1.  , -1.  , -1.  , ...,  1.32,  1.28,  1.33],
       [ 1.  ,  1.  ,  1.  , ...,  1.27,  1.33,  1.33],
       [-1.  , -1.  , -1.  , ...,  1.26,  1.32,  1.32]])
	

Using ASCII text files

h5utils is a set of utilities for visualization and conversion of scientific data in HDF5.

  • Convert the dataset to ASCII text with comma-separated values:
    $ h5totxt -s',' -d /data/data data.h5
  • Or use spaces as separators and transpose the output:
    $ h5totxt -T -s' ' -d /data/data data.h5
  • Convert a plain ASCII file to HDF5, storing it in the dataset /data/data:
    $ h5fromtxt -d '/data/data' data.h5 <<EOF
    1 2 3 4
    5 6 7 8
    EOF
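
Text produced this way can be read back with the Python standard library alone. The sketch below assumes the text holds one matrix row per line with a single-character separator, as in the comma-separated form above. The helper names parse_matrix and transpose are hypothetical, and the sample string reuses the 2x4 matrix from the h5fromtxt example.

```python
import csv
import io


def parse_matrix(text, sep=","):
    """Parse separator-delimited numeric text into a list of float rows."""
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    rows = []
    for row in reader:
        # Skip empty cells so trailing separators do not break float().
        cells = [float(cell) for cell in row if cell.strip()]
        if cells:
            rows.append(cells)
    return rows


def transpose(rows):
    """Swap rows and columns, comparable to what the -T option requests."""
    return [list(col) for col in zip(*rows)]


sample = "1,2,3,4\n5,6,7,8\n"
rows = parse_matrix(sample)
print(rows)
print(transpose(rows))
```

For space-separated text, pass sep=' ' to parse_matrix. Runs of several spaces then yield empty cells, which the filter above skips.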
    			



Acknowledgements

This project is supported by PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning)
http://www.pascal-network.org/.