HDF5 Example
Retrieve a data file
$ wget http://mldata.org/repository/data/download/1 -O data.h5
List contents from the command line
hdf5-tools is a neat collection of command-line tools for working with HDF5 files: h5ls lists the groups and datasets in a file, and h5dump prints their contents.
$ h5ls --full -r data.h5
/                        Group
/data                    Group
/data/data               Dataset {11, 612}
/data_descr              Group
/data_descr/names        Dataset {11}
/data_descr/ordering     Dataset {1}
$ h5dump -d /data/data data.h5
HDF5 "data.h5" {
DATASET "/data/data" {
   DATATYPE  H5T_IEEE_F64LE
   DATASPACE  SIMPLE { ( 11, 612 ) / ( 11, 612 ) }
   DATA {
   (0,0): -1, -1, 1, -1, -1, 1, -1, -1, 1, 1, -1, 1, -1, -1, 1, -1, -1, 1,
   (0,18): -1, -1, 1, -1, 1, -1, -1, 1, -1, 1, 1, -1, -1, 1, -1, -1, 1, -1,
   (0,36): -1, 1, -1, 1, -1, -1, 1, -1, 1, -1, 1, -1, 1, 1, -1, -1, 1, 1, -1,
   (0,55): -1, 1, -1, 1, -1, 1, -1, 1, 1, -1, -1, 1, -1, 1, -1, 1, -1, 1, -1,
   (0,74): 1, 1, -1, -1, -1, 1, -1, -1, 1, -1, -1, -1, -1, 1, 1, -1, -1, 1,
   (0,92): 1, 1, 1, -1, -1, 1, 1, -1, 2, 2, -2, 2, -2, -2, -2, 2, 2, -2, -2,
   (0,111): -2, -2, -2, 2, 2, 2, 2, -2, -2, 2, -2, -2, -2, -2, -2, -2, 2, -2,
   ...
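To use the h5ls listing from a script instead of reading it by eye, its output can be parsed line by line. The sketch below is a hypothetical helper (parse_h5ls and list_hdf5 are illustrative names, not part of hdf5-tools). It assumes each line has the form "path kind {dims}" as in the listing above, that object paths contain no whitespace, and that h5ls is on your PATH.

```python
import re
import subprocess

# One h5ls line: object path, object kind, and an optional {dims} shape.
# Assumption: paths contain no whitespace.
LINE_RE = re.compile(r"^(?P<path>\S+)\s+(?P<kind>\w+)(?:\s+\{(?P<dims>[^}]*)\})?")


def parse_h5ls(text):
    """Map each object path to (kind, shape); shape is None for groups."""
    objects = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank or unrecognised lines
        dims = m.group("dims")
        if dims is None:
            shape = None
        elif dims.strip() == "SCALAR":
            shape = ()  # assumption: h5ls prints {SCALAR} for scalar datasets
        else:
            # A dimension like "11/Inf" carries a maximum size; keep only the current one.
            shape = tuple(int(d.split("/")[0]) for d in dims.split(","))
        objects[m.group("path")] = (m.group("kind"), shape)
    return objects


def list_hdf5(filename):
    """Run `h5ls --full -r` on a file and parse its recursive listing."""
    out = subprocess.run(["h5ls", "--full", "-r", filename],
                         capture_output=True, text=True, check=True).stdout
    return parse_h5ls(out)
```

For the file above, list_hdf5("data.h5")["/data/data"] should give ("Dataset", (11, 612)).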
Access in MATLAB
info=hdf5info('data.h5');
x=hdf5read('data.h5','/data/data')

x =

  Columns 1 through 7

   -1.0000    0.2107    0.0044    0.0013    0.0001    0.0000    0.0000
   -1.0000    0.2152    0.0042    0.0014    0.0002    0.0000    0.0000
    1.0000    0.1972    0.0023    0.0015    0.0000    0.0000   -0.0000
   ...
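The MATLAB result is the transpose of the {11, 612} shape that h5ls reports: its first printed row (-1.0000, 0.2107, 0.0044, ...) holds the first element of each HDF5 row, which is the first column of the h5py array in the Python section. HDF5 lists dimensions in row-major (C) order while MATLAB stores arrays column-major, so the same buffer comes back with its dimensions reversed. The pure-Python sketch below illustrates this; unflatten and transpose are illustrative helpers, not part of any HDF5 API.

```python
def unflatten(buf, shape, order):
    """Rebuild a 2-D nested list from a flat buffer.

    order="C": last index varies fastest (row-major, the HDF5 convention).
    order="F": first index varies fastest (column-major, the MATLAB convention).
    """
    rows, cols = shape
    if order == "C":
        return [[buf[i * cols + j] for j in range(cols)] for i in range(rows)]
    if order == "F":
        return [[buf[j * rows + i] for j in range(cols)] for i in range(rows)]
    raise ValueError("order must be 'C' or 'F'")


def transpose(m):
    """Swap rows and columns of a nested list."""
    return [list(col) for col in zip(*m)]


# A 2 x 3 dataset laid out as HDF5 stores it (row-major).
flat = [1, 2, 3, 4, 5, 6]
c_view = unflatten(flat, (2, 3), "C")  # [[1, 2, 3], [4, 5, 6]]
# Reading the same buffer column-major with the dimensions reversed
# yields the transpose, which is what MATLAB displays.
f_view = unflatten(flat, (3, 2), "F")  # [[1, 4], [2, 5], [3, 6]]
assert f_view == transpose(c_view)
```

In MATLAB itself, x' restores the 11 x 612 orientation.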
Access in Python
h5py will come to your aid:
$ python
>>> import h5py
>>> f = h5py.File('data.h5','r')
>>> f.values()
[<HDF5 group "/data" (1 members)>, <HDF5 group "/data_descr" (2 members)>]
>>> f["/data/data"]
<HDF5 dataset "data": shape (11, 612), type "<f8">
>>> f["/data/data"][:,:]
array([[ -1.00000000e+00,  -1.00000000e+00,   1.00000000e+00, ...,
          3.00000000e+00,  -3.00000000e+00,   3.00000000e+00],
       [  2.10663000e-01,   2.15192000e-01,   1.97153000e-01, ...,
          3.15029000e-01,   2.96945000e-01,   4.08534000e-01],
       [  4.43414000e-03,   4.18483000e-03,   2.30872000e-03, ...,
          3.37745000e-02,   5.68704000e-02,   6.02136000e-02],
       ...,
       [  2.23000000e+00,   2.20000000e+00,   2.35000000e+00, ...,
          9.40000000e-01,   6.00000000e-01,   1.00000000e+00],
       [  1.27000000e+00,   1.28000000e+00,   1.28000000e+00, ...,
          1.24000000e+00,   1.31000000e+00,   1.30000000e+00],
       [  1.28000000e+00,   1.28000000e+00,   1.28000000e+00, ...,
          1.33000000e+00,   1.33000000e+00,   1.32000000e+00]])
Access in Shogun
$ python
Python 2.5.5 (r255:77872, Apr 21 2010, 08:40:04)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from shogun.Features import *
>>> from shogun.Library import *
>>> f=HDF5File('data.h5','r', '/data/data')
>>> feats=RealFeatures()
>>> feats.load(f)
>>> feats.get_feature_matrix()
array([[-1.  ,  1.  ,  1.  , ...,  1.28,  1.28,  1.32],
       [-1.  , -1.  , -1.  , ...,  1.27,  1.79,  1.29],
       [ 1.  , -1.  , -1.  , ...,  1.28,  1.27,  1.4 ],
       ...,
       [ 1.  , -1.  , -1.  , ...,  1.32,  1.28,  1.33],
       [ 1.  ,  1.  ,  1.  , ...,  1.27,  1.33,  1.33],
       [-1.  , -1.  , -1.  , ...,  1.26,  1.32,  1.32]])
Using ASCII text files
h5utils is a set of utilities for visualization and conversion of scientific data in HDF5.
- Convert the dataset to ASCII text with comma-separated values:
$ h5totxt -s',' -d /data/data data.h5
- Or use spaces as separators and transpose the output:
$ h5totxt -T -s' ' -d /data/data data.h5
- Convert plain ASCII text to HDF5, storing it in the dataset /data/data:
$ h5fromtxt -d '/data/data' data.h5 <<EOF
1 2 3 4 5 6 7 8
EOF
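The same h5utils round trip can be driven from Python with subprocess, using the h5totxt and h5fromtxt options shown above. This is a minimal sketch: the helper names are hypothetical, and it assumes that h5totxt writes a 2-D dataset to standard output with one row per line, and that h5fromtxt reads its input from standard input with one row per line, as in the heredoc example.

```python
import subprocess


def rows_to_text(rows, sep=" "):
    """Format a 2-D list as text, one row per line (input for h5fromtxt)."""
    return "".join(sep.join(str(float(v)) for v in row) + "\n" for row in rows)


def parse_delimited(text, sep=","):
    """Parse h5totxt-style output (one row per line) into float rows."""
    rows = []
    for line in text.splitlines():
        # Drop empty fields from repeated or trailing separators.
        fields = [x for x in line.split(sep) if x.strip()]
        if fields:
            rows.append([float(x) for x in fields])
    return rows


def read_dataset(filename, dataset="/data/data"):
    """Dump one dataset as comma-separated text with h5totxt (h5utils required)."""
    out = subprocess.run(["h5totxt", "-s,", "-d", dataset, filename],
                         capture_output=True, text=True, check=True).stdout
    return parse_delimited(out, ",")


def write_dataset(rows, filename, dataset="/data/data"):
    """Feed rows to h5fromtxt on standard input, like the heredoc above."""
    subprocess.run(["h5fromtxt", "-d", dataset, filename],
                   input=rows_to_text(rows), text=True, check=True)
```

Only the two text helpers are pure Python; read_dataset and write_dataset need the h5utils binaries on your PATH.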
Acknowledgements
This project is supported by PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning)
http://www.pascal-network.org/.