Question

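I am trying to read an HDF file, but no groups show up. I have tried a couple of different approaches using tables and h5py, but neither of them displays the groups in the file. I checked, and the file is 'Hierarchical Data Format (version 5) data' (see Update). The file information is here for reference.

Sample data can be found here.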

import h5py
import tables as tb

hdffile = "TRMM_LIS_SC.04.1_2010.260.73132"

Using h5py:

f = h5py.File(hdffile,'w')
print(f)

Output:

<HDF5 file "TRMM_LIS_SC.04.1_2010.260.73132" (mode r+)>
[]

Using tables:

fi=tb.openFile(hdffile,'r')
print(fi)

Output:

TRMM_LIS_SC.04.1_2010.260.73132 (File) ''
Last modif.: 'Wed Aug 10 18:41:44 2016'
Object Tree:
/ (RootGroup) ''

Closing remaining open files:TRMM_LIS_SC.04.1_2010.260.73132...done

Update

h5py.File(hdffile,'w') overwrote the file and emptied it.

Now my question is: how do I read an HDF version 4 file into Python, given that neither h5py nor tables works?

Answer 1

How large is the file? I think calling h5py.File(hdffile,'w') overwrites it, so it is now empty. Use h5py.File(hdffile,'r') to read it.
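As a rough sketch of the read-only approach: the helper below opens a file with mode 'r' and collects the names of every group and dataset. The function name list_hdf5_contents is made up for illustration, and it assumes the file really is HDF5; an HDF4 file will still fail to open with h5py.

```python
import os


def list_hdf5_contents(path):
    """Return the names of all groups and datasets in an HDF5 file.

    The file is opened with mode 'r', which never creates or truncates it
    (unlike mode 'w', which replaces an existing file with an empty one).
    """
    # Fail early with a clear error if the path does not exist.
    if not os.path.isfile(path):
        raise FileNotFoundError(path)

    # Imported here so the path check above runs even if h5py is missing.
    import h5py

    names = []
    with h5py.File(path, "r") as f:
        # visit() walks the hierarchy recursively and calls the callback
        # with each object's name; list.append returns None, so the walk
        # continues through the whole file.
        f.visit(names.append)
    return names
```

Calling list_hdf5_contents("TRMM_LIS_SC.04.1_2010.260.73132") on an untouched copy of the file would either return the object names or raise an error from h5py if the file is not HDF5.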

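I don't have enough karma to comment on @Luke H's answer, but reading the file into pandas may not be a good idea. The pandas HDF5 support uses pytables, which is an "opinionated" way of using HDF5. That means it stores extra metadata (e.g. the index). So I would only use pytables to read the file if it was created with pytables.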

Answer 2

UPDATE

I would recommend that you first convert your HDF version 4 file to an HDF5 / h5 file, since all modern libraries/modules work with HDF version 5...
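Before converting, it may help to confirm which format the file actually is. The following is a minimal sketch that compares the first bytes of the file with the HDF4 and HDF5 format signatures. The function name detect_hdf_version is hypothetical, and the check assumes the signature sits at offset 0; an HDF5 file with a user block can have it at a later offset, which this sketch does not handle.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte signature of an HDF5 file
HDF4_SIGNATURE = b"\x0e\x03\x13\x01"   # 4-byte magic number of an HDF4 file


def detect_hdf_version(path):
    """Guess the HDF version of a file from its leading bytes.

    Returns "HDF5", "HDF4", or "unknown". The file is opened in binary
    read mode, so it is never modified.
    """
    with open(path, "rb") as fh:
        head = fh.read(len(HDF5_SIGNATURE))
    if head.startswith(HDF5_SIGNATURE):
        return "HDF5"
    if head.startswith(HDF4_SIGNATURE):
        return "HDF4"
    return "unknown"
```

Note that a file that has already been opened with h5py.File(hdffile,'w') will report as HDF5 regardless of what it held before, so this check is best run on a fresh copy of the original data.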

OLD answer:

Try it this way:

store = pd.HDFStore(filename)
print(store)

This should print details about the HDF file, including the stored keys, the lengths of the stored DataFrames, and so on.

Demo:

In [18]: fn = r'C:\Temp\a.h5'

In [19]: store = pd.HDFStore(fn)

In [20]: print(store)
<class 'pandas.io.pytables.HDFStore'>
File path: C:\Temp\a.h5
/df_dc               frame_table  (typ->appendable,nrows->10,ncols->3,indexers->[index],dc->[a,b,c])
/df_no_dc            frame_table  (typ->appendable,nrows->10,ncols->3,indexers->[index])

Now you can read a DataFrame using one of the keys from the output above:

In [21]: df = store.select('df_dc')

In [22]: df
Out[22]:
    a   b   c
0  92  80  86
1  27  49  62
2  55  64  60
3  31  66   3
4  37  75  81
5  49  69  87
6  59   0  87
7  69  91  39
8  93  75  31
9  21  15   7

Answer 3

Try using pandas:

import pandas as pd
f = pd.read_hdf('C:/path/to/file')

See Pandas HDF documentation here.

This should read any HDF file into a DataFrame that you can manipulate.

Missing data in an HDF file when using Python

3 answers: