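I'm having trouble reading an HDF file in pandas. At the moment I don't know the keys of the file.
How can I read the file [data.hdf] in this case? Also, my file is .hdf rather than .h5; does that affect data extraction?
I see that read_hdf needs a "group identifier in the store":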
pandas.io.pytables.read_hdf(path_or_buf, key, **kwargs)
I was able to get the metadata from PyTables:
File(filename=data.hdf, title='', mode='a', root_uep='/', filters=Filters(complevel=0, shuffle=False, fletcher32=False, least_significant_digit=None))
/ (RootGroup) ''
/UID (EArray(317,)) ''
  atom := StringAtom(itemsize=36, shape=(), dflt='')
  maindim := 0
  flavor := 'numpy'
  byteorder := 'irrelevant'
  chunkshape := (100,)
/X Y (EArray(8319, 2, 317)) ''
  atom := Float32Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (1000, 2, 100)
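How can I make it readable through pandas?
Answer 0 (score: 1)
First, the extension (.hdf or .h5) makes no difference. Second, I'm not sure about pandas, but this is how I read the keys of an HDF5 file: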
import h5py
h5f = h5py.File("test.h5", "r")
h5f.keys()
or
h5f.values()
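keys() only lists the top level of the file. As a rough sketch of how to see everything, nested datasets included, h5py's visititems can walk the whole tree. The helper name list_datasets is made up for illustration, and the example assumes the file really is HDF5:

```python
import h5py


def list_datasets(path):
    """Return {full_name: (shape, dtype)} for every dataset in an HDF5 file."""
    found = {}

    def visit(name, obj):
        # visititems calls this once for every group and dataset below the root;
        # returning None tells h5py to keep walking.
        if isinstance(obj, h5py.Dataset):
            found["/" + name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as h5f:
        h5f.visititems(visit)
    return found
```

Calling list_datasets("data.hdf") on the file from the question should then report entries for /UID and /X Y with their shapes, which is enough to decide what to load.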
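Answer 1 (score: 0)
The documentation is here. However, you will not be able to read the format you are showing directly with pandas. You need to use PyTables to read it. pandas can read the PyTables Table format directly, even without the metadata that pandas itself writes, but the nodes in your file are EArrays, not Tables.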
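A minimal sketch of that route, under stated assumptions: the node names /UID and /X Y are taken from the metadata in the question, while the meaning of the axes (axis 1 holds X and Y, axis 2 runs over the 317 UIDs) is a guess that the question does not confirm. arrays_to_frame and read_xy_frame are hypothetical helper names.

```python
import numpy as np
import pandas as pd


def arrays_to_frame(xy, uids):
    """Turn an (n_rows, 2, n_uids) array into a DataFrame with (uid, coord) columns."""
    # StringAtom data comes back as bytes; decode it to get readable labels.
    labels = [u.decode("ascii") if isinstance(u, bytes) else str(u) for u in uids]
    n_rows, n_coords, n_uids = xy.shape
    if n_coords != 2:
        raise ValueError("expected axis 1 to hold exactly the X and Y values")
    # Move the UID axis ahead of the coordinate axis, then flatten both into columns.
    flat = np.transpose(xy, (0, 2, 1)).reshape(n_rows, n_uids * n_coords)
    columns = pd.MultiIndex.from_product([labels, ["X", "Y"]], names=["uid", "coord"])
    return pd.DataFrame(flat, columns=columns)


def read_xy_frame(path):
    """Read /UID and '/X Y' with PyTables and combine them into one DataFrame."""
    import tables  # imported lazily so arrays_to_frame also works without PyTables

    with tables.open_file(path, mode="r") as h5:
        uids = h5.get_node("/", "UID").read()
        # The node name contains a space, so it cannot be reached as h5.root.<name>.
        xy = h5.get_node("/", "X Y").read()
    return arrays_to_frame(xy, uids)
```

With this layout, read_xy_frame("data.hdf")[some_uid] would give the X and Y columns for one UID. If the axes mean something else in your data, only the transpose needs to change.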
Answer 2 (score: 0)
pyhdf is another option in Python.
You can read the file and list its keys as follows:
import pyhdf.SD
hdf = pyhdf.SD.SD('file.hdf')
hdf.datasets()
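One caveat: pyhdf wraps the HDF4 library, whereas PyTables and h5py work with HDF5, and a .hdf extension is used for both. A small sketch for checking which format a file is in by looking at its first bytes (sniff_hdf_format is a made-up helper name, and it only inspects offset 0, so an HDF5 file with a user block would be reported as unknown):

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte HDF5 format signature
HDF4_MAGIC = b"\x0e\x03\x13\x01"       # 4-byte HDF4 magic number


def sniff_hdf_format(path):
    """Return 'HDF5', 'HDF4' or 'unknown' based on the first bytes of the file."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(HDF5_SIGNATURE):
        return "HDF5"
    if head.startswith(HDF4_MAGIC):
        return "HDF4"
    return "unknown"
```

If this returns "HDF5", as it should for a file that PyTables can describe, pyhdf will not be able to open it and h5py or PyTables is the right tool.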
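I hope this helps. Good luck!
Answer 3 (score: 0)
You can use this simple function to see the variable names of any HDF file (it only works for variables stored as scientific datasets, i.e. through the SD interface):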
from pyhdf.SD import SD, SDC

def HDFvars(File):
    """
    Extract variable names for an hdf file
    """
    hdfFile = SD(File, SDC.READ)  # open read-only
    dsets = hdfFile.datasets()    # dict keyed by dataset name
    k = sorted(dsets.keys())
    hdfFile.end()  # close the file
    return k
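If the variables are not stored as scientific datasets, you can try pyhdf.V with the following program, which displays the contents of the vgroups contained in any HDF file.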
from pyhdf.HDF import *
from pyhdf.V import *
from pyhdf.VS import *
from pyhdf.SD import *
def describevg(refnum):
    # Describe the vgroup with the given refnum.
    # Open vgroup in read mode.
    vg = v.attach(refnum)
    print("----------------")
    print("name:", vg._name, "class:", vg._class, "tag,ref:", end=" ")
    print(vg._tag, vg._refnum)
    # Show the number of members of each main object type.
    print("members: ", vg._nmembers, end=" ")
    print("datasets:", vg.nrefs(HC.DFTAG_NDG), end=" ")
    print("vdatas: ", vg.nrefs(HC.DFTAG_VH), end=" ")
    print("vgroups: ", vg.nrefs(HC.DFTAG_VG))
    # Read the contents of the vgroup.
    members = vg.tagrefs()
    # Display info about each member.
    index = -1
    for tag, ref in members:
        index += 1
        print("member index", index)
        # Vdata tag
        if tag == HC.DFTAG_VH:
            vd = vs.attach(ref)
            nrecs, intmode, fields, size, name = vd.inquire()
            print(" vdata:", name, "tag,ref:", tag, ref)
            print(" fields:", fields)
            print(" nrecs:", nrecs)
            vd.detach()
        # SDS tag
        elif tag == HC.DFTAG_NDG:
            sds = sd.select(sd.reftoindex(ref))
            name, rank, dims, type, nattrs = sds.info()
            print(" dataset:", name, "tag,ref:", tag, ref)
            print(" dims:", dims)
            print(" type:", type)
            sds.endaccess()
        # Vgroup tag
        elif tag == HC.DFTAG_VG:
            vg0 = v.attach(ref)
            print(" vgroup:", vg0._name, "tag,ref:", tag, ref)
            vg0.detach()
        # Unhandled tag
        else:
            print("unhandled tag,ref", tag, ref)
    # Close vgroup
    vg.detach()

# Open HDF file in readonly mode.
filename = 'yourfile.hdf'
hdf = HDF(filename)
# Initialize the SD, V and VS interfaces on the file.
sd = SD(filename)
vs = hdf.vstart()
v = hdf.vgstart()
# Scan all vgroups in the file.
ref = -1
while 1:
    try:
        ref = v.getid(ref)
        print(ref)
    except HDF4Error:  # no more vgroup
        break
    describevg(ref)