Question

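Are there any special restrictions on accessing data through h5py? I expect to get an array containing inner arrays from the dataset in question.

I can load the file object and access most of the keys in the data file.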

>>> import h5py

>>> fileobj = h5py.File('path/to/file.ext')
>>> test = fileobj['SomeKey']
>>> test.dtype
dtype('float64')
>>> test[:]
array([  3.50000460e+02,   1.23662217e-03,   1.23662872e-03, ...,
     9.94521356e-03,   9.94531916e-03,   9.94542476e-03])
>>> test.shape
(49682960,)
>>> # this loads fine
... problem = fileobj['SomeOtherKey']
>>> problem.shape
(13570,)
>>> # accessing specific keys works fine too
... sub_dset = problem['subkey1']
>>> sub_dset.dtype
dtype('O')
>>> # checking compression, nothing out of the ordinary...
... problem._filters
{'gzip': 1, 'shuffle': (144,)}
>>> problem.dtype['subkey2']
dtype('O')
>>> problem['subkey2']
Segmentation fault

The same thing happens when I try to slice the dataset, copy it, and so on.

>>> problem['subkey2'][:]
Segmentation fault

>>> problem['subkey2']
>>> import numpy as np
>>> arr = np.zeros(problem.shape)
>>> # this works fine
... ds = fileobj.create_dataset('ds', data=problem['subkey1'])
>>> # this also works fine
... ds1 = fileobj.create_dataset('ds1', problem.shape, problem.dtype, data=problem['subkey1'])
>>> # however.....
... ds2 = fileobj.create_dataset('ds2', problem.shape, problem.dtype, data=problem)
Segmentation Fault
>>> ds = fileobj.create_dataset('ds2', data=problem['subkey2'])
Segmentation Fault

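At first I thought this might be a memory issue. However, I tested it against a standard file that serves as test data for a library (now deprecated, I believe):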

https://github.com/jmchilton/pymz5/blob/master/test_data/test.mz5

With that file, the problem can be reproduced with the following example:

>>> fileobj = h5py.File('test.mz5')
>>> problem = fileobj['SpectrumMetaData']
>>> problem.shape
(26,)
>>> problem['precursors']
Segmentation Fault

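I have checked that the 'id' field of the SpectrumMetaData dataset (as well as the other keys) reads fine, and this example has no compression filters, which suggests that the segmentation fault is caused by the data itself.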
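Since every failed read kills the interpreter, one way to check the fields one by one is to do each read in a child process and look at its exit status. The following is only a minimal sketch: `run_probe`, `probe_field`, and `PROBE_TEMPLATE` are hypothetical helper names, and it assumes `test.mz5` is in the working directory and h5py is importable in the child process.

```python
import subprocess
import sys


def run_probe(code):
    """Run `code` in a fresh Python process and return its exit status.

    On POSIX a negative status means the child was killed by a signal,
    for example -11 for SIGSEGV.
    """
    return subprocess.call([sys.executable, "-c", code])


# Hypothetical probe script: read a single field of a compound dataset.
PROBE_TEMPLATE = (
    "import h5py\n"
    "with h5py.File({path!r}, 'r') as f:\n"
    "    f[{dataset!r}][{field!r}]\n"
)


def probe_field(path, dataset, field):
    """Try to read `field` of `dataset` in a child process; return its status."""
    code = PROBE_TEMPLATE.format(path=path, dataset=dataset, field=field)
    return run_probe(code)


# Example call (not executed here, since it needs the file and h5py):
#     probe_field("test.mz5", "SpectrumMetaData", "precursors")
```

A status of 0 would mean the field was read without error, while a negative status on Linux would point to the field that brings the process down.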

In case this is specific to the Python version or to h5py: I ran all of the tests above with h5py 2.5.0 on Python 2.7.9.

When I tried it with h5py 2.2.1 (Ubuntu, Python 3.4.3), I got:

TypeError: No NumPy equivalent for TypeVlenID exists

I know that support for variable-length dtypes in h5py improved after version 2.3, so I upgraded to h5py 2.5.0 on Python 3 and ran into the same problem as before.
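Variable-length fields show up on the NumPy side as `dtype('O')`, so listing the object-typed fields of the compound dtype narrows down which ones go through h5py's variable-length handling. This is a rough sketch using NumPy only; `object_fields` is a hypothetical helper, and the `example` dtype is made up for illustration (the `index` field in particular is an assumption, not taken from the file).

```python
import numpy as np


def object_fields(dtype):
    """Return the names of the fields in a compound dtype that NumPy
    reports as object ('O'); a non-compound dtype yields an empty list."""
    if dtype.names is None:
        return []
    return [name for name in dtype.names if dtype[name].kind == "O"]


# Hypothetical compound dtype, loosely modelled on the dataset above.
example = np.dtype([("id", "O"), ("index", "<u8"), ("precursors", "O")])
print(object_fields(example))
```

With the file open, `object_fields(problem.dtype)` would give the candidate fields. For each of them, `h5py.check_dtype(vlen=problem.dtype[name])` may report the base type of the variable-length field, or `None` if h5py attached no such information to the dtype.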

Is there any way to access this data with h5py's higher-level API? If possible, I would rather not have to build custom data types in Cython.

Consistent segfault when trying to access objects in an h5py.Dataset

0 answers: