I'm running into a very strange problem when trying to create a fairly large numpy ndarray dataset. For example:
import h5py
import numpy as np

test_h5 = h5py.File('test.hdf5', 'w')

# n = 512           # works
# n = 10000000000   # works
# n = 100000000000  # works
# n = 20000000000   # fails
# n = 40000000000   # fails
n = 3055693983      # fails

test_h5.create_dataset('matrix', shape=(n, n), dtype=np.int8,
                       compression='gzip', chunks=(256, 256))
print(test_h5['matrix'].shape)
a = test_h5['matrix']
a[0:256, 0:256] = np.ones((256, 256))
The chunk size is (256, 256).

If the dataset above is created with shape (512, 512), everything works.
If it is created with shape (100000000000, 100000000000), everything also works fine...
Ideally I want a dataset of shape (3055693983, 3055693983), but that fails with the following:
(3055693983, 3055693983)
Traceback (most recent call last):
  File "h5.py", line 16, in <module>
    a[0:256,0:256]=np.ones((256,256))
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2696)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2654)
  File "/home/user/anaconda2/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 618, in __setitem__
    self.id.write(mspace, fspace, val, mtype, dxpl=self._dxpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2696)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/home/ilan/minonda/conda-bld/work/h5py/_objects.c:2654)
  File "h5py/h5d.pyx", line 221, in h5py.h5d.DatasetID.write (/home/ilan/minonda/conda-bld/work/h5py/h5d.c:3527)
  File "h5py/_proxy.pyx", line 132, in h5py._proxy.dset_rw (/home/ilan/minonda/conda-bld/work/h5py/_proxy.c:1889)
  File "h5py/_proxy.pyx", line 93, in h5py._proxy.H5PY_H5Dwrite (/home/ilan/minonda/conda-bld/work/h5py/_proxy.c:1599)
IOError: Can't prepare for writing data (Can't retrieve number of elements in file dataset)
Setting the dataset to various other sizes gives mixed results: some work, some don't. I thought it might be something simple, like the dataset dimensions not being divisible by the chunk size, but that doesn't seem to be the problem.
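To make the mixed results concrete, here is a minimal probe sketch of what I've been trying (the file name probe.hdf5 and the particular list of sizes are just my test setup); it also prints n % 256 to rule out the chunk-divisibility idea:

import h5py
import numpy as np

# A sample of the sizes tried above.
sizes = [512, 3055693983, 10000000000, 20000000000, 40000000000, 100000000000]

for n in sizes:
    with h5py.File('probe.hdf5', 'w') as f:
        dset = f.create_dataset('matrix', shape=(n, n), dtype=np.int8,
                                compression='gzip', chunks=(256, 256))
        try:
            # Creating the dataset always succeeds; the failure
            # only shows up when actually writing a chunk.
            dset[0:256, 0:256] = np.ones((256, 256))
            status = 'works'
        except IOError as e:
            status = 'fails: %s' % e
    # A zero remainder would support the divisibility-by-chunk-size idea.
    print('n=%d, n %% 256 = %d -> %s' % (n, n % 256, status))

The pass/fail pattern this prints does not line up with the remainders at all.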
What am I missing here?