How do I write a really large dataset to an HDF5 file?

Time: 2016-06-11 21:21:46

Tags: python hdf5 h5py

Traceback (most recent call last):
  File "populate_h5.py", line 116, in <module>
    dset_X[n_images:n_images+1,:,:,:]=hc
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (-------src-dir--------/h5py/_objects.c:2582)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (-------src-dir--------/h5py/_objects.c:2541)
  File "/Users/alex/anaconda2/lib/python2.7/site-packages/h5py/_hl/dataset.py", line 618, in __setitem__
    self.id.write(mspace, fspace, val, mtype, dxpl=self._dxpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (-------src-dir--------/h5py/_objects.c:2582)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (-------src-dir--------/h5py/_objects.c:2541)
  File "h5py/h5d.pyx", line 221, in h5py.h5d.DatasetID.write (-------src-dir--------/h5py/h5d.c:3421)
  File "h5py/_proxy.pyx", line 132, in h5py._proxy.dset_rw (-------src-dir--------/h5py/_proxy.c:1794)
  File "h5py/_proxy.pyx", line 93, in h5py._proxy.H5PY_H5Dwrite (-------src-dir--------/h5py/_proxy.c:1501)
IOError: Can't prepare for writing data (File write failed: time = sat jun 11 13:31:12 2016
, filename = 'raw_tensors.h5', file descriptor = 3, errno = 28, error message = 'no space left on device', buf = 0x20b9b5658, total write size = 42288, bytes this sub-write = 42288, bytes actually written = 18446744073709551615, offset = 184442875904)
Exception IOError: IOError("Driver write request failed (File write failed: time = sat jun 11 13:31:12 2016\n, filename = 'raw_tensors.h5', file descriptor = 3, errno = 28, error message = 'no space left on device', buf = 0x20c404a08, total write size = 50176, bytes this sub-write = 50176, bytes actually written = 18446744073709551615, offset = 184442918192)",) in 'h5py._objects.ObjectID.__dealloc__' ignored
Exception IOError: IOError("Driver write request failed (File write failed: time = sat jun 11 13:31:12 2016\n, filename = 'raw_tensors.h5', file descriptor = 3, errno = 28, error message = 'no space left on device', buf = 0x20b9bfc08, total write size = 94080, bytes this sub-write = 94080, bytes actually written = 18446744073709551615, offset = 184443921712)",) in 'h5py._objects.ObjectID.__dealloc__' ignored
Segmentation fault: 11

Here is the generator:

import glob

import h5py
import numpy as np
from skimage import color, io

# model, KNN and extract_hypercolumn are defined elsewhere in
# populate_h5.py (not shown here).

files = glob.glob('../manga-resized/sliced_images/*.png')
f = h5py.File('raw_tensors.h5','w')
dset_X = f.create_dataset('X',(1,960,224,224),maxshape=(None,960,224,224),chunks=True)
dset_y = f.create_dataset('y',(1,112,224*224),maxshape=(None,112,224*224),chunks=True)

n_images = 0
for fl in files[:1000]:
    # Read the image, drop any alpha channel and convert RGB -> Lab
    img = color.rgb2lab(io.imread(fl)[..., :3])
    X = img[:,:,:1]   # L channel
    y = img[:,:,1:]   # a/b channels
    print "y shape: ",y.shape
    print "X shape: ",X.shape
    X_rows,X_columns,X_channels=X.shape
    y_rows,y_columns,y_channels=y.shape
    X_chunk = np.transpose(X,(2,0,1))
    X_chunk_3d = np.tile(X_chunk,(3,1,1))           # replicate L channel to 3 channels
    print "X_chunk_3d: ",X_chunk_3d.shape
    X_chunk_4d = np.expand_dims(X_chunk_3d,axis=0)  # add batch dimension
    print "X_chunk_4d: ",X_chunk_4d.shape
    hc = extract_hypercolumn(model,[3,8,15,22],X_chunk_4d)
    y_chunk = np.reshape(y,(y_rows*y_columns,y_channels))
    classed = KNN.predict_proba(y_chunk)
    classed = np.transpose(classed,(1,0))
    # Grow both datasets by one image and write the new slices
    dset_X.resize(n_images+1,axis=0)
    dset_y.resize(n_images+1,axis=0)
    print "X_chunk: ",X_chunk.shape,"dset_X: ",dset_X.shape
    print "hypercolumn shape: ",hc.shape
    print "y_chunk: ",classed.shape,"dset_y: ",dset_y.shape
    dset_X[n_images:n_images+1,:,:,:]=hc
    dset_y[n_images:n_images+1,:,:]=classed
    n_images+= 1
    print dset_X.shape
    print dset_y.shape

f.close()

There are 1836 files in the folder, but as you can see, only 1000 of them are used. The error occurs on the 857th iteration.
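
For context, a back-of-the-envelope size estimate (assuming h5py's default float32 dtype, since `create_dataset` is called without an explicit dtype, and no compression) lines up with the write offset reported in the traceback:

    # Rough on-disk size of the two datasets above, assuming 4 bytes/element
    x_bytes = 960 * 224 * 224 * 4          # ~193 MB per image for 'X'
    y_bytes = 112 * 224 * 224 * 4          # ~22 MB per image for 'y'
    per_image = x_bytes + y_bytes          # ~215 MB per image
    print(per_image * 857 / 1e9)           # ~184 GB written by iteration 857
    print(per_image * 1000 / 1e9)          # ~215 GB needed for all 1000 files

So the full run of 1000 files would apparently need roughly 215 GB of free space.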

The error message, "no space left on device", is fairly clear, but I don't know how to work around it. For example, do I need to create a separate HDF5 file for every 500 files?
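
A minimal sketch of that idea, rolling over to a new file every N images (the file-name pattern, `N_PER_FILE` and the `open_part` helper are illustrative, not part of the script above):

    import h5py

    N_PER_FILE = 500  # illustrative rollover size

    def open_part(part):
        # Open part file 'raw_tensors_<part>.h5' with the same layout as above
        f = h5py.File('raw_tensors_%d.h5' % part, 'w')
        dX = f.create_dataset('X', (1, 960, 224, 224),
                              maxshape=(None, 960, 224, 224), chunks=True)
        dy = f.create_dataset('y', (1, 112, 224 * 224),
                              maxshape=(None, 112, 224 * 224), chunks=True)
        return f, dX, dy

    # Inside the loop, before writing image n_images:
    # if n_images % N_PER_FILE == 0:
    #     if n_images:
    #         f.close()
    #     f, dset_X, dset_y = open_part(n_images // N_PER_FILE)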

0 Answers:

No answers yet.