I have 12 h5 files that are too large to fit in memory. Each file has the following structure:
file1.h5
├image [float64: 3341 × 126 × 256 × 256]
├pulse [uint64: 126]
└train [uint64: 3341]
A single file is roughly 200 GB, so even on the HPC cluster it is hard to hold more than three files in memory at once.
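For reference, the layout above can be listed with a few lines of h5py (a minimal sketch; it assumes the three datasets sit at the file root, as the tree suggests):

import h5py

# Print each dataset's name, dtype and shape without reading any data.
with h5py.File('file1.h5', 'r') as f:
    for name, dset in f.items():
        print(name, dset.dtype, dset.shape)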
But I cannot load even a single file with dask. I get the error below, most likely because I am misusing the API or because the data layout is wrong; the latter I could fix with some guidance:
import dask.dataframe as dd
dummy = dd.read_hdf('file1.h5', '/image')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-84ef3fa4152e> in <module>
1 import dask.dataframe as dd
----> 2 dummy = dd.read_hdf('file1.h5', '/image')
~/.local/lib/python3.7/site-packages/dask/dataframe/io/hdf.py in read_hdf(pattern, key, start, stop, columns, chunksize, sorted_index, lock, mode)
498 mode=mode,
499 )
--> 500 for path in paths
501 ]
502 )
~/.local/lib/python3.7/site-packages/dask/dataframe/io/hdf.py in <listcomp>(.0)
498 mode=mode,
499 )
--> 500 for path in paths
501 ]
502 )
~/.local/lib/python3.7/site-packages/dask/dataframe/io/hdf.py in _read_single_hdf(path, key, start, stop, columns, chunksize, sorted_index, lock, mode)
380 [
381 one_path_one_key(path, k, start, s, columns, chunksize, d, lock)
--> 382 for k, s, d in zip(keys, stops, divisions)
383 ]
384 )
~/.local/lib/python3.7/site-packages/dask/dataframe/multi.py in concat(dfs, axis, join, interleave_partitions)
1034 raise TypeError("dfs must be a list of DataFrames/Series objects")
1035 if len(dfs) == 0:
-> 1036 raise ValueError("No objects to concatenate")
1037 if len(dfs) == 1:
1038 if axis == 1 and isinstance(dfs[0], Series):
ValueError: No objects to concatenate
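For what it's worth, the kind of lazy, chunked access I was hoping for looks roughly like this (a sketch using h5py plus dask.array instead of dask.dataframe; the chunk size along the train axis is just a guess):

import h5py
import dask.array as da

# Keep the file open; h5py reads data only when slices are requested.
f = h5py.File('file1.h5', 'r')

# Wrap the 4-D image dataset (train x pulse x 256 x 256) as a dask array,
# chunked along the train axis so each chunk stays well under 1 GB.
image = da.from_array(f['image'], chunks=(8, 126, 256, 256))

# Example of a lazy reduction: the mean image over trains and pulses,
# computed chunk by chunk instead of loading the full ~200 GB at once.
mean_image = image.mean(axis=(0, 1)).compute()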
Any help would be much appreciated :)