No objects to concatenate when using dask with h5 files

Asked: 2020-11-08 18:02:20

Tags: python memory-management dask h5py

I have 12 h5 files that are too large to fit in memory. The files are structured as follows:

file1.h5
├image  [float64: 3341 × 126 × 256 × 256]
├pulse  [uint64: 126]
└train  [uint64: 3341]

A single file is roughly 200 GB, so even on an HPC machine it is hard to hold more than three of them in memory at once.
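For reference, the layout above can be confirmed without reading any array data, since h5py opens datasets lazily (a minimal sketch, using the file name from the question):

import h5py

with h5py.File('file1.h5', 'r') as f:
    # Walk the file and print each node's dtype and shape;
    # groups have neither attribute, hence the getattr defaults.
    f.visititems(lambda name, obj: print(name,
                                         getattr(obj, 'dtype', ''),
                                         getattr(obj, 'shape', '')))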

But I cannot load even a single file with dask. I get the following error, most likely because I am misusing the API or because my data structure is wrong. If it is the latter, some guidance would let me fix it:

import dask.dataframe as dd
dummy = dd.read_hdf('file1.h5', '/image')


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-84ef3fa4152e> in <module>
      1 import dask.dataframe as dd
----> 2 dummy = dd.read_hdf('file1.h5', '/image')

~/.local/lib/python3.7/site-packages/dask/dataframe/io/hdf.py in read_hdf(pattern, key, start, stop, columns, chunksize, sorted_index, lock, mode)
    498                 mode=mode,
    499             )
--> 500             for path in paths
    501         ]
    502     )

~/.local/lib/python3.7/site-packages/dask/dataframe/io/hdf.py in <listcomp>(.0)
    498                 mode=mode,
    499             )
--> 500             for path in paths
    501         ]
    502     )

~/.local/lib/python3.7/site-packages/dask/dataframe/io/hdf.py in _read_single_hdf(path, key, start, stop, columns, chunksize, sorted_index, lock, mode)
    380         [
    381             one_path_one_key(path, k, start, s, columns, chunksize, d, lock)
--> 382             for k, s, d in zip(keys, stops, divisions)
    383         ]
    384     )

~/.local/lib/python3.7/site-packages/dask/dataframe/multi.py in concat(dfs, axis, join, interleave_partitions)
   1034         raise TypeError("dfs must be a list of DataFrames/Series objects")
   1035     if len(dfs) == 0:
-> 1036         raise ValueError("No objects to concatenate")
   1037     if len(dfs) == 1:
   1038         if axis == 1 and isinstance(dfs[0], Series):

ValueError: No objects to concatenate

Any help would be greatly appreciated :)
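From what I can tell, dd.read_hdf expects HDF5 files written by pandas' HDFStore; a file written directly with h5py carries none of that metadata, so no keys match the pattern, the list of frames comes back empty, and concat raises "No objects to concatenate". A plain multidimensional dataset like /image also isn't tabular, so dask.array over h5py may be the right tool instead (a minimal sketch; the chunk shape is a guess and would need tuning to the access pattern):

import h5py
import dask.array as da

f = h5py.File('file1.h5', 'r')  # opens lazily, nothing is read yet

# Wrap the 4-D dataset in a dask array, one chunk per entry along the
# first axis (~66 MB per chunk at float64).
image = da.from_array(f['/image'], chunks=(1, 126, 256, 256))

# Operations only build a task graph; compute() streams chunks
# through memory instead of loading the full 200 GB.
mean_per_train = image.mean(axis=(1, 2, 3))
print(mean_per_train[:10].compute())

The same pattern should extend to all 12 files by building one such array per file and joining them with da.concatenate.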
