Question

System info: CentOS, Python 3.5.2, 64 cores, 96 GB RAM

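I'm trying to load a large array (50GB) from an HDF file into RAM (96GB). Each chunk is about 1.5GB, less than the worker memory limit. The load never seems to complete, and sometimes workers crash or restart. I also don't see memory usage increasing or tasks being executed on the web dashboard.

Should this work, or am I missing something obvious here?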

import dask.array as da
import h5py

from dask.distributed import LocalCluster, Client
from matplotlib import pyplot as plt

lc = LocalCluster(n_workers=64)
c = Client(lc)

f = h5py.File('50GB.h5', 'r')
data = f['data']
# data.shape = 2000000, 1000
x = da.from_array(data, chunks=(2000000, 100))
x = c.persist(x)

Answer 1

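This was a misunderstanding of how chunks and workers interact. Specifically, changing how the LocalCluster is initialised fixes the problem described above.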

lc = LocalCluster(n_workers=1)  # A single worker gets ~90GB of memory, so the array can be persisted
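
To see why the worker count matters, here is a rough back-of-the-envelope sketch in plain Python. It assumes that LocalCluster's default memory limit divides the machine's memory roughly evenly among the workers. The helper names `per_worker_limit_gb` and `chunk_fraction` are made up for illustration and are not part of dask. The figures (96 GB, 1.5 GB chunks, 64 workers) are taken from the question.

```python
def per_worker_limit_gb(total_memory_gb, n_workers):
    """Approximate memory budget per worker, assuming an even split."""
    return total_memory_gb / n_workers


def chunk_fraction(chunk_gb, total_memory_gb, n_workers):
    """Fraction of one worker's budget that a single chunk occupies."""
    return chunk_gb / per_worker_limit_gb(total_memory_gb, n_workers)


TOTAL_GB = 96.0   # machine memory, from the question
CHUNK_GB = 1.5    # approximate chunk size, from the question

for n_workers in (64, 1):
    limit = per_worker_limit_gb(TOTAL_GB, n_workers)
    fraction = chunk_fraction(CHUNK_GB, TOTAL_GB, n_workers)
    # str.format is used instead of f-strings to stay compatible with Python 3.5
    print("n_workers={}: limit {:.2f} GB per worker, one chunk uses {:.1%}".format(
        n_workers, limit, fraction))
```

Under that assumption, 64 workers leave each worker with about 1.5 GB, so a single chunk consumes essentially the whole budget and nothing is left for the worker itself. With one worker, the same chunk is a small fraction of the budget.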

Limits to using LocalCluster? Crashing when persisting 50GB of data to 90GB of memory
