Question

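I am trying to use Ray's parallelization model to process a file record by record. The code works beautifully, but the object store grows quickly and eventually crashes. I am avoiding ray.get(function.remote()) because it kills performance: the job consists of millions of subtasks, and each call adds the overhead of waiting for the task to finish. Is there a way to set a global limit on the object store?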

# Code that constantly applies backpressure to the object store, freeing space,
# but performs worse than serial execution
for record in infile:
    ray.get(createNucleotideCount.remote(record, copy.copy(dinucDict), copy.copy(tetranucDict), dinucList, tetranucList, filename))

# Code that maximizes throughput but makes the object store grow constantly
for record in infile:
    createNucleotideCount.remote(record, copy.copy(dinucDict), copy.copy(tetranucDict), dinucList, tetranucList, filename)

# The called function returns either 0 or 1.

Answer 1

You can call ray.init(object_store_memory=10**9) to limit the object store to 1 GB.

There is more information in the memory management documentation at https://ray.readthedocs.io/en/latest/memory-management.html.
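For what it's worth, a middle ground between the two loops in the question is to cap how many tasks are in flight at once, instead of blocking on every single one. Below is a minimal sketch of that idea, not a tested drop-in for your code. The names run_with_backpressure, count_record, demo_with_ray and max_in_flight are made up for illustration, and count_record is only a stand-in for createNucleotideCount. The helper takes the submit/wait/fetch calls as parameters so the loop can be tried without a cluster; demo_with_ray shows how it might be wired to ray.remote, ray.wait and ray.get, assuming Ray is installed. Whether this actually keeps the object store flat depends on your Ray version and on how large the task arguments are.

```python
def run_with_backpressure(records, submit, wait_one, fetch, max_in_flight=1000):
    """Submit one task per record while keeping at most max_in_flight results pending.

    submit(record)    -> a handle for the running task (e.g. a Ray object ref)
    wait_one(handles) -> (finished_handles, still_pending_handles)
    fetch(handles)    -> list of task results (here: 0 or 1 per record)
    """
    pending = []
    total = 0
    for record in records:
        if len(pending) >= max_in_flight:
            # Block only until one task finishes, then drop its handle.
            done, pending = wait_one(pending)
            total += sum(fetch(done))
        pending.append(submit(record))
    # Drain whatever is still running after the input is exhausted.
    total += sum(fetch(pending))
    return total


def demo_with_ray():
    """Hypothetical wiring to Ray; assumes Ray is installed."""
    import ray

    # Same 1 GB cap as suggested in the answer above.
    ray.init(object_store_memory=10**9)

    @ray.remote
    def count_record(record):
        # Stand-in for createNucleotideCount: returns 0 or 1.
        return 1 if record else 0

    return run_with_backpressure(
        ["ACGT", "", "GGCC"],
        submit=count_record.remote,
        wait_one=lambda refs: ray.wait(refs, num_returns=1),
        fetch=ray.get,
        max_in_flight=2,
    )
```

With max_in_flight somewhere in the hundreds or thousands, the driver only stalls when the queue is full, so throughput should stay close to the fire-and-forget loop while the number of outstanding results stays bounded. The right value is something you would have to tune.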
