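I am trying to use Ray's parallelization model to process a file record by record. The code works beautifully, but the object store grows quickly and the program eventually crashes. I avoid calling ray.get(function.remote()) because it hurts performance: the job consists of millions of subtasks, and waiting for each one to finish adds overhead. Is there a way to set a global limit on the object store?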
import copy
import ray

# Variant 1: ray.get() on every task constantly applies backpressure to the object store and frees space,
# but it waits for each task in turn, so it ends up slower than plain serial execution.
for record in infile:
    ray.get(createNucleotideCount.remote(record, copy.copy(dinucDict), copy.copy(tetranucDict), dinucList, tetranucList, filename))

# Variant 2: maximizes throughput, but the object store keeps growing.
for record in infile:
    createNucleotideCount.remote(record, copy.copy(dinucDict), copy.copy(tetranucDict), dinucList, tetranucList, filename)

# createNucleotideCount returns either 0 or 1.
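A middle ground between the two loops is to cap the number of outstanding tasks: submit freely until a limit is reached, then block only until some task finishes, instead of after every task. The sketch below shows that shape using Python's standard-library concurrent.futures as a stand-in for Ray, so it runs without a cluster. count_nucleotides, process_bounded and max_in_flight are hypothetical names, and the 0/1 return value only mirrors the question's function. In Ray the same loop would call ray.wait(pending_refs, num_returns=1), which returns a list of ready refs and a list of remaining refs, and then ray.get on the ready ones.

```python
import concurrent.futures as cf


def count_nucleotides(record):
    # Hypothetical stand-in for createNucleotideCount: returns 0 or 1.
    return 1 if "GC" in record else 0


def process_bounded(records, max_in_flight=4):
    """Submit one task per record, never keeping more than max_in_flight pending.

    Analogue of the Ray pattern: once the limit is hit, wait for at least one
    outstanding task instead of calling get() after every submission.
    """
    total = 0
    pending = set()
    with cf.ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        for record in records:
            if len(pending) >= max_in_flight:
                # Block until at least one task finishes, then consume its result(s).
                done, pending = cf.wait(pending, return_when=cf.FIRST_COMPLETED)
                total += sum(f.result() for f in done)
            pending.add(pool.submit(count_nucleotides, record))
        # Drain whatever is still outstanding after the last submission.
        for f in cf.as_completed(pending):
            total += f.result()
    return total


if __name__ == "__main__":
    records = ["ACGT", "AATT", "GGCC", "TTTT", "AGCA"]
    print(process_bounded(records, max_in_flight=2))
```

Running it prints 2, since only "GGCC" and "AGCA" contain "GC". The value of max_in_flight trades throughput against how many task arguments and results can accumulate at once.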
Answer (score: 0)
You can call ray.init(object_store_memory=10**9) to limit the object store to 1 GB (the value is given in bytes).
The memory management documentation at https://ray.readthedocs.io/en/latest/memory-management.html has more information.