Question

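I have to load many .npy files into memory. Each file contains a single numpy array of roughly 3-6 MiB.

I am already loading the files in parallel with numpy.load, but I think it still takes too long.

Code example: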

import threading
import numpy as np

def load_chunk(ch, idx, results):
    # ch is a chunk of filenames to load
    ars = []
    for fn in ch:
        ars.append(np.load(fn))
    # each thread writes only to its own slot, so no lock is needed
    results[idx] = ars

# fn_ch is the list of filename chunks, one chunk per thread (defined elsewhere)
threads = [None] * len(fn_ch)
all_results = [None] * len(fn_ch)

for i in range(len(fn_ch)):
    t = threading.Thread(target=load_chunk, args=(fn_ch[i], i, all_results))
    t.start()
    threads[i] = t

# wait for every thread to finish
for t in threads:
    t.join()

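There is no infrastructure bottleneck. While the code runs, the CPU and the available disk IOPS are barely used.

The workload is spread across 40 threads. Increasing that number to 80 threads or reducing it to 20 does not affect the load time.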
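
For anyone who wants to reproduce the thread-count observation, here is a minimal timing sketch. It is not the code from my application: the helper names (make_sample_files, load_all, time_load), the sample array shape, and the worker counts are all made up for illustration, and it uses concurrent.futures.ThreadPoolExecutor instead of raw threading.Thread. The timings it prints depend heavily on the disk and on the OS file cache, so treat them as a rough indication only.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def make_sample_files(directory, count=8, shape=(512, 1024)):
    """Write `count` float64 arrays as .npy files (hypothetical test data).

    The default shape gives 512 * 1024 * 8 bytes = 4 MiB per array.
    """
    paths = []
    for i in range(count):
        path = os.path.join(directory, f"sample_{i}.npy")
        np.save(path, np.full(shape, i, dtype=np.float64))
        paths.append(path)
    return paths


def load_all(paths, workers):
    """Load every file with a pool of `workers` threads, preserving order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(np.load, paths))


def time_load(paths, workers):
    """Return (elapsed seconds, loaded arrays) for one full load."""
    start = time.perf_counter()
    arrays = load_all(paths, workers)
    return time.perf_counter() - start, arrays


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = make_sample_files(tmp)
        # Compare a few pool sizes; these counts are arbitrary.
        for workers in (1, 4, 8):
            elapsed, arrays = time_load(paths, workers)
            print(f"{workers} workers: {len(arrays)} arrays in {elapsed:.3f} s")
```

If the elapsed time stays flat as the worker count changes, that matches what I see with 20, 40 and 80 threads.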

Is there any way to speed up this process?

How to load many .npy files quickly

0 answers: