Question

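I have a large array (a few million elements), and I need to slice out a small number of them (a few hundred) based on several different criteria. I'm currently using np.where, roughly like this: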

import numpy as np

# a, b, c (same length as x, y, z) and DoSomeJunk are defined elsewhere.
for threshold in np.arange(0, 1, .1):
    x = np.random.random(5000000)
    y = np.random.random(5000000)
    z = np.random.random(5000000)
    inds = np.where((x < threshold) & (y > threshold) & (z > threshold) & (z < threshold + 0.1))

    DoSomeJunk(a[inds], b[inds], c[inds])

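I then use the resulting indices (inds) to extract the correct points from the various arrays. However, I get a MemoryError on that np.where line. I've seen in some other related posts that np.where can be a memory hog and copies the data.

Does having multiple & operators in there mean the data is being copied multiple times? Is there a more efficient way to slice the data that doesn't use so much memory, but still keeps the list of indices I want so that I can reuse the same slice in several places later?

Note that the example I posted doesn't actually produce the error, but its structure is similar to what I have.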

Answer 1

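Each condition creates a temporary boolean array with the same number of elements as x, y, and z, and each & creates another one for its result. To reduce this, you can build the mask iteratively, updating it in place: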

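import numpy as np

# a, b, c (same length as x, y, z) and DoSomeJunk are defined elsewhere.
for threshold in np.arange(0, 1, .1):
    x = np.random.random(5000000)
    y = np.random.random(5000000)
    z = np.random.random(5000000)
    inds = x < threshold
    # &= updates inds in place instead of allocating a new result array
    inds &= y > threshold
    inds &= z > threshold
    inds &= z < threshold + 0.1

    DoSomeJunk(a[inds], b[inds], c[inds])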

For this example, this reduces the memory usage from 160 MB to 40 MB.
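As a rough check of the approach above, the following sketch builds the mask in place, confirms that it selects the same elements as the one-line np.where expression, and then converts it to an integer index array with np.flatnonzero so the same selection can be reused later. The helper name build_mask, the payload array a, the seed, and the smaller array size are assumptions made for illustration; they are not from the original thread.

```python
import numpy as np


def build_mask(x, y, z, threshold, width=0.1):
    """Combine the four conditions into one boolean mask, in place.

    Each comparison still allocates one temporary boolean array, but the
    running result is updated with &= instead of being reallocated.
    """
    mask = x < threshold
    mask &= y > threshold
    mask &= z > threshold
    mask &= z < threshold + width
    return mask


rng = np.random.default_rng(0)   # seeded so the run is repeatable
n = 1_000_000                    # assumption: smaller than the 5,000,000 in the thread
x, y, z = rng.random(n), rng.random(n), rng.random(n)
a = np.arange(n)                 # hypothetical payload array to slice

threshold = 0.3
mask = build_mask(x, y, z, threshold)

# Indices from the original one-line expression, for comparison.
where_inds = np.where((x < threshold) & (y > threshold)
                      & (z > threshold) & (z < threshold + 0.1))[0]

# Integer indices of the True entries; when only a small fraction of points
# survives, this array is smaller than the full-length mask.
inds = np.flatnonzero(mask)

print(mask.dtype, mask.nbytes)   # boolean mask: one byte per element
print(inds.size, inds.nbytes)    # index array: one integer per selected point
```

Either mask or inds can be used to index a, b, and c. Keeping inds is the cheaper option when only a few hundred points out of millions are selected, since the mask always has one entry per element of the original arrays.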

More memory-efficient option than numpy.where?
