Question

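I have several 1D numpy arrays (about 5 million elements each).

I have to slice them repeatedly with the same slice. I have a collection of arrays (all with the same dimensions), and I want to index all of them with the same index array (which also has the same dimensions).

Is there a way to compute A[indices] for all the different arrays A that is more efficient than the naive way?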
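For concreteness, here is a minimal sketch of what "the naive way" looks like, assuming the arrays are kept in a Python list. The names `arrays` and `indices` and the random data are hypothetical. Each `A[indices]` is a separate fancy-indexing call that allocates a new array; for 1D arrays, `np.take(A, indices)` is an equivalent spelling.

```python
import numpy as np

# Hypothetical data: a few equal-length 1D arrays and one shared index array.
rng = np.random.default_rng(0)
n = 1_000_000  # smaller than the ~5 million elements in the question
arrays = [rng.random(n) for _ in range(3)]
indices = rng.permutation(n)

# The naive way: apply the same fancy index to every array separately.
# Each A[indices] allocates and fills a new array of length n.
naive = [A[indices] for A in arrays]

# For 1D arrays, np.take gives the same result as fancy indexing.
taken = [np.take(A, indices) for A in arrays]
```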

Maybe there is a way to speed this up with Cython?

Thanks!

Edit

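To make things clearer, here is my setup: I have an array A with millions of elements. To perform a certain operation on A, I first need to sort it; afterwards I want to restore the original order, so I unsort it. I need to repeat this many times. In short: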

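for _ in range(100):
    fancy_A = fancy_function(sortedA)  # returns an array with the same dimensions
    res = fancy_A[inv_indices]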

I want to optimize the code inside the loop. As you can see, inv_indices is always the same, so I suspect there is a more efficient approach.
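The sort/unsort step can be sketched as below, assuming the sort order comes from `np.argsort`; the question does not show how `sortedA` and `inv_indices` are built, so this setup is a guess and the random `A` is hypothetical. Because `indices` is a permutation, its inverse can be computed either with a second `np.argsort(indices)` or, in O(n) instead of O(n log n), by scattering `np.arange` into an empty array.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random(1_000_000)  # hypothetical data standing in for the question's A

# Sort once and remember the permutation that sorts A.
indices = np.argsort(A)
sortedA = A[indices]

# Inverse permutation: inv_indices[indices[k]] = k, so sortedA[inv_indices] == A.
# np.argsort(indices) gives the same array but performs another full sort.
inv_indices = np.empty_like(indices)
inv_indices[indices] = np.arange(indices.size)

restored = sortedA[inv_indices]
```

Since `inv_indices` is computed once outside the loop, only the `fancy_A[inv_indices]` gather is repeated per iteration.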

Thanks!

Answer 1

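Since inv_indices reorders the array rather than selecting a subset, it may be just as fast, and space-efficient, to collect the fancy_A results into a 2D array and index that once.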

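import numpy as np

# fancy_function, sortedA and inv_indices come from the question's setup.
results = []
for _ in range(100):
    fancy_A = fancy_function(sortedA)  # returns an array with the same dimensions
    # res = fancy_A[inv_indices]       # no longer indexed inside the loop
    results.append(fancy_A)

# Stack into a 2D array (one row per iteration), then reorder every row at once.
bigA = np.stack(results)
bigA = bigA[:, inv_indices]    # assumes inv_indices is a list or array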
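A self-contained version of this pattern is sketched below, with a hypothetical `fancy_function` (here `np.cumsum`, chosen only because it returns an array of the same length) and a smaller array. The `+ i` on the input is also hypothetical and only makes the rows differ. The sketch checks that unsorting once on the stacked 2D array gives the same rows as unsorting inside the loop.

```python
import numpy as np

def fancy_function(x):
    # Hypothetical stand-in: any operation returning an array of the same shape.
    return np.cumsum(x)

rng = np.random.default_rng(0)
A = rng.random(100_000)
indices = np.argsort(A)
sortedA = A[indices]
inv_indices = np.argsort(indices)  # inverse permutation of indices

n_iter = 10

# Question's approach: unsort inside the loop, one gather per iteration.
per_iter = [fancy_function(sortedA + i)[inv_indices] for i in range(n_iter)]

# Answer's approach: collect raw results, then unsort all rows in one call.
results = [fancy_function(sortedA + i) for i in range(n_iter)]
bigA = np.stack(results)[:, inv_indices]
```

The trade-off is memory: with 5 million float64 elements (40 MB per row) and 100 iterations, the list of results, the stacked array and the reordered copy are each about 4 GB and can briefly coexist.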

If fancy_A is 1D and inv_indices is a simple list, applying it to the stack is straightforward:

In [849]: A = np.random.randint(0,10,10)
In [850]: A
Out[850]: array([0, 1, 5, 7, 4, 4, 0, 6, 9, 1])
In [851]: idx = np.argsort(A)
In [852]: idx
Out[852]: array([0, 6, 1, 9, 4, 5, 2, 7, 3, 8], dtype=int32)
In [853]: A[idx]
Out[853]: array([0, 0, 1, 1, 4, 4, 5, 6, 7, 9])
In [854]: res = [A for _ in range(5)]
In [855]: res = np.stack([A for _ in range(5)])
In [856]: res
Out[856]: 
array([[0, 1, 5, 7, 4, 4, 0, 6, 9, 1],
       [0, 1, 5, 7, 4, 4, 0, 6, 9, 1],
       [0, 1, 5, 7, 4, 4, 0, 6, 9, 1],
       [0, 1, 5, 7, 4, 4, 0, 6, 9, 1],
       [0, 1, 5, 7, 4, 4, 0, 6, 9, 1]])
In [857]: res[:,idx]
Out[857]: 
array([[0, 0, 1, 1, 4, 4, 5, 6, 7, 9],
       [0, 0, 1, 1, 4, 4, 5, 6, 7, 9],
       [0, 0, 1, 1, 4, 4, 5, 6, 7, 9],
       [0, 0, 1, 1, 4, 4, 5, 6, 7, 9],
       [0, 0, 1, 1, 4, 4, 5, 6, 7, 9]])

Timings for copying versus indexing the whole array:

In [860]: A = np.random.randint(0,1000,100000)
In [861]: idx = np.argsort(A)
In [862]: timeit A.copy()
31.8 µs ± 394 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [863]: timeit A[idx]
332 µs ± 9.69 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
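To reproduce this comparison outside IPython, the standard-library `timeit` module can be used as in the sketch below. It also times `np.take` writing into a preallocated `out` buffer, an alternative the answer does not mention; NumPy's documentation notes that `out` is always buffered with the default `mode='raise'`, so the sketch uses `mode='clip'`, which is safe here only because every index produced by `argsort` is already in range. The helper `best_per_call` is hypothetical, and absolute numbers depend on the machine and NumPy version, so no results are shown.

```python
import timeit

import numpy as np

# Same sizes as the IPython session above.
A = np.random.randint(0, 1000, 100000)
idx = np.argsort(A)
out = np.empty_like(A)  # reusable output buffer for np.take

def best_per_call(stmt, number=200, repeat=3):
    """Best average time per call, in seconds, over `repeat` runs of `number` calls."""
    return min(timeit.repeat(stmt, number=number, repeat=repeat)) / number

t_copy = best_per_call(lambda: A.copy())
t_index = best_per_call(lambda: A[idx])
# mode="clip" only changes behavior for out-of-range indices, which argsort never produces.
t_take = best_per_call(lambda: np.take(A, idx, out=out, mode="clip"))

for name, t in [("A.copy()", t_copy), ("A[idx]", t_index), ("np.take(..., out=out)", t_take)]:
    print(f"{name:24s} {t * 1e6:8.1f} µs")
```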
