Question

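I currently have two 9x9 arrays, and I want to convert the values along their diagonals and off-diagonals into 9x3 arrays. To approach this, I view each array as a 3x3 array of "blocks", where each "block" is itself a 3x3 array. To obtain the values I am looking for, I use either 1) a for loop or 2) a list comprehension, as shown below: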

import numpy as np
ar1 = np.arange(0,81).reshape(9,9)
ar2 = np.arange(81,162).reshape(9,9)

ar1_diag = np.zeros((9,3),dtype = float)
ar2_diag = np.zeros((9,3),dtype = float)

#Method 1: for loop
for i in range(0,3):
    ar1_diag[3*i:3*i+3,:] = ar1[3*i:3*i+3,3*i:3*i+3]
    ar2_diag[3*i:3*i+3,:] = ar2[3*i:3*i+3,3*i:3*i+3]

#Method 2: list comprehension   
ar1_diag2 = np.array([ar1[3*j:3*j+3,3*j:3*j+3] for j in range(0,3)]).reshape(9,3)
ar2_diag2 = np.array([ar2[3*j:3*j+3,3*j:3*j+3] for j in range(0,3)]).reshape(9,3)

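If we consider only one array, the list comprehension seems to have a marginal speed advantage over its for-loop counterpart, but I will eventually have to process many arrays that are much larger than the ones used above.

My question: is there a more efficient or less time-consuming way to slice arrays than what I have done (via map, lambda functions, etc.), or are these methods as good as it gets for obtaining array slices?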

Answer 1

Not sure if this is the "most" efficient way, but it shows a significant improvement:

import numpy as np
import timeit

ar1 = np.arange(0,81).reshape(9,9)
ar2 = np.arange(81,162).reshape(9,9)

ar1_diag = np.zeros((9,3),dtype = float)
ar2_diag = np.zeros((9,3),dtype = float)

ar1_diag2 = np.zeros((9,3),dtype = float)
ar2_diag2 = np.zeros((9,3),dtype = float)

ar1_diag3 = np.zeros((9,3),dtype = float)
ar2_diag3 = np.zeros((9,3),dtype = float)

#Method 1: for loop
def method1():
  global ar1_diag, ar2_diag
  for i in range(0,3):
      ar1_diag[3*i:3*i+3,:] = ar1[3*i:3*i+3,3*i:3*i+3]
      ar2_diag[3*i:3*i+3,:] = ar2[3*i:3*i+3,3*i:3*i+3]

print("Method 1:", timeit.timeit(method1))
# Method 1: 9.25558645000001

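#Method 2: list comprehension (block slices are built once, outside the timed function)
ls1 = [ar1[3*j:3*j+3,3*j:3*j+3] for j in range(0,3)]
ls2 = [ar2[3*j:3*j+3,3*j:3*j+3] for j in range(0,3)]  # ar2 needs its own list of slices
def method2():
  global ar1_diag2, ar2_diag2
  ar1_diag2 = np.array(ls1).reshape(9,3)
  ar2_diag2 = np.array(ls2).reshape(9,3)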

print("Method 2:", timeit.timeit(method2))
# Method 2: 8.4278298549998

#Method 3: array indexing

mask = np.zeros(ar1.shape, dtype=bool)  # True marks the diagonal blocks
for i in range(0,3):
    mask[3*i:3*i+3,3*i:3*i+3] = True
get = np.nonzero(mask)

def method3():
    global ar1_diag3, ar2_diag3
    ar1_diag3 = ar1[get].reshape((9,3))
    ar2_diag3 = ar2[get].reshape((9,3))

print("Method 3:", timeit.timeit(method3))
# Method 3: 4.541714375000083

And to confirm that the results agree:

(ar1_diag == ar1_diag2).all() and (ar1_diag == ar1_diag3).all()
# True
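The mask in Method 3 can also be built without a loop. The following is a minimal sketch, not part of the original answer: the helper name diag_block_index is hypothetical, and it assumes a square array made of n_blocks x n_blocks blocks of equal size. It uses np.kron to expand an identity matrix into a block-diagonal mask and np.nonzero to turn that mask into reusable index arrays.

```python
import numpy as np

def diag_block_index(n_blocks, block):
    """Hypothetical helper: index arrays selecting the diagonal blocks."""
    # Expand each 1 on the identity diagonal into a (block x block) patch of ones
    mask = np.kron(np.eye(n_blocks, dtype=int), np.ones((block, block), dtype=int))
    # np.nonzero returns (rows, cols) in row-major order, ready for fancy indexing
    return np.nonzero(mask)

ar1 = np.arange(81).reshape(9, 9)
get = diag_block_index(3, 3)           # build once, reuse for every array
ar1_diag = ar1[get].reshape(9, 3)
```

Because the index arrays depend only on the shape, they can be computed once and applied to any number of arrays with the same layout.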

Answer 2

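If you repeat this many times, having a predefined index to slice with will definitely be more efficient. In the code below, the variable index holds the same values as ar1_diag (flattened). You could also use np.take_along_axis to get ar*_diag directly, but that is slow.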

# Generate 1d index
matrix_number = np.arange(0,81, dtype=int).reshape(9,9)  # np.int was removed in NumPy 1.24; use the builtin int
indices = np.repeat(np.arange(0,9).reshape((3,3)), 3, axis=0)
index = np.take_along_axis(matrix_number, indices, axis=1).flatten()

# Target matrix without reshape
ar1_flat = np.arange(0,81)
ar2_flat = np.arange(81,162)

# Output
ar1_diag = ar1_flat[index].reshape((9,3))
ar2_diag = ar2_flat[index].reshape((9,3))

Performance

Method	%%timeit
Method 1: for loop	6.87 µs ± 265 ns per loop
Method 2: list comprehension	7.57 µs ± 162 ns per loop
Method 3: np.take_along_axis	18.7 µs ± 682 ns per loop
Method 4: suggested above	3.15 µs ± 36.3 ns per loop

Note: if you choose Method 3, the code is as follows:

ar2_diag3 = np.take_along_axis(ar2, indices , axis=1)
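Since the question mentions many arrays, a precomputed flat index can be applied to all of them in one indexing operation. This is a sketch under the assumption that the arrays share the same 9x9 shape and can be stacked; the names stack and diags are illustrative and do not come from the answer.

```python
import numpy as np

ar1 = np.arange(0, 81).reshape(9, 9)
ar2 = np.arange(81, 162).reshape(9, 9)

# Flat indices of the diagonal blocks, computed once
cols = np.repeat(np.arange(9).reshape(3, 3), 3, axis=0)  # (9, 3) column indices
rows = np.arange(9)[:, None]                             # (9, 1) row indices
index = (rows * 9 + cols).ravel()                        # 27 flat positions

# Stack the arrays and index all of them at once
stack = np.stack([ar1, ar2])                             # shape (2, 9, 9)
diags = stack.reshape(len(stack), -1)[:, index].reshape(len(stack), 9, 3)
```

Here diags[k] is the 9x3 result for the k-th array in the stack. Whether this beats indexing each array separately depends on the array sizes and on the cost of stacking, so it is worth timing on real data.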

Answer 3

You can prepare a list of flat indices once and later use them as an indirection to fetch the values in a single operation:

Setup:

bSize = 3
side  = bSize * bSize
h  = np.arange(side).reshape(1, bSize,1, bSize)
v  = np.arange(side).reshape(bSize,1, bSize,1)
m  = v * side + h
# np.bool was removed in NumPy 1.24; the builtin bool works on every version
flatDiag1 = m[np.identity(bSize,dtype=bool)].reshape(side, bSize)
flatDiag2 = m[np.identity(bSize,dtype=bool)[:,::-1]].reshape(side, bSize)

Usage:

print(ar2.flatten()[flatDiag1])

[[ 81  82  83]
 [ 90  91  92]
 [ 99 100 101]
 [111 112 113]
 [120 121 122]
 [129 130 131]
 [141 142 143]
 [150 151 152]
 [159 160 161]]

print(ar2.flatten()[flatDiag2])

[[ 87  88  89]
 [ 96  97  98]
 [105 106 107]
 [111 112 113]
 [120 121 122]
 [129 130 131]
 [135 136 137]
 [144 145 146]
 [153 154 155]]

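Note that flatDiag1 is identical to ar1_diag and ar1_diag2, so you could use the matrices you already have. Either way, store these index matrices in a central place so they can be reused for indirect indexing.

If you need to access a specific block, the m matrix can also serve as the indirection mechanism: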

print(ar2.flatten()[m[2,2]]) # block (2,2) in block coordinates

[[141 142 143]
 [150 151 152]
 [159 160 161]]
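One detail worth noting when applying these flat indices to many large arrays: flatten() always returns a copy, whereas ravel() returns a view when the array is contiguous, and np.take without an axis argument indexes the flattened input directly. The sketch below rebuilds flatDiag1 as in the setup above and shows that the three spellings give the same result; the variable names via_flatten, via_ravel and via_take are illustrative only.

```python
import numpy as np

ar2 = np.arange(81, 162).reshape(9, 9)

# Same setup as above, using the builtin bool dtype
bSize = 3
side = bSize * bSize
h = np.arange(side).reshape(1, bSize, 1, bSize)
v = np.arange(side).reshape(bSize, 1, bSize, 1)
m = v * side + h
flatDiag1 = m[np.identity(bSize, dtype=bool)].reshape(side, bSize)

via_flatten = ar2.flatten()[flatDiag1]  # flatten() copies the whole array first
via_ravel = ar2.ravel()[flatDiag1]      # ravel() is a view for contiguous input
via_take = np.take(ar2, flatDiag1)      # axis omitted: uses the flattened input
```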

Answer 4

Without the reshape, ar1_diag2 is:

Out[87]: 
array([[[ 0,  1,  2],
        [ 9, 10, 11],
        [18, 19, 20]],

       [[30, 31, 32],
        [39, 40, 41],
        [48, 49, 50]],

       [[60, 61, 62],
        [69, 70, 71],
        [78, 79, 80]]])

Looking at ar1, these are three (3,3) blocks taken from ar1:

In [89]: ar1
Out[89]: 
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23, 24, 25, 26],
       [27, 28, 29, 30, 31, 32, 33, 34, 35],
       [36, 37, 38, 39, 40, 41, 42, 43, 44],
       [45, 46, 47, 48, 49, 50, 51, 52, 53],
       [54, 55, 56, 57, 58, 59, 60, 61, 62],
       [63, 64, 65, 66, 67, 68, 69, 70, 71],
       [72, 73, 74, 75, 76, 77, 78, 79, 80]])

We can use as_strided to reshape it into a (3,3,3,3) array. (as_strided is often suggested as a way of getting moving windows, especially overlapping ones.)

In [90]: X = np.lib.stride_tricks.as_strided(ar1,shape=(3,3,3,3),strides=(9*3*8,
    ...: 3*3*8,3*8,8))
In [91]: X
Out[91]: 
array([[[[ 0,  1,  2],
         [ 3,  4,  5],
         [ 6,  7,  8]],

        [[ 9, 10, 11],
         [12, 13, 14],
         [15, 16, 17]],
        ....

From this I can select the desired diagonal blocks:

In [92]: X[np.arange(3),:,np.arange(3)]
Out[92]: 
array([[[ 0,  1,  2],
        [ 9, 10, 11],
        [18, 19, 20]],

       [[30, 31, 32],
        [39, 40, 41],
        [48, 49, 50]],

       [[60, 61, 62],
        [69, 70, 71],
        [78, 79, 80]]])

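Initially I tried X[np.arange(3),np.arange(3)], but that gave the wrong diagonal. I suspect it would work if I used a different set of strides in as_strided. (strides=(9*3*8,3*8,3*3*8,8) are the right strides.)

While as_strided creates a view and can be helpful for building large moving windows, it is not always the fastest. Here it loses: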
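The alternative strides can be checked without hard-coding the element size. The sketch below is an illustration rather than part of the original session: it reads the byte size from ar1.itemsize (the session above assumes 8-byte integers) and compares the strided view with a reshape followed by a transpose, which yields the same layout.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

ar1 = np.arange(81).reshape(9, 9)
s = ar1.itemsize  # bytes per element, instead of assuming 8

# Axes are (block row, block col, row in block, col in block)
X2 = as_strided(ar1, shape=(3, 3, 3, 3), strides=(27 * s, 3 * s, 9 * s, s))

# The same layout obtained with reshape + transpose (also a view)
T = ar1.reshape(3, 3, 3, 3).transpose(0, 2, 1, 3)

# With this axis order, the first two indices pick the block directly
blocks = X2[np.arange(3), np.arange(3)]  # shape (3, 3, 3): the diagonal blocks
```

The reshape plus transpose form lets NumPy derive the strides itself, which avoids the risk of reading out-of-bounds memory that comes with hand-written strides.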

In [96]: timeit ar1_diag2 = np.array([ar1[3*j:3*j+3,3*j:3*j+3] for j in range(0,3)])
8.7 µs ± 24 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [97]: timeit X = np.lib.stride_tricks.as_strided(ar1,shape=(3,3,3,3),
     strides=(9*3*8,3*3*8,3*8,8))[np.arange(3),:,np.arange(3)]
26 µs ± 1.22 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Edit

I just realized that my first use of as_strided was nothing more than a reshape(3,3,3,3). That is, the strides are the same. This means we can use

In [115]: timeit ar1.reshape(3,3,3,3)[np.arange(3),:,np.arange(3)]
8.7 µs ± 236 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

This takes about the same time as your ar1_diag2 indexing.
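The reshape approach generalizes to other block sizes and to the anti-diagonal blocks. The function below is a minimal sketch: the name diag_blocks and its anti parameter are hypothetical, and it assumes a square array whose side length is a multiple of the block size.

```python
import numpy as np

def diag_blocks(a, block, anti=False):
    """Hypothetical helper: stack the diagonal (or anti-diagonal) blocks of a."""
    n = a.shape[0] // block              # number of blocks per side
    i = np.arange(n)                     # block-row index
    j = i[::-1] if anti else i           # block-column index
    # Axes after reshape: (block row, row in block, block col, col in block)
    picked = a.reshape(n, block, n, block)[i, :, j]   # shape (n, block, block)
    return picked.reshape(n * block, block)

ar2 = np.arange(81, 162).reshape(9, 9)
main = diag_blocks(ar2, 3)               # diagonal blocks as a (9, 3) array
anti = diag_blocks(ar2, 3, anti=True)    # anti-diagonal blocks as a (9, 3) array
```

For the small 9x9 case the timings above suggest this is no faster than the list comprehension, so its main appeal is that it avoids a Python-level loop as the number of blocks grows.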

Efficient way to slice arrays?
