Is there a fast way to compare every two rows in a 2D array?

Time: 2021-01-12 11:52:12

Tags: python arrays numpy optimization vectorization

So I have a 2D array, say list:

list = [[x11, x12, x13, x14],
        [x21, x22, x23, x24],
       ...]

Some examples of list are:

# numbers in list are all integers
list = [[0, 17, 6, 10],
        [0, 7, 6, 10],
        ]
list = [[6, 50, 6, 10],
        [0, 50, 6, 10],
        ]
list = [[6, 16, 6, 10],
        [6, 6, 6, 10],
        ]
list = [[0, 50, 6, 10],
        [6, 50, 6, 10],
        [6, 40, 6, 10]
        ]
list = [[0, 27, 6, 10],
        [0, 37, 6, 10],
        ]

I need to iterate over every pair of rows, for example [x11, x12, x13, x14] and [x21, x22, x23, x24], and do some complex comparisons:

length = len(list)  # number of rows
cnt1 = cnt2 = cnt3 = cnt4 = cnt5 = 0
for i in range(0, length):
    for j in range(i + 1, length):
        # rows are adjacent along column 0 and share column 1
        if (list[i][0] + list[i][2] == list[j][0] or list[j][0] + list[j][2] == list[i][0]) and \
                list[i][1] == list[j][1]:
            cnt1 += 1
            if list[i][3] == list[j][3]:
                cnt2 += 1
            else:
                cnt3 += 1
        # rows are adjacent along column 1 and share column 0
        elif (list[i][1] + list[i][3] == list[j][1] or list[j][1] + list[j][3] == list[i][1]) and \
                list[i][0] == list[j][0]:
            cnt4 += 1
            if list[i][2] == list[j][2]:
                cnt2 += 1
            else:
                cnt3 += 1
        else:
            cnt5 += 1
# do something with the counts
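Here length is usually small, but this nested loop runs thousands of times, so the program takes a long time to finish. I have read some tutorials on vectorization in NumPy, but because the logic is somewhat complex, I can't figure out how to rewrite the code. Is there a way to optimize my code, even a little? Any help would be appreciated. Thanks in advance!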

2 Answers:

Answer 0: (score: 0)

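In your for loop, you compare the array [x11, x12, x13, x14] with all subsequent elements ([x21, x22, x23, x24], [x31, x32, x33, x34], [x41, x42, x43, x44], and so on),

and then you go on to compare [x21, x22, x23, x24] with all subsequent elements ([x31, x32, x33, x34], [x41, x42, x43, x44], and so on).

To iterate over every 2 rows and compare them two by two (meaning row 1 with row 2, then row 3 with row 4), you need something like this: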

for i in range(0, length - 1, 2):
    j = i + 1  # the row paired with row i
    if (list[i][0] + list[i][2] == list[j][0] or list[j][0] + list[j][2] == list[i][0]) and list[i][1] == list[j][1]:
        # do something
        if list[i][3] == list[j][3]:
            pass  # do something
        else:
            pass  # do something

Note that you also have to handle the case where the list has an odd number of rows.
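One way to make the odd-length case explicit is sketched below. The helper name iter_row_pairs is hypothetical, and the sketch assumes that a trailing unpaired row should be reported separately rather than compared; the answer does not say how it should be treated.

```python
def iter_row_pairs(rows):
    """Yield (rows[0], rows[1]), (rows[2], rows[3]), ... (hypothetical helper)."""
    # Stopping at len(rows) - 1 means a trailing unpaired row is never yielded.
    for i in range(0, len(rows) - 1, 2):
        yield rows[i], rows[i + 1]


rows = [[0, 50, 6, 10],
        [6, 50, 6, 10],
        [6, 40, 6, 10]]  # odd number of rows

pairs = list(iter_row_pairs(rows))
# Assumption: the unpaired last row is kept aside for the caller to handle.
leftover = rows[-1] if len(rows) % 2 else None

print(pairs)
print(leftover)
```

Keeping the leftover row in a separate variable leaves the decision (skip it, or compare it with something else) to the calling code.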

Answer 1: (score: 0)

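I am posting a solution that shows how to do this for the first if and the if/else conditions nested inside it.

You can follow similar logic to do the same for the rest.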

import numpy as np

arr = np.array([[0, 17, 6, 10],
       [0, 7, 6, 10],
       [6, 50, 6, 10],
       [0, 50, 6, 10],
       [6, 16, 6, 10],
       [6, 6, 6, 10],
       [0, 50, 6, 10],
       [6, 50, 6, 10],
       [6, 40, 6, 10],
       [0, 27, 6, 10],
       [0, 37, 6, 10]])

N = len(arr)

cnt1 = cnt2 = cnt3 = cnt4 = cnt5 = 0
for i in range(0, N):
    for j in range(i + 1, N):
        if (arr[i][0] + arr[i][2] == arr[j][0] or arr[j][0] + arr[j][2] == arr[i][0]) and \
                arr[i][1] == arr[j][1]:
            cnt1 += 1
            if arr[i][3] == arr[j][3]:
                cnt2 += 1
            else:
                cnt3 += 1
        elif (arr[i][1] + arr[i][3] == arr[j][1] or arr[j][1] + arr[j][3] == arr[i][1]) and \
                    arr[i][0] == arr[j][0]:
            cnt4 += 1
            if arr[i][2] == arr[j][2]:
                cnt2 += 1
            else:
                cnt3 += 1
        else:
            cnt5 += 1

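# entry [i, j] is True when arr[i][0] + arr[i][2] == arr[j][0]; the reversed test
# (arr[j][0] + arr[j][2] == arr[i][0]) is entry [j, i], so summing over the full matrix
# covers both sides of the `or`, provided both directions never hold for the same pair
# (true when column 2 is strictly positive, as in this data)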
cnt1_bool_c1 = ((arr[:, 0] + arr[:, 2])[:, None] == arr[:, 0][None, :])

# arr[i][1] == arr[j][1]:
cnt1_bool_c2 = arr[:, 1][:, None] == arr[:, 1][None, :]

# So that i and j are compared only if i != j
cnt1_bool_c2[np.arange(N), np.arange(N)] = False

# AND the two previous conditions to complete the very first if condition
cnt1_bool = np.bitwise_and(cnt1_bool_c1, cnt1_bool_c2)

# corresponds to cnt1
cnt1_n = cnt1_bool.sum()

# verified
print(cnt1 == cnt1_n)

# corresponds to arr[i][3] == arr[j][3]
cnt2_bool_c = arr[:, 3][:, None] == arr[:, 3][None, :]

# So that i and j are compared only if i != j
cnt2_bool_c[np.arange(N), np.arange(N)] = False

# inner if: count only the pairs that pass the first condition and also match in column 3
cnt2_n1 = np.bitwise_and(cnt1_bool, cnt2_bool_c).sum()  # corresponds to the cnt2 += 1 in the first inner condition

# inner else: count only the pairs that pass the first condition but differ in column 3
cnt3_n1 = np.bitwise_and(cnt1_bool, ~cnt2_bool_c).sum()  # corresponds to the cnt3 += 1 in the first inner else condition
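
The remaining branches can be handled in the same spirit. The following is only a sketch, not the answer's own code: it swaps the N x N broadcast matrices for np.triu_indices, which lists each pair with i < j exactly once, so both sides of each `or` can be written out directly and no diagonal masking is needed. The function names count_pairs_loop and count_pairs_vectorized are made up for this illustration, and the loop version is included only to check the result against the original logic.

```python
import numpy as np


def count_pairs_loop(arr):
    """Reference: the nested loop from the question."""
    cnt1 = cnt2 = cnt3 = cnt4 = cnt5 = 0
    n = len(arr)
    for i in range(n):
        for j in range(i + 1, n):
            a, b = arr[i], arr[j]
            if (a[0] + a[2] == b[0] or b[0] + b[2] == a[0]) and a[1] == b[1]:
                cnt1 += 1
                if a[3] == b[3]:
                    cnt2 += 1
                else:
                    cnt3 += 1
            elif (a[1] + a[3] == b[1] or b[1] + b[3] == a[1]) and a[0] == b[0]:
                cnt4 += 1
                if a[2] == b[2]:
                    cnt2 += 1
                else:
                    cnt3 += 1
            else:
                cnt5 += 1
    return cnt1, cnt2, cnt3, cnt4, cnt5


def count_pairs_vectorized(arr):
    """Sketch: same counts, computed on all i < j pairs at once."""
    arr = np.asarray(arr)
    i, j = np.triu_indices(len(arr), k=1)  # every pair with i < j, once
    a, b = arr[i], arr[j]                  # shape (n_pairs, 4) each

    # first `if`: adjacent along column 0 and equal in column 1
    first = ((a[:, 0] + a[:, 2] == b[:, 0]) | (b[:, 0] + b[:, 2] == a[:, 0])) \
        & (a[:, 1] == b[:, 1])
    # `elif`: only pairs that failed the first test can reach it
    second = ~first \
        & ((a[:, 1] + a[:, 3] == b[:, 1]) | (b[:, 1] + b[:, 3] == a[:, 1])) \
        & (a[:, 0] == b[:, 0])
    # inner `if` of either branch
    same = (first & (a[:, 3] == b[:, 3])) | (second & (a[:, 2] == b[:, 2]))

    cnt1 = int(first.sum())
    cnt4 = int(second.sum())
    cnt2 = int(same.sum())
    cnt3 = cnt1 + cnt4 - cnt2    # matched pairs that took an inner `else`
    cnt5 = len(i) - cnt1 - cnt4  # pairs that matched neither branch
    return cnt1, cnt2, cnt3, cnt4, cnt5


arr = np.array([[0, 17, 6, 10], [0, 7, 6, 10], [6, 50, 6, 10], [0, 50, 6, 10],
                [6, 16, 6, 10], [6, 6, 6, 10], [0, 50, 6, 10], [6, 50, 6, 10],
                [6, 40, 6, 10], [0, 27, 6, 10], [0, 37, 6, 10]])

print(count_pairs_loop(arr) == count_pairs_vectorized(arr))
```

Since length is usually small in the question, the gain from vectorizing a single call may be modest; it is worth timing both versions on realistic inputs before switching.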