Question

I have a NumPy array:

b = np.array([[1,2], [3,4], [1,6], [7,2], [3,9], [7,10]])

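Now I want to do the following: I want to reduce b. The way I want to reduce it is this: I look at the first element of b, i.e. [1,2], and based on it I delete every other element of b that contains at least 1 or 2. In this case I would delete [1,6] and [7,2]. Then I would look at [3,4] and delete the elements that contain at least 3 or 4.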

In effect, I start at the beginning of the list and, for each element, delete the other elements that contain one of its values.

My attempt

Unfortunately, this doesn't work!

This is what I tried, but it doesn't work and it is too verbose. Any other ideas?

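Edit: I think the main problem is that when I do for a in b: np.insert(b[~np.array([np.any((a==b)[j]) for j in range(len(b))])], 0, a, axis=0), it only says "True" for the elements whose first element equals the first element of a, and it does not say "True" when they equal the second element.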
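The behaviour described in this edit comes from broadcasting: a == b compares each row of b with a column by column, so a value is only matched when it sits in the same column. The sketch below contrasts that with a membership test using np.isin. The array probe and the names positional and membership are hypothetical, chosen only to make the difference visible.

```python
import numpy as np

a = np.array([1, 2])
# Hypothetical rows: the first holds 2 in column 0, the second holds 1 in column 0.
probe = np.array([[2, 5], [1, 6]])

# Broadcasting compares column by column: probe[:, 0] with a[0], probe[:, 1] with a[1].
positional = (a == probe).any(axis=1)

# np.isin asks whether each entry of probe occurs anywhere in a, whatever its column.
membership = np.isin(probe, a).any(axis=1)

print(positional)  # the row [2, 5] is missed by the column-wise comparison
print(membership)  # both rows are flagged by the membership test
```

With the column-wise comparison, the row [2, 5] is not flagged even though it contains 2, which is why a check against each value of a separately (or a membership test) is needed.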

Edit 2: Do you think this would work?

np.any((a==b)[j])

Answer 1

A simple solution is to use a plain Python loop:

import numpy as np

b = np.array([[1,2], [3,4], [1,6], [7,2], [3,9], [7,10]])

final = []
seen = set()
for row in b.tolist():
    if seen.intersection(row):  # check if one element of the row has already been seen.
        continue
    else:
        # No item has been seen, so append this row and add the contents to the seen set.
        seen.update(row)
        final.append(row)

print(final)
# [[1, 2], [3, 4], [7, 10]]

I'm not sure whether NumPy has a good function for this kind of problem, but for pure Python this should be quite fast.

Answer 2

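Depending on the dimensionality of your data you may want to do something different, but in general a good way to solve this problem is through indexing.

import numpy as np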

# Generate the data to work with
X = np.array([[1,2], [3,4], [1,6], [7,2], [3,9], [7,10]])

# True where an entry equals the first OR the second value of the first row
eq_idxs = np.logical_or(X == X[0, 0], X == X[0, 1])

# compress axis: a row matches if any of its entries matched
eq_idxs = np.any(eq_idxs, axis=1)

# negate to get the remaining indexes
neq_idxs = np.logical_not(eq_idxs)

# get the results (note that the first row itself is among the deleted rows)
new_X = X[neq_idxs, :]
deleted_rows = X[eq_idxs, :]

print(new_X)

Output:

[[ 3  4]
 [ 3  9]
 [ 7 10]]

If you want to apply this repeatedly, wrap it in a loop (while X.shape[0] > 0).
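The loop itself is not shown in this answer, so the following is only a sketch of what it might look like, assuming each first row is set aside before the matching rows are dropped. The names kept, first and result are hypothetical.

```python
import numpy as np

X = np.array([[1, 2], [3, 4], [1, 6], [7, 2], [3, 9], [7, 10]])

kept = []  # rows that survive, in their original order
while X.shape[0] > 0:
    first = X[0]
    kept.append(first)
    # Rows containing either value of the first row; this always includes the first row itself,
    # so X shrinks on every pass and the loop terminates.
    eq_idxs = np.any(np.logical_or(X == first[0], X == first[1]), axis=1)
    X = X[~eq_idxs]

result = np.array(kept)
print(result)
```

Each pass removes at least one row, so the number of passes is at most the number of rows in the input.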

Answer 3

Adding a NumPy answer that relies on boolean indexing and (possibly too much) reshaping and flattening.

import numpy as np
b = np.array([[1,2], [3,4], [1,6], [7,2], [3,9], [7,10]])

# flatten it for comparisons
b = b.ravel()
idx = 0  # index of the current row
while idx < len(b) // 2:
    start = 2 * idx  # offset of the current row in the flattened array
    row = b[start:start+2]
    mask = np.zeros(b.shape, dtype=bool)
    # flag every later entry equal to either value of the current row
    np.logical_or(b[start+2:] == row[0], b[start+2:] == row[1], out=mask[start+2:])
    b = b.reshape(-1, 2)  # reshape so "row" masking can be applied easily
    mask = mask.reshape(-1, 2).any(-1)
    b = b[~mask].ravel()  # ravel again after masking
    idx += 1
print(b.reshape(-1, 2))
# [[ 1  2]
#  [ 3  4]
#  [ 7 10]]

Using np.isin or something similar could probably improve this further, but I don't have time (right now) to work on it.
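One possible way to use np.isin here, offered as a sketch rather than as the improvement the answerer had in mind: keep a boolean mask over the rows and, for every row that is still kept, clear the later rows that share a value with it. The function name reduce_rows and the variable names keep and clash are hypothetical.

```python
import numpy as np

def reduce_rows(b):
    """Keep a row only if it shares no value with an earlier kept row."""
    b = np.asarray(b)
    keep = np.ones(len(b), dtype=bool)
    for i in range(len(b)):
        if not keep[i]:
            continue  # a deleted row must not delete anything itself
        # Later rows that contain at least one value of row i.
        clash = np.isin(b[i + 1:], b[i]).any(axis=1)
        keep[i + 1:] &= ~clash
    return b[keep]

b = np.array([[1, 2], [3, 4], [1, 6], [7, 2], [3, 9], [7, 10]])
reduced = reduce_rows(b)
print(reduced)
```

This avoids the repeated reshaping and works for rows of any width, although it still loops over the rows in Python.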

Answer 4

I think I found a way (it was almost there in my edit). I'm posting it here for future reference; I think it is similar to some of the other answers:

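index = 0
while index < len(b) - 1:
    a = b[index]  # take the row from the current, already reduced b
    # rows containing either value of a (this includes row `index` itself)
    hit = np.any(np.logical_or(a[0] == b, a[1] == b), axis=1)
    # drop the matching rows, then put a back at its position
    b = np.insert(b[~hit], index, a, axis=0)
    index += 1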

This should work.
