Delete/keep numpy array rows based on ID values

Asked: 2018-07-25 16:26:27

Tags: python arrays python-3.x numpy

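I have two numpy arrays, each with an identification number in column 0.

Where the ID numbers match between the two arrays, I want to keep the rows associated with those ID numbers.

Where an ID has no matching ID in the other array, I want to delete the row associated with that ID, and only from the array in which that ID appears.

Both arrays are sorted by their ID numbers.

Example input arrays a and b and output arrays c and d are given below. Note that the arrays do not have the same number of rows (N.B. the real a and b are much larger: (2487, 12) and (2482, 12)).

In: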

a =
[[9.60977,  97.5,  96,    99,    100.5,  1.60]
 [9.60978,  97.5,  96,    100.5, 102,    0.31]
 [9.60979,  97.5,  96,    102,   103.5,  0.11]
 [9.60980,  97.5,  96,    103.5, 105,    0.05]
 [9.60981,  97.5,  96,    105,   106.5,  0.03]
 [9.60983,  97.5,  96,    108,   109.5,  0.01]
 [9.60984,  97.5,  96,    109.5, 111,    0.01]]

b = 
[[9.60977,  99,    100.5, 97.5,  96,     1.58]
 [9.60979,  102,   103.5, 97.5,  96,     0.11]
 [9.60980,  103.5, 105,   97.5,  96,     0.05] 
 [9.60981,  105,   106.5, 97.5,  96,     0.03]
 [9.60982,  106.5, 108,   97.5,  96,     0.02]
 [9.60984,  109.5, 111,   97.5,  96,     0.01]]

Out:

c =
[[9.60977,  97.5,  96,    99,    100.5,  1.60]
 [9.60979,  97.5,  96,    102,   103.5,  0.11]
 [9.60980,  97.5,  96,    103.5, 105,    0.05]
 [9.60981,  97.5,  96,    105,   106.5,  0.03]
 [9.60984,  97.5,  96,    109.5, 111,    0.01]]

d = 
[[9.60977,  99,    100.5, 97.5,  96,     1.58]
 [9.60979,  102,   103.5, 97.5,  96,     0.11]
 [9.60980,  103.5, 105,   97.5,  96,     0.05] 
 [9.60981,  105,   106.5, 97.5,  96,     0.03]
 [9.60984,  109.5, 111,   97.5,  96,     0.01]]

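I tried using a pair of if statements inside a for loop, but it fails because 1) the arrays have different lengths (see the Traceback below) and 2) it does not re-test the row that moves into the position of a deleted row.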

for i in np.arange(0, max(len(a), len(b)), 1):
    if a[i, 0] > b[i, 0]:
        a = np.delete(a, i, 0)
    if a[i, 0] < b[i, 0]:
        b = np.delete(b, i, 0)

Traceback (most recent call last):

  File "<ipython-input-271-509fc93aea3b>", line 2, in <module>
    if a[i, 0] > b[i, 0]:

IndexError: index 4 is out of bounds for axis 0 with size 3

I also tried a while loop, but it deletes all the wrong rows in array b:

n = 0
s = max(len(a), len(b))
c = np.array(())
d = np.array(())
while n != s:
    if a[n, 0] == b[n, 0]:
        c = np.append(c, a[n, :])
        d = np.append(d, b[n, :])
        n = n+1
    elif a[n, 0] > b[n, 0]:
        a = np.delete(a, n, 0)
    elif a[n, 0] < b[n, 0]:
        b = np.delete(b, n, 0)

Traceback (most recent call last):

  File "<ipython-input-285-f7c600c498cb>", line 6, in <module>
    if a[n, 0] == b[n, 0]:

IndexError: index 1 is out of bounds for axis 0 with size 1

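Is there another, more sensible way I could delete and append rows using the ID numbers?

1 answer:

Answer 0 (score: 2)

You can use np.isin to find where the first-column values of each array appear among the first-column values of the other array. After that, it is just a matter of simple indexing.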

c = a[np.isin(a[:,0],b[:,0])]

d = b[np.isin(b[:,0],a[:,0])]

>>> c
array([[  9.60977000e+00,   9.75000000e+01,   9.60000000e+01,
          9.90000000e+01,   1.00500000e+02,   1.60000000e+00],
       [  9.60979000e+00,   9.75000000e+01,   9.60000000e+01,
          1.02000000e+02,   1.03500000e+02,   1.10000000e-01],
       [  9.60980000e+00,   9.75000000e+01,   9.60000000e+01,
          1.03500000e+02,   1.05000000e+02,   5.00000000e-02],
       [  9.60981000e+00,   9.75000000e+01,   9.60000000e+01,
          1.05000000e+02,   1.06500000e+02,   3.00000000e-02],
       [  9.60984000e+00,   9.75000000e+01,   9.60000000e+01,
          1.09500000e+02,   1.11000000e+02,   1.00000000e-02]])
>>> d
array([[  9.60977000e+00,   9.90000000e+01,   1.00500000e+02,
          9.75000000e+01,   9.60000000e+01,   1.58000000e+00],
       [  9.60979000e+00,   1.02000000e+02,   1.03500000e+02,
          9.75000000e+01,   9.60000000e+01,   1.10000000e-01],
       [  9.60980000e+00,   1.03500000e+02,   1.05000000e+02,
          9.75000000e+01,   9.60000000e+01,   5.00000000e-02],
       [  9.60981000e+00,   1.05000000e+02,   1.06500000e+02,
          9.75000000e+01,   9.60000000e+01,   3.00000000e-02],
       [  9.60984000e+00,   1.09500000e+02,   1.11000000e+02,
          9.75000000e+01,   9.60000000e+01,   1.00000000e-02]])
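
For anyone who wants to reproduce the result above, here is a minimal self-contained sketch using the sample arrays from the question. The names mask_a and mask_b are introduced here for illustration only and are not part of the original answer.

```python
import numpy as np

# Sample arrays from the question; column 0 holds the ID number.
a = np.array([
    [9.60977, 97.5, 96, 99,    100.5, 1.60],
    [9.60978, 97.5, 96, 100.5, 102,   0.31],
    [9.60979, 97.5, 96, 102,   103.5, 0.11],
    [9.60980, 97.5, 96, 103.5, 105,   0.05],
    [9.60981, 97.5, 96, 105,   106.5, 0.03],
    [9.60983, 97.5, 96, 108,   109.5, 0.01],
    [9.60984, 97.5, 96, 109.5, 111,   0.01],
])
b = np.array([
    [9.60977, 99,    100.5, 97.5, 96, 1.58],
    [9.60979, 102,   103.5, 97.5, 96, 0.11],
    [9.60980, 103.5, 105,   97.5, 96, 0.05],
    [9.60981, 105,   106.5, 97.5, 96, 0.03],
    [9.60982, 106.5, 108,   97.5, 96, 0.02],
    [9.60984, 109.5, 111,   97.5, 96, 0.01],
])

# Boolean masks (illustrative names): True where a row's ID
# also appears in column 0 of the other array.
mask_a = np.isin(a[:, 0], b[:, 0])
mask_b = np.isin(b[:, 0], a[:, 0])

# Boolean indexing keeps only the rows whose mask entry is True.
c = a[mask_a]
d = b[mask_b]

print(c.shape, d.shape)  # (5, 6) (5, 6)
```

Note that np.isin compares the floating-point IDs for exact equality. That works here because both arrays hold identical ID values; if the IDs came from separate calculations, rounding them first may be necessary.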

Explanation

 >>> np.isin(a[:,0],b[:,0])
array([ True, False,  True,  True,  True, False,  True], dtype=bool)

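The above simply shows you which values in the first column of a also appear in the first column of b. You can then index a with that boolean array, using the code shown above: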

c = a[np.isin(a[:,0],b[:,0])]
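
As a possible alternative (not part of the original answer), np.intersect1d with return_indices=True gives the matching row positions for both arrays in one call. The sketch below uses small made-up arrays for illustration and assumes NumPy 1.15 or later and that each ID appears at most once per array; with repeated IDs only the first occurrence is kept, unlike the np.isin approach, which keeps every matching row.

```python
import numpy as np

# Small illustrative arrays (made up for this sketch); column 0 is the ID.
a = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [5.0, 50.0]])
b = np.array([[1.0, 0.1], [3.0, 0.3], [4.0, 0.4], [5.0, 0.5]])

# common: sorted IDs present in both arrays
# idx_a, idx_b: row positions of those IDs in a and in b
common, idx_a, idx_b = np.intersect1d(a[:, 0], b[:, 0], return_indices=True)

# Select the matching rows; both results are ordered by ID.
c = a[idx_a]
d = b[idx_b]

print(common)  # [1. 3. 5.]
```

Because the index arrays are returned in order of the sorted common IDs, c and d come out row-aligned with each other even if the inputs were not sorted.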