Question

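I have two feature matrices with different numbers of rows. Suppose matrix A has more rows than matrix B. The columns are ID1, ID2, Time_slice and the feature value. Because B has no feature value for some Time_slice values, B has fewer rows than A. I need to find which rows are missing from B, then add those rows to B with the corresponding ID1, ID2 (and Time_slice) values and a feature value of zero.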

             ID1   ID2  Time_slice  Feature
A = array([[ 100,  1.,  0.,         1.5],
           [ 100,  1.,  1.,         3.7],
           [ 100,  2.,  0.,         1.2],
           [ 100,  2.,  1.,         1.8],
           [ 100,  2.,  2.,         2.9],
           [ 101,  3.,  0.,         1.5],
           [ 101,  3.,  1.,         3.7],
           [ 101,  4.,  0.,         1.2],
           [ 101,  4.,  1.,         1.8],
           [ 101,  4.,  2.,         2.9]])

B = array([[ 100,  1.,  0.,         1.25],
           [ 100,  1.,  1.,         3.37],
           [ 100,  2.,  0.,         1.42],
           [ 100,  2.,  1.,         1.68]])

The output should be as follows:

[[ 100,  1.,  0.,  1.25],
 [ 100,  1.,  1.,  3.37],
 [ 100,  2.,  0.,  1.42],
 [ 100,  2.,  1.,  1.68],
 [ 100,  2.,  2.,  0.  ],
 [ 101,  3.,  0.,  0.  ],
 [ 101,  3.,  1.,  0.  ],
 [ 101,  4.,  0.,  0.  ],
 [ 101,  4.,  1.,  0.  ],
 [ 101,  4.,  2.,  0.  ]]

Answer 1

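It appears (from the desired output) that a row in A is considered to match a row in B if their first three columns are equal. If we can identify which rows of A match rows of B, your problem is largely solved.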
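A minimal sketch of that matching step, assuming the match key is exactly the first three columns (ID1, ID2, Time_slice): collect B's keys in a Python set of tuples and test each row of A against it. This is plain Python rather than the answer's method, and `b_keys` and `matched` are illustrative names.

```python
import numpy as np

A = np.array([[100, 1., 0., 1.5], [100, 1., 1., 3.7], [100, 2., 0., 1.2],
              [100, 2., 1., 1.8], [100, 2., 2., 2.9], [101, 3., 0., 1.5],
              [101, 3., 1., 3.7], [101, 4., 0., 1.2], [101, 4., 1., 1.8],
              [101, 4., 2., 2.9]])
B = np.array([[100, 1., 0., 1.25], [100, 1., 1., 3.37],
              [100, 2., 0., 1.42], [100, 2., 1., 1.68]])

# Key of each B row: the (ID1, ID2, Time_slice) triple from the first three columns.
b_keys = {tuple(row) for row in B[:, :3].tolist()}

# True where the corresponding A row's key also occurs in B.
matched = np.array([tuple(row) in b_keys for row in A[:, :3].tolist()])
print(matched)
```

Because Python treats -0.0 and 0.0 as equal (with equal hashes), this set lookup does not suffer from the float-zero problem of byte-level comparison; it does loop in Python, so it will be slower on very large arrays.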

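If identifying a match depended only on the values in a single column, we could use np.in1d. For example, if [0, 1, 2, 5, 0] were the values in A and [0, 2] were the values in B, then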

In [39]: np.in1d([0, 1, 2, 5, 0], [0, 2])
Out[39]: array([ True, False,  True, False,  True], dtype=bool)

shows which rows of A match rows of B.
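In NumPy 2.0, np.in1d is deprecated in favor of np.isin, which returns a mask shaped like its first argument. A small sketch reproducing the single-column example with np.isin (the variable names are illustrative):

```python
import numpy as np

a_values = np.array([0, 1, 2, 5, 0])  # one key column of A
b_values = np.array([0, 2])           # the same key column of B

# Boolean mask over a_values: True where the value also occurs in b_values.
mask = np.isin(a_values, b_values)
print(mask)
print(a_values[mask])  # the A values that have a partner in B
```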

There is (currently) no higher-dimensional generalization of this function in NumPy.
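One workaround that is not part of this answer is broadcasting: compare every key row of A with every key row of B and reduce. It needs memory proportional to len(A) * len(B) * 3, so it is a sketch suited to small arrays like these; `keys_a`, `keys_b` and `pairwise_equal` are illustrative names.

```python
import numpy as np

A = np.array([[100, 1., 0., 1.5], [100, 1., 1., 3.7], [100, 2., 0., 1.2],
              [100, 2., 1., 1.8], [100, 2., 2., 2.9], [101, 3., 0., 1.5],
              [101, 3., 1., 3.7], [101, 4., 0., 1.2], [101, 4., 1., 1.8],
              [101, 4., 2., 2.9]])
B = np.array([[100, 1., 0., 1.25], [100, 1., 1., 3.37],
              [100, 2., 0., 1.42], [100, 2., 1., 1.68]])

keys_a = A[:, :3]  # (ID1, ID2, Time_slice) of each A row
keys_b = B[:, :3]

# Compare every A key with every B key; the result has shape (len(A), len(B), 3).
pairwise_equal = keys_a[:, None, :] == keys_b[None, :, :]

# An A row matches if all three key columns agree with at least one B row.
mask = pairwise_equal.all(axis=2).any(axis=1)

missing = A[~mask]   # boolean indexing returns a copy, so A is left unchanged
missing[:, -1] = 0   # zero the Feature column of rows absent from B
result = np.vstack([B, missing])
print(result)
```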

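However, there is a trick that can be used to view the multiple columns of a 2D array as a single column of byte values, thereby turning the 2D array into a 1D array. We can then apply np.in1d to this 1D array. The trick, which I learned from Jaime, is encapsulated in the function asvoid: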


import numpy as np

def asvoid(arr):
    """
    View the array as dtype np.void (bytes).

    This views the last axis of ND-arrays as np.void (bytes) so
    comparisons can be performed on the entire row.
    https://stackoverflow.com/a/16840350/190597 (Jaime, 2013-05)

    Some caveats:
        - `asvoid` will work for integer dtypes, but be careful if using asvoid on float
        dtypes, since float zeros may compare UNEQUALLY:
        >>> asvoid([-0.]) == asvoid([0.])
        array([False], dtype=bool)

        - `asvoid` works best on contiguous arrays. If the input is not contiguous,
        `asvoid` will copy the array to make it contiguous, which will slow down the
        performance.

    """
    arr = np.ascontiguousarray(arr)
    return arr.view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[-1])))

A = np.array([[ 100,  1.,   0.,     1.5],
              [ 100,  1.,   1.,     3.7],
              [ 100,  2.,   0.,     1.2],
              [ 100,   2.,   1.,     1.8],
              [ 100,   2.,   2.,     2.9],
              [ 101,   3.,   0.,     1.5],
              [ 101,  3.,   1.,     3.7],
              [ 101,  4.,   0.,     1.2],
              [ 101,   4.,   1.,     1.8],
              [ 101,   4.,   2.,     2.9]])

B = np.array([[ 100,  1.,   0.,     1.25],
              [ 100,  1.,   1.,     3.37],
              [ 100,  2.,   0.,     1.42],
              [ 100,   2.,   1.,     1.68]])

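# np.isin replaces np.in1d (deprecated in NumPy 2.0). asvoid returns shape (n, 1)
# and np.isin keeps that shape, so ravel() gives the 1D row mask in1d used to return.
mask = np.isin(asvoid(A[:, :3]).ravel(), asvoid(B[:, :3]))
result = A[~mask]          # rows of A whose key is missing from B (a copy)
result[:, -1] = 0          # set their Feature value to zero
# np.vstack replaces np.row_stack, which is deprecated in NumPy 2.0
result = np.vstack([B, result])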
print(result)

yields

[[ 100.      1.      0.      1.25]
 [ 100.      1.      1.      3.37]
 [ 100.      2.      0.      1.42]
 [ 100.      2.      1.      1.68]
 [ 100.      2.      2.      0.  ]
 [ 101.      3.      0.      0.  ]
 [ 101.      3.      1.      0.  ]
 [ 101.      4.      0.      0.  ]
 [ 101.      4.      1.      0.  ]
 [ 101.      4.      2.      0.  ]]
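The docstring's caveat about float zeros arises because asvoid compares raw bytes: -0.0 and 0.0 are equal as numbers but differ in the sign bit. The sketch below shows the effect at the byte level, plus one possible workaround that is an assumption, not part of the answer: add 0.0 to the key columns before viewing them, e.g. asvoid(normalize_zeros(A[:, :3])), where `normalize_zeros` is a hypothetical helper.

```python
import numpy as np

neg_zero = np.array([-0.0])
pos_zero = np.array([0.0])

# Equal as numbers...
print(neg_zero == pos_zero)
# ...but not as bytes: the sign bit differs, and asvoid compares raw bytes.
print(neg_zero.tobytes() == pos_zero.tobytes())


def normalize_zeros(arr):
    """Hypothetical helper: return a float copy with every -0.0 turned into +0.0.

    Under IEEE 754 round-to-nearest, -0.0 + 0.0 evaluates to +0.0, while
    adding 0.0 leaves nonzero values unchanged.
    """
    return np.asarray(arr, dtype=float) + 0.0


print(normalize_zeros(neg_zero).tobytes() == pos_zero.tobytes())
```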

Answer 2

You could try the following:

import numpy as np

A = np.array([[ 100,  1.,   0.,     1.5],
              [ 100,  1.,   1.,     3.7],
              [ 100,  2.,   0.,     1.2],
              [ 100,  2.,   1.,     1.8],
              [ 100,  2.,   2.,     2.9],
              [ 101,  3.,   0.,     1.5],
              [ 101,  3.,   1.,     3.7],
              [ 101,  4.,   0.,     1.2],
              [ 101,  4.,   1.,     1.8],
              [ 101,  4.,   2.,     2.9]])

B = np.array([[ 100,  1.,   0.,     1.25],
              [ 100,  1.,   1.,     3.37],
              [ 100,  2.,   0.,     1.42],
              [ 100,  2.,   1.,     1.68]])

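# Compare only the key columns (ID1, ID2, Time_slice), not the Feature values,
# which differ between A and B even for matching rows.
listB = B[:, :3].tolist()

for rowA in A:
    # Call tolist(); comparing the bound method rowA.tolist is always "not in".
    if rowA[:3].tolist() not in listB:
        # Copy the key columns from A and set the Feature value to zero.
        B = np.append(B, [[rowA[0], rowA[1], rowA[2], 0]], axis=0)

print(B)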
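The loop above grows B with np.append, which copies the whole array on every call. Below is a sketch of the same idea that collects the missing rows first and stacks once, then optionally sorts by the key columns with np.lexsort so the output follows (ID1, ID2, Time_slice) order even when the missing rows would be interleaved with B's. The set-based lookup and the sort are additions, not part of the answer.

```python
import numpy as np

A = np.array([[100, 1., 0., 1.5], [100, 1., 1., 3.7], [100, 2., 0., 1.2],
              [100, 2., 1., 1.8], [100, 2., 2., 2.9], [101, 3., 0., 1.5],
              [101, 3., 1., 3.7], [101, 4., 0., 1.2], [101, 4., 1., 1.8],
              [101, 4., 2., 2.9]])
B = np.array([[100, 1., 0., 1.25], [100, 1., 1., 3.37],
              [100, 2., 0., 1.42], [100, 2., 1., 1.68]])

b_keys = {tuple(row) for row in B[:, :3].tolist()}

# Build all missing rows first: the ID1, ID2, Time_slice of A, with Feature set to 0.
missing = [row[:3] + [0.0] for row in A.tolist() if tuple(row[:3]) not in b_keys]

# Stack once; reshape keeps the column count even if nothing is missing.
result = np.vstack([B, np.array(missing).reshape(-1, B.shape[1])])

# Optional: order by ID1, then ID2, then Time_slice.
# np.lexsort treats the LAST key in the tuple as the primary sort key.
order = np.lexsort((result[:, 2], result[:, 1], result[:, 0]))
result = result[order]
print(result)
```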

Compare two matrices in Python and fill in the missing rows
