Get unique rows from a NumPy array based on a value in each row

Date: 2019-07-24 07:59:33

Tags: python numpy unique

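I have a NumPy array and want to extract unique rows from it, where uniqueness is determined by the first element of each row. I have had partial success: I can get the unique first values, but not the complete rows. For example: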

import numpy as np

dataA = np.array([(107.,  7.475729,  6.573791, 90.0126 , 0.5529882, 0.867588 ),
 (107.,  7.408565,  6.38974 , 89.97312, 0.553728 , 0.8670179),
 (108.,  7.838725,  6.961871, 89.52572, 0.5610707, 0.7769735),
 (108.,  7.795123,  7.054095, 89.62989, 0.5592708, 0.7742778),
 (109.,  7.079929,  6.86194 , 89.6181 , 0.5660294, 0.8596874),
 (109.,  7.058383,  6.671512, 89.52995, 0.5663874, 0.8610857)])


print('Original Array :' , dataA)

# Get unique values from complete 2D array
uniqueValues = np.unique(dataA)

print('Unique Values : ', uniqueValues)

# Get unique rows from  numpy array
uniqueRows = np.unique(dataA[:,0], axis=0)

print('Unique Rows : ', uniqueRows, sep='\n')

This gives:

Unique Rows : 
[107. 108. 109.]

Desired result:
[(107.,  7.475729,  6.573791, 90.0126 , 0.5529882, 0.867588 ),
 (108.,  7.838725,  6.961871, 89.52572, 0.5610707, 0.7769735),
 (109.,  7.079929,  6.86194 , 89.6181 , 0.5660294, 0.8596874)]

Although the approach above gives me the row IDs, it seems to fail when the array contains nan:

dataA = np.array([(107.,  7.475729,  6.573791, 90.0126 , 0.5529882, 0.867588 , np.nan, np.nan),
 (107.,  7.408565,  6.38974 , 89.97312, 0.553728 , 0.8670179, np.nan, np.nan),
 (108.,  7.838725,  6.961871, 89.52572, 0.5610707, 0.7769735, np.nan, np.nan),
 (108.,  7.795123,  7.054095, 89.62989, 0.5592708, 0.7742778, np.nan, np.nan),
 (109.,  7.079929,  6.86194 , 89.6181 , 0.5660294, 0.8596874, np.nan, np.nan),
 (109.,  7.058383,  6.671512, 89.52995, 0.5663874, 0.8610857, np.nan, np.nan),
 (110.,  7.727924,  7.116364, 90.45003, 0.5366358, 0.8887361, np.nan, np.nan),
 (110.,  7.748454,  7.223625, 90.6782 , 0.5349852, 0.8855141, np.nan, np.nan)])

1 Answer:

Answer 0: (score: 1)

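You can check whether the first value in a row equals the first value in the next row, and index with the result. With your data, where each value appears in two consecutive rows, this keeps the first row of each pair: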

dataA[dataA[:, 0] == np.roll(dataA, -1, axis=0)[:, 0]]

array([[107.       ,   7.475729 ,   6.573791 ,  90.0126   ,   0.5529882,
          0.867588 ],
       [108.       ,   7.838725 ,   6.961871 ,  89.52572  ,   0.5610707,
          0.7769735],
       [109.       ,   7.079929 ,   6.86194  ,  89.6181   ,   0.5660294,
          0.8596874]])
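
The comparison above relies on every first-column value occurring in exactly two adjacent rows. As a hedged variant (not part of the original answer), the sketch below compares each row with the previous one instead, so it keeps the first row of every run regardless of how long the run is. The helper name `first_row_of_each_run` is hypothetical, and the sample is a shortened two-column excerpt of the question's data with one 108 row dropped to show a run of length one. It still assumes rows with equal first values are adjacent.

```python
import numpy as np

def first_row_of_each_run(data):
    """Keep the first row of each run of equal values in column 0.

    Assumes rows sharing a first value are adjacent (e.g. already sorted).
    """
    keys = data[:, 0]
    # A row starts a new run when its key differs from the previous row's key.
    # The very first row always starts a run.
    starts = np.concatenate(([True], keys[1:] != keys[:-1]))
    return data[starts]

# Shortened excerpt of the question's data (first two columns only).
sample = np.array([
    (107., 7.475729),
    (107., 7.408565),
    (108., 7.838725),
    (109., 7.079929),
    (109., 7.058383),
])

run_heads = first_row_of_each_run(sample)
print(run_heads)
```

Because only column 0 is compared, nan values in other columns do not affect the mask.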

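If the rows are not sorted by their first value, sort them first and then compare neighbouring rows of the sorted array:

s = dataA[:, 0].argsort(kind='stable')  # stable sort keeps the original order within each group
sortedA = dataA[s]
sortedA[sortedA[:, 0] == np.roll(sortedA[:, 0], -1)]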

For the second example, this yields:

array([[107.       ,   7.475729 ,   6.573791 ,  90.0126   ,   0.5529882,
          0.867588 ,         nan,         nan],
       [108.       ,   7.838725 ,   6.961871 ,  89.52572  ,   0.5610707,
          0.7769735,         nan,         nan],
       [109.       ,   7.079929 ,   6.86194  ,  89.6181   ,   0.5660294,
          0.8596874,         nan,         nan],
       [110.       ,   7.727924 ,   7.116364 ,  90.45003  ,   0.5366358,
          0.8887361,         nan,         nan]])
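
Another way to get the same kind of result, offered here as a sketch rather than as the answerer's method, is to let `np.unique` report where each first-column value first occurs via `return_index=True` and then index the full array with those positions. The helper name `first_row_per_key` is hypothetical, and the sample is a shortened, deliberately reordered excerpt of the question's data with a nan column.

```python
import numpy as np

def first_row_per_key(data):
    """Return the first row seen for each distinct value in column 0."""
    # return_index=True yields, for every unique key, the index of its
    # first occurrence in the original (possibly unsorted) column.
    _, first_idx = np.unique(data[:, 0], return_index=True)
    # Rows come back ordered by key, because np.unique sorts the keys.
    return data[first_idx]

# Shortened, reordered excerpt of the question's data with a nan column.
sample = np.array([
    (108., 7.838725, np.nan),
    (107., 7.475729, np.nan),
    (107., 7.408565, np.nan),
    (109., 7.079929, np.nan),
    (108., 7.795123, np.nan),
])

result = first_row_per_key(sample)
print(result)
```

This does not require the rows to be sorted or each value to appear a fixed number of times, and since only column 0 is passed to `np.unique`, nan values elsewhere in the row are carried along untouched.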