I have a NumPy array like this:
a = [['I05', 'U13', 4],
['I12', 'U13', 5],
['I22', 'U13', 3],
['I03', 'U15', 5],
['I14', 'U23', 5],
['I12', 'U23', 2],
['I15', 'U43', 5]]
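Here there are three U13 entries and two U23 entries. I need to keep the rows whose second-column value occurs more than once and delete the rest.
After the deletion, I want this result: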
a = [['I05', 'U13', 4],
['I12', 'U13', 5],
['I22', 'U13', 3],
['I14', 'U23', 5],
['I12', 'U23', 2]]
How can I do this efficiently?
The array is already sorted on the second column (the 'UXX' values).
Answer 0 (score: 4)
This approach should give the desired output:
import numpy as np
from collections import Counter
a = np.array([['I05', 'U13', 4],
['I12', 'U13', 5],
['I22', 'U13', 3],
['I03', 'U15', 5],
['I14', 'U23', 5],
['I12', 'U23', 2],
['I15', 'U43', 5]])
# counts number of occurrences of each value in second column
d = Counter(a[:,1])
# creates an index where these counts are > 1
index_keep = [i for i, j in enumerate(a[:,1]) if d[j] > 1]
print(a[index_keep])
>>> [['I05' 'U13' '4']
['I12' 'U13' '5']
['I22' 'U13' '3']
['I14' 'U23' '5']
['I12' 'U23' '2']]
Answer 1 (score: 1)
For mixed types, Pandas is a convenient option. Since your data is sorted, you can simply keep the duplicates:
import pandas as pd
import numpy as np
A = np.array([('I05', 'U13', 4),
('I12', 'U13', 5),
('I22', 'U13', 3),
('I03', 'U15', 5),
('I14', 'U23', 5),
('I12', 'U23', 2),
('I15', 'U43', 5)],
dtype='object, object, i4')
df = pd.DataFrame(A)
B = df[df.duplicated(subset=['f1'], keep=False)].values
print(B)
array([['I05', 'U13', 4],
['I12', 'U13', 5],
['I22', 'U13', 3],
['I14', 'U23', 5],
['I12', 'U23', 2]], dtype=object)
Note that NumPy adds the field names (f0, f1, f2) automatically. A is a structured array, not an array of tuples:
print(A)
array([('I05', 'U13', 4), ('I12', 'U13', 5), ('I22', 'U13', 3),
('I03', 'U15', 5), ('I14', 'U23', 5), ('I12', 'U23', 2),
('I15', 'U43', 5)],
dtype=[('f0', 'O'), ('f1', 'O'), ('f2', '<i4')])
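Because the question states that the array is already sorted on the second column, a row belongs to a repeated group exactly when it matches its previous or next neighbour. The sketch below is an illustration of that idea in plain NumPy rather than something taken from either answer; it assumes equal keys are adjacent, and the names `keys`, `same_as_prev`, `same_as_next`, `mask` and `result` are hypothetical.

```python
import numpy as np

a = np.array([['I05', 'U13', 4],
              ['I12', 'U13', 5],
              ['I22', 'U13', 3],
              ['I03', 'U15', 5],
              ['I14', 'U23', 5],
              ['I12', 'U23', 2],
              ['I15', 'U43', 5]])

keys = a[:, 1]

# compare each key with its neighbour; valid only if equal keys are adjacent
neighbours_equal = keys[1:] == keys[:-1]
same_as_prev = np.concatenate(([False], neighbours_equal))
same_as_next = np.concatenate((neighbours_equal, [False]))

# a row is kept if it matches the row before it or the row after it
mask = same_as_prev | same_as_next
result = a[mask]
print(result)
```

If the keys are not grouped together, this check misses duplicates that are far apart, so a counting approach is the safer choice for unsorted input.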