How to remove duplicate elements from an array in Python

Date: 2013-10-30 16:32:11

Tags: arrays python-3.x delete-row

I have a two-column array that is already sorted by the first column. I want to delete some elements according to these rules:

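1) Compare each element's value with all the other values in the first column. If it differs from every other value by more than a given threshold (e.g. 0.1), keep it in the new array. Otherwise, the values that differ from one another by less than the threshold form a comparison group, and then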

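2) within each comparison group, compare the second-column values and keep only the element of the group whose second-column value is the smallest.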

For example, if my array is

 a= [[1.2, 3], 
     [2.2, 3], 
     [2.25, 1], 
     [2.28, 3], 
     [3.2, 8], 
     [4.2, 10]]

then what I want to get is:

  a=[[1.2, 3],  
     [2.25, 1], 
     [3.2, 8], 
     [4.2, 10]]

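I removed the second and fourth elements, because the first-column values 2.2, 2.25 and 2.28 differ from one another by less than 0.1, and among them the second-column value 1 is the smallest.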

Can anyone give me some hints? Thanks.
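A minimal plain-Python sketch of the two rules, assuming the rows are already sorted by the first column and that a group is built by chaining neighbouring rows whose first-column gap is below the threshold (so a chain such as 2.2, 2.28, 2.36 would form one group even though its ends differ by more than 0.1). The function name `filter_close_rows` is hypothetical; on ties, `min` keeps the first row of the group.

```python
def filter_close_rows(rows, eps=0.1):
    """Collapse runs of rows whose first-column values are within eps of the
    previous row, keeping the row with the smallest second-column value.

    Assumes rows are sorted by the first column. Groups are formed by
    chaining consecutive gaps, so a long run may span more than eps overall.
    """
    result = []
    group = []
    for row in rows:
        # a gap of eps or more closes the current group
        if group and row[0] - group[-1][0] >= eps:
            result.append(min(group, key=lambda r: r[1]))
            group = []
        group.append(row)
    if group:
        # flush the last group
        result.append(min(group, key=lambda r: r[1]))
    return result


a = [[1.2, 3],
     [2.2, 3],
     [2.25, 1],
     [2.28, 3],
     [3.2, 8],
     [4.2, 10]]
print(filter_close_rows(a))  # [[1.2, 3], [2.25, 1], [3.2, 8], [4.2, 10]]
```

Rows that are far from both neighbours end up in a group of one, so they pass through unchanged, which covers rule 1 without a separate branch.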

1 answer:

Answer 0 (score: 0)

import numpy as np

eps = 0.1
# ASSUMING the second column is sorted (otherwise sort it first).
# The first column is just a row id, used below to put the rows back in order.
a = np.array(
    [[1, 1.2, 3],
     [2, 2.2, 3],
     [3, 2.25, 1],
     [4, 2.28, 4],
     [5, 3.2, 8],
     [6, 4.2, 10],
     [7, 4.21, 3],
     [8, 4.25, 4],
     [9, 4.28, 1],
     [10, 5.2, 10],
     ])
# expected result
# a = [[1, 1.2, 3],
#      [3, 2.25, 1],
#      [5, 3.2, 8],
#      [9, 4.28, 1],
#      [10, 5.2, 10],
#      ]

b = a[:, 1]
close = np.diff(b) < eps  # close[k]: rows k and k+1 are within eps

# A row belongs to a group if it is within eps of its predecessor or its
# successor. Marking both rows of every close pair also handles the first
# and the last row of the array.
in_group = np.zeros(len(a), dtype=bool)
in_group[:-1] |= close
in_group[1:] |= close
a1 = a[in_group]

# split the grouped rows into separate groups wherever the gap is not < eps
boundaries = np.where(np.diff(a1[:, 1]) >= eps)[0] + 1
groups = np.split(a1, boundaries) if len(a1) else []

# get min of each group (all rows that tie for the minimum are kept)
minima = [g[g[:, 2] == g[:, 2].min()] for g in groups]

# the elements that build no groups
result = a[~in_group]
# add the elements of minima, these are the minimal elements of each group
result = np.vstack([result] + minima)
# sort the result by row id (optional)
result = result[np.argsort(result[:, 0])]
print("final result")
print(result)

Here is the output of this code:

In [1]: run filter.py
final result
[[  1.     1.2    3.  ]
 [  3.     2.25   1.  ]
 [  5.     3.2    8.  ]
 [  9.     4.28   1.  ]
 [ 10.     5.2   10.  ]]
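The same idea can also be condensed with `np.split`: cut the sorted array wherever neighbouring key values are at least `eps` apart, then keep the row with the smallest value from each piece. A single row is a piece of length one, so it is kept unchanged and no separate bookkeeping for ungrouped rows is needed. This is a sketch, not the answer author's code; `keep_group_minima`, `key_col` and `val_col` are hypothetical names, the input is assumed non-empty and sorted by `key_col`, and unlike the answer it keeps only the first row when several tie for the minimum.

```python
import numpy as np


def keep_group_minima(a, eps=0.1, key_col=1, val_col=2):
    """Split rows into runs whose consecutive key_col gaps are < eps and keep,
    from each run, the row with the smallest val_col value.

    Assumes `a` is non-empty and already sorted by key_col.
    """
    # a cut goes after row k whenever rows k and k+1 are NOT within eps
    cuts = np.where(np.diff(a[:, key_col]) >= eps)[0] + 1
    runs = np.split(a, cuts)
    # argmin returns the first minimum, so ties keep the earliest row
    return np.vstack([run[np.argmin(run[:, val_col])] for run in runs])


a = np.array([[1, 1.2, 3], [2, 2.2, 3], [3, 2.25, 1], [4, 2.28, 4],
              [5, 3.2, 8], [6, 4.2, 10], [7, 4.21, 3], [8, 4.25, 4],
              [9, 4.28, 1], [10, 5.2, 10]])
print(keep_group_minima(a)[:, 0])  # row ids kept: 1, 3, 5, 9, 10
```

For the question's two-column array (no id column), the call would be `keep_group_minima(np.array(a), key_col=0, val_col=1)`.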