Identify the top k values and label them according to their rank

Time: 2018-03-16 23:12:52

Tags: python pandas numpy scipy

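For example, given a one-dimensional array like the one below, is there a function that converts it into another array that keeps only the top 5 elements of the original? The five retained elements should be labeled 5, 4, 3, 2, 1 according to their values, and every other element should be labeled 0.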

9.00E-05
8.74E-05
-6.67E-05
-0.000296984
-0.00016961
-7.49E-06
-0.000102942
-0.000183901
0.000206149
5.62E-05
0.000112588
5.93E-05
9.85E-05
-2.29E-05
5.08E-05
0.00015748

3 answers:

Answer 0 (score: 1)

Here is one solution using rank:

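# df is assumed here to be a pandas Series named "val" holding the values above
s = df.rank(ascending=False)    # rank 1 = largest value
s.mask(s > 5, 0).astype(int)    # zero out everything outside the top 5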
Out[74]: 
0     5
1     0
2     0
3     0
4     0
5     0
6     0
7     0
8     1
9     0
10    3
11    0
12    4
13    0
14    0
15    2
Name: val, dtype: int32
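
The snippet above does not show how `df` was built. As a minimal, self-contained sketch of the same rank-and-mask idea, assuming the data lives in a pandas Series named `val` (the names `values`, `ranks`, and `labels` are illustrative, not from the answer):

```python
import pandas as pd

# Sample values from the question; the Series name "val" is an assumption
# inferred from the output shown above.
values = [9.00E-05, 8.74E-05, -6.67E-05, -0.000296984, -0.00016961,
          -7.49E-06, -0.000102942, -0.000183901, 0.000206149, 5.62E-05,
          0.000112588, 5.93E-05, 9.85E-05, -2.29E-05, 5.08E-05, 0.00015748]
s = pd.Series(values, name="val")

# rank(ascending=False) gives the largest value rank 1, the next rank 2, ...
ranks = s.rank(ascending=False)

# Keep ranks 1..5 and replace every other rank with 0.
labels = ranks.mask(ranks > 5, 0).astype(int)
print(labels.tolist())
```

With tied values, `rank` defaults to averaging the tied ranks, so the labels may no longer be distinct integers; passing `method="first"` is one way to break ties by position.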

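Answer 1 (score: 1)

If you want the numbers to stay in their original order and to get a list of tuples holding each original number and its rank, you can do this: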

numbers = [ 9.00E-05, 8.74E-05, -6.67E-05, -0.000296984, -0.00016961, -7.49E-06, -0.000102942, -0.000183901, 0.000206149, 5.62E-05, 0.000112588, 5.93E-05, 9.85E-05, -2.29E-05, 5.08E-05, 0.00015748]
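# sorted() is ascending, so the five smallest values get 5..1;
# use sorted(numbers, reverse=True) to tag from the largest value instead
ranks   = { n:max(5-i,0) for (i,n) in enumerate(sorted(numbers)) }
tagged  = [ (n,ranks[n]) for n in numbers ]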

# tagged will contain : [(9e-05, 0), (8.74e-05, 0), (-6.67e-05, 1), (-0.000296984, 5), (-0.00016961, 3), (-7.49e-06, 0), (-0.000102942, 2), (-0.000183901, 4), (0.000206149, 0), (5.62e-05, 0), (0.000112588, 0), (5.93e-05, 0), (9.85e-05, 0), (-2.29e-05, 0), (5.08e-05, 0), (0.00015748, 0)]

If the original order doesn't matter, you only need:

tagged   = [ (n,max(5-i,0)) for (i,n) in enumerate(sorted(numbers)) ]

# then tagged will be : [(-0.000296984, 5), (-0.000183901, 4), (-0.00016961, 3), (-0.000102942, 2), (-6.67e-05, 1), (-2.29e-05, 0), (-7.49e-06, 0), (5.08e-05, 0), (5.62e-05, 0), (5.93e-05, 0), (8.74e-05, 0), (9e-05, 0), (9.85e-05, 0), (0.000112588, 0), (0.00015748, 0), (0.000206149, 0)]
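
Because `sorted(numbers)` is ascending, the code in this answer tags the five smallest values, and the value-keyed dictionary would collapse duplicate values into a single label. A possible variant that tags the five largest values and ranks by position instead is sketched below. The helper name `tag_top_k` and its labeling convention (1 for the largest, matching the output of the other answers) are assumptions for illustration only.

```python
def tag_top_k(numbers, k=5):
    """Return (value, label) pairs in the original order.

    The k largest values get labels 1..k (1 = largest); the rest get 0.
    """
    # Sort positions rather than values, so duplicates get distinct labels.
    order = sorted(range(len(numbers)), key=lambda i: numbers[i], reverse=True)
    labels = [0] * len(numbers)
    for rank, idx in enumerate(order[:k], start=1):
        labels[idx] = rank
    return list(zip(numbers, labels))


numbers = [9.00E-05, 8.74E-05, -6.67E-05, -0.000296984, -0.00016961,
           -7.49E-06, -0.000102942, -0.000183901, 0.000206149, 5.62E-05,
           0.000112588, 5.93E-05, 9.85E-05, -2.29E-05, 5.08E-05, 0.00015748]
tagged = tag_top_k(numbers)
print(tagged)
```

Sorting indices costs the same as sorting values but keeps each element's position, which is what allows the labels to be written back in the original order.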

Answer 2 (score: 0)

One approach is to use numpy. Assume your array is stored in the variable arr:

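args = arr.argsort()              # indices that sort arr in ascending order
arr[args[-5:]] = range(5, 0, -1)  # five largest values: the largest gets 1, the fifth largest gets 5
arr[args[:-5]] = 0                # everything else becomes 0 (arr is overwritten in place)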

# array([ 5.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  3.,  0.,  4.,
#         0.,  0.,  2.])
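
The code above overwrites `arr` in place, and the labels come out as floats because they are stored in the original float array. If the original values need to be kept, a non-destructive sketch of the same argsort idea could look like this (the names `labels` and `k` are illustrative assumptions):

```python
import numpy as np

arr = np.array([9.00E-05, 8.74E-05, -6.67E-05, -0.000296984, -0.00016961,
                -7.49E-06, -0.000102942, -0.000183901, 0.000206149, 5.62E-05,
                0.000112588, 5.93E-05, 9.85E-05, -2.29E-05, 5.08E-05,
                0.00015748])
k = 5

order = arr.argsort()                     # indices from smallest to largest
labels = np.zeros(arr.shape, dtype=int)   # separate integer output array

# The last k indices point at the k largest values in ascending order,
# so the largest value receives 1 and the k-th largest receives k.
labels[order[-k:]] = np.arange(k, 0, -1)
print(labels)
```

Writing into a separate integer array leaves `arr` untouched and avoids the float labels seen in the output above.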