Question

df =

value
1
5
34
5
67
8
98

I need a new column holding each value's percentile score relative to its own column. The final result should look like the table below, with the output of stats.percentileofscore() in the pcntle_rank column. I considered using apply somehow, but how would I pass the required arguments to percentileofscore?

df =

value    pcntle_rank
1        stats.percentileofscore(df['value'], df['value'][1])
5        stats.percentileofscore(df['value'], df['value'][2])
34       stats.percentileofscore(df['value'], df['value'][3])
5        stats.percentileofscore(df['value'], df['value'][4])
67       stats.percentileofscore(df['value'], df['value'][5])
8        stats.percentileofscore(df['value'], df['value'][6])
98       stats.percentileofscore(df['value'], df['value'][7])

My attempt uses a for loop, which does produce results, but I want to do this without looping. The real data has 50 columns and 4000 rows, and I need to do this for every column and every row.

Answer 1

`Series.rank`

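Use `pct=True`; this is equivalent to `stats.percentileofscore` with its default `kind='rank'`. The result is a fraction between 0 and 1, so multiply by 100 to get percentages.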

df[0].rank(pct=True)*100
#0     14.285714
#1     35.714286
#2     71.428571
#3     35.714286
#4     85.714286
#5     57.142857
#6    100.000000
#Name: 0, dtype: float64

from scipy import stats

# Compare with scipy one value at a time.
# Series.iteritems() was removed in pandas 2.0; Series.items() is the replacement.
for idx, val in df[0].items():
    print(f'{val}: {stats.percentileofscore(df[0], score=val)}')

#1: 14.285714285714286
#5: 35.714285714285715
#34: 71.42857142857143
#5: 35.714285714285715
#67: 85.71428571428571
#8: 57.142857142857146
#98: 100.0
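
The question asks for this on every column, not just one. A minimal sketch of how that might look, assuming `DataFrame.rank` is applied to the whole frame: the `value` column comes from the question, while the `other` column and the `_pcntle_rank` suffix are made up for illustration. The scipy cross-check is only there to confirm the two approaches agree and is not needed in real use.

```python
import numpy as np
import pandas as pd
from scipy import stats

# 'value' is the column from the question; 'other' is a hypothetical
# second column added only to show the multi-column case.
df = pd.DataFrame({
    "value": [1, 5, 34, 5, 67, 8, 98],
    "other": [10, 20, 20, 40, 5, 60, 70],
})

# Rank every column at once. Ties receive the average rank
# (method='average' is the default), matching kind='rank' in scipy.
pct = df.rank(pct=True) * 100

# Cross-check: call scipy once per value, column by column.
check = df.apply(
    lambda col: col.map(lambda v: stats.percentileofscore(col, v))
)

# Attach the results next to the original columns.
result = df.join(pct.add_suffix("_pcntle_rank"))

print(result)
print(np.allclose(pct, check))
```

Because `rank` is vectorized, this avoids the explicit loop and should scale comfortably to a frame of 50 columns by 4000 rows.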
