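I want to compute percentile ranks for values within the group each one belongs to. I wrote the code below and can compute something like the z-score, because that function takes only one input. What should I do for a function that takes two arguments? Thanks.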
import pandas as pd
import scipy.stats as stats
import numpy as np
funZScore = lambda x: (x - x.mean()) / x.std()
funPercentile = lambda x, y: stats.percentileofscore(x[~np.isnan(x)], y)
A = pd.DataFrame({'Group' : ['A','A','A','A','B','B','B'],
'Value' : [4, 7, None, 6, 2, 8, 1]})
# Compute the z-score by group; transform returns a result aligned to the original index
A['Z'] = A.groupby('Group')['Value'].transform(funZScore)
print(A)
  Group  Value         Z
0     A    4.0 -1.091089
1     A    7.0  0.872872
2     A    NaN       NaN
3     A    6.0  0.218218
4     B    2.0 -0.440225
5     B    8.0  1.144586
6     B    1.0 -0.704361
# compute the percentile rank by group
# how to put two arguments into groupby apply?
# I hope to get something like below
  Group  Value         Z       P
0     A    4.0 -1.091089   33.33
1     A    7.0  0.872872  100
2     A    NaN       NaN     NaN
3     A    6.0  0.218218   66.67
4     B    2.0 -0.440225   66.67
5     B    8.0  1.144586  100
6     B    1.0 -0.704361   33.33
Answer 0 (score: 2)
I think you need:
d = A.groupby('Group')['Value'].apply(list).to_dict()
print (d)
{'A': [4.0, 7.0, nan, 6.0], 'B': [2.0, 8.0, 1.0]}
A['P'] = A.apply(lambda x: funPercentile(np.array(d[x['Group']]), x['Value']), axis=1)
print (A)
  Group  Value         Z           P
0     A    4.0 -1.091089   33.333333
1     A    7.0  0.872872  100.000000
2     A    NaN       NaN         NaN
3     A    6.0  0.218218   66.666667
4     B    2.0 -0.440225   66.666667
5     B    8.0  1.144586  100.000000
6     B    1.0 -0.704361   33.333333
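As a possible alternative to the dictionary lookup, here is a minimal sketch that stays inside `groupby`. The helper name `percentile_within_group` and the column `P_rank` are made up for illustration. The idea is that the "second argument" does not need to be passed in separately: the function receives the whole group and scores each of its values against the group's non-NaN values. For this data, the built-in group-wise `rank(pct=True)` appears to give the same numbers, but check that it matches the percentile definition you need (for example, how ties are handled) before relying on it.

```python
import numpy as np
import pandas as pd
from scipy import stats

A = pd.DataFrame({'Group': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
                  'Value': [4, 7, None, 6, 2, 8, 1]})

def percentile_within_group(values):
    # `values` is the 'Value' Series of a single group (hypothetical helper).
    # Score every element against the group's non-NaN values; keep NaN as NaN.
    valid = values.dropna().to_numpy()
    return values.map(
        lambda v: stats.percentileofscore(valid, v) if pd.notna(v) else np.nan
    )

# Option 1: transform hands each group's Series to the helper
# and returns a result aligned to the original index.
A['P'] = A.groupby('Group')['Value'].transform(percentile_within_group)

# Option 2: group-wise rank as a fraction of the group's non-NaN count,
# scaled to percent.
A['P_rank'] = A.groupby('Group')['Value'].rank(pct=True) * 100

print(A)
```

Option 1 keeps `scipy.stats.percentileofscore`, so its `kind` argument stays available if a different percentile definition is wanted. Option 2 avoids the Python-level loop and is likely faster on large frames.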