Groupby apply with multiple arguments

Time: 2018-07-12 08:06:47

Tags: python pandas scipy apply pandas-groupby

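I want to compute percentile ranks within the group each value belongs to. I wrote the code below and can compute things like a z-score, because that function takes a single input. What should I do with a function that takes two arguments? Thanks.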

import pandas as pd
import scipy.stats as stats
import numpy as np

funZScore = lambda x: (x - x.mean()) / x.std()
funPercentile = lambda x, y: stats.percentileofscore(x[~np.isnan(x)], y)

A = pd.DataFrame({'Group' : ['A','A','A','A','B','B','B'], 
                  'Value' : [4, 7, None, 6, 2, 8, 1]})

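# Compute the Z-score by group
# transform returns a result aligned to the original index; in recent pandas
# versions apply may prepend the group key to the index and break the assignment
A['Z'] = A.groupby('Group')['Value'].transform(funZScore)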

print(A)
  Group  Value         Z
0     A    4.0 -1.091089
1     A    7.0  0.872872
2     A    NaN       NaN
3     A    6.0  0.218218
4     B    2.0 -0.440225
5     B    8.0  1.144586
6     B    1.0 -0.704361

# compute the percentile rank by group
# how to put two arguments into groupby apply? 
# I hope to get something like below
  Group  Value         Z       P
0     A    4.0 -1.091089   33.33
1     A    7.0  0.872872  100.00
2     A    NaN       NaN     NaN
3     A    6.0  0.218218   66.67
4     B    2.0 -0.440225   66.67
5     B    8.0  1.144586  100.00
6     B    1.0 -0.704361   33.33

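1 answer:

Answer 0 (score: 2)

I think you need to collect each group's values into a dict first, then look up the row's group when calling the two-argument function row by row: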

d = A.groupby('Group')['Value'].apply(list).to_dict()
print (d)
{'A': [4.0, 7.0, nan, 6.0], 'B': [2.0, 8.0, 1.0]}


A['P'] = A.apply(lambda x: funPercentile(np.array(d[x['Group']]), x['Value']), axis=1)
print (A)
  Group  Value         Z           P
0     A    4.0 -1.091089   33.333333
1     A    7.0  0.872872  100.000000
2     A    NaN       NaN         NaN
3     A    6.0  0.218218   66.666667
4     B    2.0 -0.440225   66.666667
5     B    8.0  1.144586  100.000000
6     B    1.0 -0.704361   33.333333
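
An alternative that stays inside groupby, as a rough sketch rather than a definitive answer: wrap the two-argument function so that the group supplies the first argument and each element supplies the second, then pass the wrapper to transform. The helper name percentile_within and the column P_rank are made up for this example. The second option uses only pandas; it should give the same numbers on this data, but it is worth checking against scipy if your data has ties or you need a different kind of percentile.

```python
import numpy as np
import pandas as pd
from scipy import stats

A = pd.DataFrame({'Group': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
                  'Value': [4, 7, None, 6, 2, 8, 1]})

def percentile_within(s):
    """Percentile rank of each value of s against the non-NaN values of s.

    Hypothetical helper for this example, not part of pandas or scipy.
    """
    valid = s.dropna().to_numpy()
    # Keep NaN scores as NaN instead of handing them to scipy.
    return s.map(lambda v: stats.percentileofscore(valid, v)
                 if pd.notna(v) else np.nan)

# Option 1: the group is the first argument, each element the second.
A['P'] = A.groupby('Group')['Value'].transform(percentile_within)

# Option 2: pandas only. rank(pct=True) divides each rank by the number
# of non-NaN values in the group, so NaN rows stay NaN.
A['P_rank'] = A.groupby('Group')['Value'].rank(pct=True) * 100

print(A)
```

Option 1 keeps the original scipy function, so other choices of its kind parameter remain available; option 2 avoids the Python-level loop over rows and is likely faster on large frames.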