Question

I am trying to compute z-scores within groups. For example, given the following data

df:

GROUP VALUE
 1     5
 2     2
 1     10
 2     20
 1     7

The values in group 1 are 5, 10, 7. I want the z-score of each value computed only within its own group.

Sample Desired Output: 

GROUP VALUE Z_SCORE
 1     5     0.5
 2     2     0.01
 1     10    7
 2     20    8.3
 1     7     1.3

The z-scores above are not real computed values, just an illustration of the layout.

I tried the following:

def z_score(x):
   z = np.abs(stats.zscore(x))
   return z

df['Z_SCORE'] = df.groupby(['GROUP'])['Value'].apply(z_score)

However, this does not work. How can I achieve this?

Answer 1

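Use GroupBy.transform instead of apply, so that the numpy array returned for each group is correctly converted into a new Series aligned with the original rows: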

import numpy as np
from scipy.stats import zscore

def z_score(x):
   # absolute z-score of each value within its group
   z = np.abs(zscore(x))
   return z

df['Z_SCORE'] = df.groupby('GROUP')['VALUE'].transform(z_score)

print (df)
   GROUP  VALUE   Z_SCORE
0      1      5  1.135550
1      2      2  1.000000
2      1     10  1.297771
3      2     20  1.000000
4      1      7  0.162221
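
As a cross-check, the same per-group absolute z-score can be sketched without SciPy. This assumes the population standard deviation (ddof=0), which is the default used by scipy.stats.zscore; the helper name abs_zscore is hypothetical, and the frame is rebuilt from the question's sample data:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"GROUP": [1, 2, 1, 2, 1], "VALUE": [5, 2, 10, 20, 7]})

def abs_zscore(s):
    # Hypothetical helper: (value - group mean) / group population std,
    # then the absolute value. ddof=0 matches scipy.stats.zscore's default.
    return ((s - s.mean()) / s.std(ddof=0)).abs()

# transform returns a result aligned with the original row index.
df["Z_SCORE"] = df.groupby("GROUP")["VALUE"].transform(abs_zscore)
print(df)
```

Note that pandas' own Series.std defaults to ddof=1 (sample standard deviation), so leaving out ddof=0 would give different numbers than the output shown above.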

A solution with GroupBy.apply is also possible, but the function must be changed to return a Series that carries each group's index:

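def z_score(x):
   z = np.abs(zscore(x))
   # return a Series that keeps the group's original index
   return pd.Series(z, index=x.index)


# group_keys=False keeps the original row index in the result (needed on pandas 2.0+)
df['Z_SCORE'] = df.groupby('GROUP', group_keys=False)['VALUE'].apply(z_score)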
print (df)
   GROUP  VALUE   Z_SCORE
0      1      5  1.135550
1      2      2  1.000000
2      1     10  1.297771
3      2     20  1.000000
4      1      7  0.162221
