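I am trying to apply a function to a column by group, with the goal of creating 2 new columns that contain the function's return values for each group. Example: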
import numpy as np
import pandas as pd

def testms(x):
    mu = np.sum(x)
    si = np.sum(x)/2
    return mu, si

df = pd.DataFrame({'A': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2], 'B': np.random.rand(10)})
df
A B
0 1 0.696761
1 1 0.035178
2 1 0.468180
3 1 0.157818
4 1 0.281470
5 2 0.377689
6 2 0.336046
7 2 0.005879
8 2 0.747436
9 2 0.772405
desired_result =
A B mu si
0 1 0.696761 1.652595 0.826297
1 1 0.035178 1.652595 0.826297
2 1 0.468180 1.652595 0.826297
3 1 0.157818 1.652595 0.826297
4 1 0.281470 1.652595 0.826297
5 2 0.377689 2.997657 1.498829
6 2 0.336046 2.997657 1.498829
7 2 0.005879 2.997657 1.498829
8 2 0.747436 2.997657 1.498829
9 2 0.772405 2.997657 1.498829
I think I have found a solution, but I am looking for something more elegant and efficient:
x = df.groupby('A')['B'].apply(lambda x: pd.Series(testms(x),index=['mu','si']))
A
1 mu 1.652595
si 0.826297
2 mu 2.997657
si 1.498829
Name: B, dtype: float64
df.merge(x.drop(labels='mu',level=1),on='A',how='outer').merge(x.drop(labels='si',level=1),on='A',how='outer')
Answer 0 (score: 1)
One idea is to change the function so that it creates new columns filled with the mu and si values and returns x for each group:
def testms(x):
    mu = np.sum(x['B'])
    si = np.sum(x['B'])/2
    x['mu'] = mu
    x['si'] = si
    return x
x = df.groupby('A').apply(testms)
print (x)
A B mu si
0 1 0.352297 3.590048 1.795024
1 1 0.860488 3.590048 1.795024
2 1 0.939260 3.590048 1.795024
3 1 0.988280 3.590048 1.795024
4 1 0.449723 3.590048 1.795024
5 2 0.125852 1.300524 0.650262
6 2 0.853474 1.300524 0.650262
7 2 0.000996 1.300524 0.650262
8 2 0.223886 1.300524 0.650262
9 2 0.096316 1.300524 0.650262
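When the per-group values are simple reductions, as in the example here (a sum and half of that sum), the same broadcast can probably be done without apply at all. The following is a minimal sketch, assuming that simplification holds; it uses fixed values for B instead of random ones so the result is reproducible, and the names group_sum and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Same layout as the question's frame, but with fixed values in B
# (an assumption made here so the result is reproducible).
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "B": np.arange(10, dtype=float),
})

# transform returns one value per original row, aligned with df's index,
# so the group sum is already repeated for every row of its group.
group_sum = df.groupby("A")["B"].transform("sum")

# assign builds a new frame and leaves df itself unchanged.
result = df.assign(mu=group_sum, si=group_sum / 2)
print(result)
```

This only covers functions that can be expressed as a built-in reduction plus arithmetic; a function that must return several values at once still needs apply or an aggregation followed by a join.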
Your solution can also be simplified with Series.unstack and DataFrame.join:
df1 = df.groupby('A')['B'].apply(lambda x: pd.Series(testms(x),index=['mu','si'])).unstack()
x = df.join(df1, on='A')
print (x)
A B mu si
0 1 0.085961 2.791346 1.395673
1 1 0.887589 2.791346 1.395673
2 1 0.685952 2.791346 1.395673
3 1 0.946613 2.791346 1.395673
4 1 0.185231 2.791346 1.395673
5 2 0.994415 3.173444 1.586722
6 2 0.159852 3.173444 1.586722
7 2 0.773711 3.173444 1.586722
8 2 0.867337 3.173444 1.586722
9 2 0.378128 3.173444 1.586722
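A related variant, sketched here under the assumption that each statistic can be written as its own reducer, is to build the per-group table with named aggregation and then join it back on A. This avoids constructing a pd.Series inside apply and the unstack step. The fixed values in B and the names stats and joined are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same layout as the question's frame, with fixed values in B
# (an assumption made here so the result is reproducible).
df = pd.DataFrame({
    "A": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "B": np.arange(10, dtype=float),
})

# Named aggregation on the grouped Series: one output column per keyword.
# The result is a DataFrame indexed by A with columns mu and si.
stats = df.groupby("A")["B"].agg(mu="sum", si=lambda s: s.sum() / 2)

# join looks up each row's A value in the index of stats.
joined = df.join(stats, on="A")
print(joined)
```

Named aggregation requires pandas 0.25 or later. Compared with the unstack version, each statistic is computed by a separate reducer, so it fits best when the values do not have to come out of a single function call.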