I am using pandas 0.19.0. The following code reproduces the problem:
In [1]: import pandas as pd
        import numpy as np

In [2]: df = pd.DataFrame({'c1' : list('AAABBBCCC'),
                           'c2' : list('abcdefghi'),
                           'c3' : np.random.randn(9),
                           'c4' : np.arange(9)})
        df
Out[2]:
  c1 c2        c3  c4
0  A  a  0.819618   0
1  A  b  1.764327   1
2  A  c -0.539010   2
3  B  d  1.430614   3
4  B  e -1.711859   4
5  B  f  1.002522   5
6  C  g  2.257341   6
7  C  h  1.338807   7
8  C  i -0.458534   8
In [3]: def myfun(s):
            """Function does practically nothing"""
            req = s.values
            return pd.Series({'mean' : np.mean(req),
                              'std' : np.std(req),
                              'foo' : 'bar'})

In [4]: res = df.groupby(['c1', 'c2'])['c3'].apply(myfun)
        res.head(10)
Out[4]:
c1  c2
A   a   foo          bar
        mean    0.819618
        std            0
    b   foo          bar
        mean     1.76433
        std            0
    c   foo          bar
        mean    -0.53901
        std            0
B   d   foo          bar
What I actually expected is this:
Out[4]:
       foo      mean  std
c1 c2
A  a   bar  0.819618    0
   b   bar   1.76433    0
   c   bar  -0.53901    0
B  d   bar   1.43061    0
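When a function applied to a Series or DataFrame returns a Series, pandas automatically converts the result to a DataFrame. Why does a function applied to groups behave differently?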
I am looking for an answer that produces the desired output, ideally one that also explains how pandas.Series.apply or pandas.DataFrame.apply differs from pandas.core.groupby.GroupBy.apply.
Answer 0 (score: 2)
An easy fix is to use unstack:
import numpy as np
import pandas as pd

df = pd.DataFrame({'c1' : list('AAABBBCCC'),
                   'c2' : list('abcdefghi'),
                   'c3' : np.random.randn(9),
                   'c4' : np.arange(9)})

def myfun(s):
    """Function does practically nothing"""
    req = s.values
    return pd.Series({'mean' : np.mean(req),
                      'std' : np.std(req),
                      'foo' : 'bar'})

res = df.groupby(['c1', 'c2'])['c3'].apply(myfun)
# Pivot the innermost index level (foo/mean/std) into columns
res.unstack()
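
As a rough sketch of what unstack does here (not necessarily the only or the best approach): the groupby result is a single Series whose index has three levels (c1, c2, and the labels returned by myfun), and unstack moves the innermost level into columns. Because myfun mixes strings and floats, the resulting columns are likely to have object dtype, so the sketch below also casts the numeric columns back to float. The names stacked and wide are illustrative only and do not appear in the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'c1': list('AAABBBCCC'),
                   'c2': list('abcdefghi'),
                   'c3': np.random.randn(9),
                   'c4': np.arange(9)})

def myfun(s):
    """Return a few per-group values as a Series (same function as above)."""
    req = s.values
    return pd.Series({'mean': np.mean(req),
                      'std': np.std(req),
                      'foo': 'bar'})

# Long form: one Series with a three-level index (c1, c2, label from myfun)
stacked = df.groupby(['c1', 'c2'])['c3'].apply(myfun)

# Wide form: the innermost index level becomes the columns
wide = stacked.unstack()

# Assumption: the mixed str/float values leave the columns as object dtype,
# so cast the numeric ones back to float explicitly
wide = wide.astype({'mean': float, 'std': float})

print(wide.head())
print(wide.dtypes)
```

Each group contains a single row in this example, so std is 0 for every group and mean simply equals the original c3 value.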