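I just upgraded pandas from 0.17.1 to 0.18.1 and, while changing some pre-existing code, I believe I found a problem with the new resample API, outlined below. According to the documentation linked below, df3_resample and df4_resample in my example should return the same DataFrame, but df4_resample raises an exception. This tripped me up for a while, so I thought I would share.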
Exception: Column(s) A already selected
http://pandas.pydata.org/pandas-docs/version/0.18.0/whatsnew.html#whatsnew-0180-breaking-resample
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 4),
                  columns=list('ABCD'),
                  index=pd.date_range('2010-01-01 09:00:00', periods=10, freq='s'))
df['item'] = 'item_a'  # add column for groupby
# THIS WORKS
df1_resample = df.groupby('item').resample('2s').agg({'A': np.mean, 'B': np.max}).reset_index()
print(df1_resample)
# THIS WORKS
df2_resample = df.resample('2s').agg({'A': {'A_mean': np.mean, 'A_max': np.max}}).reset_index()
print(df2_resample)
# THIS WORKS
df3_resample = df.groupby('item').apply(lambda x: x.resample('2s').agg({'A': {'A_mean': np.mean, 'A_max': np.max}})).reset_index()
print(df3_resample)
# THIS DOESN'T WORK
df4_resample = df.groupby('item').resample('2s').agg({'A': {'A_mean': np.mean, 'A_max': np.max}})
print(df4_resample)
Output:
item level_1 A B
0 item_a 2010-01-01 09:00:00 0.611660 0.739640
1 item_a 2010-01-01 09:00:02 0.615876 0.880113
2 item_a 2010-01-01 09:00:04 0.218292 0.441504
3 item_a 2010-01-01 09:00:06 0.753698 0.637787
4 item_a 2010-01-01 09:00:08 0.471272 0.474738
index A
A_mean A_max
0 2010-01-01 09:00:00 0.611660 0.813038
1 2010-01-01 09:00:02 0.615876 0.994657
2 2010-01-01 09:00:04 0.218292 0.233478
3 2010-01-01 09:00:06 0.753698 0.848107
4 2010-01-01 09:00:08 0.471272 0.610592
item level_1 A
A_mean A_max
0 item_a 2010-01-01 09:00:00 0.611660 0.813038
1 item_a 2010-01-01 09:00:02 0.615876 0.994657
2 item_a 2010-01-01 09:00:04 0.218292 0.233478
3 item_a 2010-01-01 09:00:06 0.753698 0.848107
4 item_a 2010-01-01 09:00:08 0.471272 0.610592
File "<some_file.py>", line 29, in <module>
df4_resample = df.groupby('item').resample('2s').agg({'A': {'A_mean': np.mean, 'A_max': np.max}})
File "C:\Anaconda2\lib\site-packages\pandas\tseries\resample.py", line 293, in aggregate
result, how = self._aggregate(arg, *args, **kwargs)
File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 505, in _aggregate
result = list(_agg(arg, _agg_1dim).values())
File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 496, in _agg
result[fname] = func(fname, agg_how)
File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 479, in _agg_1dim
return colg.aggregate(how, _level=(_level or 0) + 1)
File "C:\Anaconda2\lib\site-packages\pandas\tseries\resample.py", line 293, in aggregate
result, how = self._aggregate(arg, *args, **kwargs)
File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 528, in _aggregate
result = _agg(arg, lambda fname,
File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 496, in _agg
result[fname] = func(fname, agg_how)
File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 529, in <lambda>
agg_how: _agg_1dim(self._selection, agg_how))
File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 475, in _agg_1dim
colg = self._gotitem(name, ndim=1, subset=subset)
File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 680, in _gotitem
groupby=self._groupby[key],
File "C:\Anaconda2\lib\site-packages\pandas\core\base.py", line 326, in __getitem__
raise Exception('Column(s) %s already selected' % self._selection)
Exception: Column(s) A already selected
Answer 0 (score: 0)
I'm not sure why resample doesn't work here, but there is a handy workaround that doesn't need a lambda. Try this:
df.groupby([
'item', pd.Grouper(freq = '2s')
]).agg({
'A' : ['mean', 'max']
}).rename(columns = {
'mean' : 'A_mean', 'max' : 'A_max'
}, level = 1).reset_index()
Instead of using .resample('2s'), you can pass pd.Grouper(freq='2s') to groupby(). It does the same thing in your case. Here are the docs: http://pandas.pydata.org/pandas-docs/version/0.18/generated/pandas.Grouper.html
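If you want to convince yourself that the two routes agree, here is a minimal sketch. It assumes a small deterministic frame (my own stand-in for the random data in the question) and uses a plain mean rather than the nested-dictionary aggregation, so it only checks the grouping, not the renaming.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the question's random frame (values are assumed).
idx = pd.date_range('2010-01-01 09:00:00', periods=10, freq='s')
df = pd.DataFrame({'A': np.arange(10, dtype=float), 'item': 'item_a'}, index=idx)

# Route 1: groupby followed by resample, with column A selected first.
via_resample = df.groupby('item')['A'].resample('2s').mean()

# Route 2: a single groupby that includes a time-based Grouper.
via_grouper = df.groupby(['item', pd.Grouper(freq='2s')])['A'].mean()

# Compare the aggregated values and the (item, timestamp) index entries.
same_values = np.allclose(via_resample.values, via_grouper.values)
same_index = list(via_resample.index) == list(via_grouper.index)
print(same_values, same_index)
```

Note that this only covers contiguous timestamps like the ones in the question. As far as I know, the two can differ when a group has gaps in its data, because resample emits the empty bins while the Grouper version only returns groups that were observed.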
As an aside, you should avoid renaming columns with a nested dictionary (that usage is deprecated) and use the actual .rename() function instead.
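If you are able to move to a newer pandas (0.25 or later, so not the 0.18.1 from the question), named aggregation gives you the renamed columns in one step without either the nested dictionary or .rename(). This is only a sketch under that assumption; the deterministic data and the index name 'time' are mine, not from the question.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in data; the index name 'time' is an illustrative choice.
idx = pd.date_range('2010-01-01 09:00:00', periods=10, freq='s', name='time')
df = pd.DataFrame({'A': np.arange(10, dtype=float), 'item': 'item_a'}, index=idx)

# Named aggregation (pandas >= 0.25): output_name=(input_column, function).
out = (
    df.groupby(['item', pd.Grouper(freq='2s')])
      .agg(A_mean=('A', 'mean'), A_max=('A', 'max'))
      .reset_index()
)
print(out)
```

The result has flat column names (A_mean, A_max) instead of a two-level column index, which is usually easier to work with downstream.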