Question

Setup

import pandas as pd

df = pd.DataFrame({'grp': [1, 2] * 2, 'value': range(4)},
                  index=pd.Index(pd.date_range('2016-03-01', periods=7)[::2], name='Date')
                 ).sort_values('grp')

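I want to group by 'grp', then resample the index to daily frequency within each group, forward-filling the missing values. I expected this to work:

print(df.groupby('grp').resample('D').ffill())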

            grp  value
Date                  
2016-03-01    1      0
2016-03-05    1      2
2016-03-03    2      1
2016-03-07    2      3

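It didn't: the rows come back unchanged, with no daily rows filled in. So I tried this:

# resample each group separately; `g` is one group's sub-frame
print(df.groupby('grp', group_keys=False).apply(lambda g: g.resample('D').ffill()))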

            grp  value
Date                  
2016-03-01    1      0
2016-03-02    1      0
2016-03-03    1      0
2016-03-04    1      0
2016-03-05    1      2
2016-03-03    2      1
2016-03-04    2      1
2016-03-05    2      1
2016-03-06    2      1
2016-03-07    2      3

That did work. Shouldn't these two approaches produce the same output? What am I missing?

In response to ayhan's comment

import sys

print(sys.version)
print(pd.__version__)

2.7.11 |Anaconda custom (x86_64)| (default, Dec  6 2015, 18:57:58) 
[GCC 4.2.1 (Apple Inc. build 5577)]
0.18.0

ayhan reports that the two results look the same on Python 3 with pandas 0.18.1.

After updating pandas to 0.18.1:

2.7.11 |Anaconda custom (x86_64)| (default, Dec  6 2015, 18:57:58) 
[GCC 4.2.1 (Apple Inc. build 5577)]
0.18.1

The problem is resolved.

Answer 1

This looks like one of the issues caused by the changes to the resample API in version 0.18.0.

It works as expected in 0.18.1:

df.groupby('grp').resample('D').ffill()
Out[2]: 
                grp  value
grp Date                  
1   2016-03-01    1      0
    2016-03-02    1      0
    2016-03-03    1      0
    2016-03-04    1      0
    2016-03-05    1      2
2   2016-03-03    2      1
    2016-03-04    2      1
    2016-03-05    2      1
    2016-03-06    2      1
    2016-03-07    2      3
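As a rough sketch of the same idea (not part of the original answer), the resample can be restricted to the 'value' column, so the result does not depend on how a given pandas version treats the grouping column in the output. The variable names `filled` and `flat` are my own, and the sort from the question's setup is left out on the assumption that it does not affect the grouping:

```python
import pandas as pd

# Same data as in the question's setup (sort omitted; see the note above).
df = pd.DataFrame(
    {'grp': [1, 2] * 2, 'value': range(4)},
    index=pd.Index(pd.date_range('2016-03-01', periods=7)[::2], name='Date'),
)

# Resample each group's 'value' to daily frequency and forward-fill the gaps.
# The result is a Series indexed by (grp, Date).
filled = df.groupby('grp')['value'].resample('D').ffill()

# Optionally flatten the index back into ordinary columns.
flat = filled.reset_index()
print(flat)
```

Selecting the column first keeps 'grp' out of the resampled data, so it appears only once, as an index level (or as a column after `reset_index()`).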
