Question

I have a MultiIndex DataFrame; a sample can be created with the following:

import pandas as pd

# Values for the two index levels: 'first' (group name) and 'second' (1-5 within each group)
arrays = [['bar', 'bar', 'bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'baz', 'baz',
           'foo', 'foo', 'foo', 'foo', 'foo', 'qux', 'qux', 'qux', 'qux', 'qux'],
          [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]]
tuples = list(zip(*arrays))
values = [1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 2, 2, 2, 2]
df = pd.DataFrame(values, index=pd.MultiIndex.from_tuples(tuples, names=['first', 'second']),
                  columns=['test'])

This produces a DataFrame that looks like this:

             test 
first   sec
bar     1   1
        2   1
        3   2
        4   2
        5   2
baz     1   1
        2   1
        3   1
        4   1
        5   1
foo     1   2
        2   2
        3   2
        4   3
        5   3
qux     1   3
        2   2
        3   2
        4   2
        5   2

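I am trying to figure out how to get, in a new column named 'result', the cumulative sum of the numbers in 'test' within each 'first' group. I feel like I am close with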

df['result'] = df.test.expanding(1).sum()

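but I don't know how to make it stop and start over once the `second` level (shown as `sec` above) reaches 5. As written, it keeps accumulating across the whole DataFrame.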

I would like the final output to look like this:

             test  result
first   sec
bar     1   1      1   
        2   1      2
        3   2      4
        4   2      6
        5   2      8
baz     1   1      1
        2   1      2
        3   1      3
        4   1      4
        5   1      5
foo     1   2      2
        2   2      4
        3   2      6
        4   3      9
        5   3      12
qux     1   3      3
        2   2      5
        3   2      7
        4   2      9
        5   2      11

Any suggestions are appreciated.

Answer 1

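This did the job:

df['result'] = df.groupby(['first'])['test'].transform(lambda x: x.cumsum())

Grouping by the 'first' index level makes the cumulative sum restart at the start of each group, instead of running across the whole DataFrame.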
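For anyone who wants to try this end to end, here is a minimal sketch. It assumes the sample data from the question, with the values taken from the printed table, and the column name `result_expanding` is made up for the illustration. It shows the built-in grouped `cumsum()`, which should be equivalent to the `transform` call, plus a per-group version of the `expanding` attempt from the question.

```python
import pandas as pd

# Rebuild the sample frame from the question (values follow the printed table).
first = [name for name in ['bar', 'baz', 'foo', 'qux'] for _ in range(5)]
second = [1, 2, 3, 4, 5] * 4
values = [1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 2, 2, 2, 2]
index = pd.MultiIndex.from_arrays([first, second], names=['first', 'second'])
df = pd.DataFrame({'test': values}, index=index)

# Built-in grouped cumulative sum: restarts at each new value of 'first'.
df['result'] = df.groupby(level='first')['test'].cumsum()

# The expanding-window idea from the question, applied one group at a time.
# expanding().sum() returns floats, so cast back to int for display.
# 'result_expanding' is a hypothetical column name used only for comparison.
df['result_expanding'] = (
    df.groupby(level='first')['test']
      .transform(lambda s: s.expanding(1).sum())
      .astype(int)
)

print(df)
```

Both columns should match the desired output in the question. The plain `cumsum()` form is probably the one to prefer, since it avoids calling a Python lambda for every group.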

pandas MultiIndex with an expanding window function
