Question

Problem

Include all possible values or combinations of values in the output of a pandas groupby with aggregation.

Example

The example pandas DataFrame has three columns, User, Code and Subtotal:

import pandas as pd
example_df = pd.DataFrame([['a', 1, 1], ['a', 2, 1], ['b', 1, 1], ['b', 2, 1], ['c', 1, 1], ['c', 1, 1]], columns=['User', 'Code', 'Subtotal'])

I want to group by User and Code and get a subtotal for each combination of User and Code.

print(example_df.groupby(['User', 'Code']).Subtotal.sum().reset_index())

The output I get is:

  User   Code   Subtotal
0    a      1          1
1    a      2          1
2    b      1          1
3    b      2          1
4    c      1          2

How can I include the missing combination User=='c' and Code==2 in the table, even though it does not exist in example_df?
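To confirm which combination the groupby leaves out, one can compare the (User, Code) keys it actually produces with every pairing of the distinct User and Code values. This is a minimal sketch; the names observed, all_pairs and missing are illustrative, and it assumes "all possible combinations" means the Cartesian product of the values that appear in each column.

```python
import itertools

import pandas as pd

example_df = pd.DataFrame(
    [['a', 1, 1], ['a', 2, 1], ['b', 1, 1], ['b', 2, 1], ['c', 1, 1], ['c', 1, 1]],
    columns=['User', 'Code', 'Subtotal'],
)

# groupby only emits the (User, Code) pairs that actually occur in the data
observed = set(example_df.groupby(['User', 'Code']).Subtotal.sum().index)

# Every pairing of the distinct User values with the distinct Code values
all_pairs = set(itertools.product(example_df['User'].unique(),
                                  example_df['Code'].unique()))

# Pairs that exist in the full grid but not in the grouped output
missing = sorted((user, int(code)) for user, code in all_pairs - observed)
print(missing)
```

It prints [('c', 2)], the one pair absent from the grouped output.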

Preferred output

Below is the preferred output, with a zero row for the combination User=='c' and Code==2.

  User   Code   Subtotal
0    a      1          1
1    a      2          1
2    b      1          1
3    b      2          1
4    c      1          2
5    c      2          0
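A different technique from the answers below, sketched here under the assumption that converting the key columns to the category dtype is acceptable: with categorical groupers and observed=False, groupby emits every combination of categories, and sum() of an empty group is 0. The name cat_df is illustrative; observed is passed explicitly because its default differs between pandas versions.

```python
import pandas as pd

example_df = pd.DataFrame(
    [['a', 1, 1], ['a', 2, 1], ['b', 1, 1], ['b', 2, 1], ['c', 1, 1], ['c', 1, 1]],
    columns=['User', 'Code', 'Subtotal'],
)

# Declare the key columns as categoricals so every category is a possible group
cat_df = example_df.astype({'User': 'category', 'Code': 'category'})

# observed=False keeps all category combinations, including the unseen (c, 2);
# the empty group sums to 0
result = (cat_df.groupby(['User', 'Code'], observed=False)
                .Subtotal.sum()
                .reset_index())
print(result)
```

Note that User and Code stay categorical in the result; call .astype(...) on them if plain columns are needed.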

Answer 1

You can use unstack followed by stack:

print(example_df.groupby(['User', 'Code']).Subtotal.sum()
                .unstack(fill_value=0)
                .stack()
                .reset_index(name='Subtotal'))
  User  Code  Subtotal
0    a     1         1
1    a     2         1
2    b     1         1
3    b     2         1
4    c     1         2
5    c     2         0
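The unstack/stack trick works because of the intermediate wide table: unstack turns Code into columns, so the grid has a cell for every (User, Code) pair, and fill_value=0 writes 0 into the cell with no source value; stack then turns each cell back into a row. This sketch (the names sums and wide are illustrative) stops at the wide table to make that visible.

```python
import pandas as pd

example_df = pd.DataFrame(
    [['a', 1, 1], ['a', 2, 1], ['b', 1, 1], ['b', 2, 1], ['c', 1, 1], ['c', 1, 1]],
    columns=['User', 'Code', 'Subtotal'],
)

sums = example_df.groupby(['User', 'Code']).Subtotal.sum()

# Rows are User, columns are Code; the empty (c, 2) cell becomes 0, not NaN
wide = sums.unstack(fill_value=0)
print(wide)
```

Without fill_value the empty cell would be NaN and the column would be upcast to float.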

Another solution is to reindex by a MultiIndex created with MultiIndex.from_product:

df = example_df.groupby(['User', 'Code']).Subtotal.sum()
mux = pd.MultiIndex.from_product(df.index.levels, names=['User','Code'])
print(mux)
MultiIndex(levels=[['a', 'b', 'c'], [1, 2]],
           labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
           names=['User', 'Code'])

print(df.reindex(mux, fill_value=0).reset_index(name='Subtotal'))
  User  Code  Subtotal
0    a     1         1
1    a     2         1
2    b     1         1
3    b     2         1
4    c     1         2
5    c     2         0

(The MultiIndex repr above comes from an older pandas; newer versions call the labels argument codes and print the index as a list of tuples.)
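The reindex approach generalizes to any number of grouping columns. The helper below is a minimal sketch; complete_sum is a hypothetical name, and it assumes two or more key columns (with a single key, groupby returns a flat Index rather than a MultiIndex).

```python
import pandas as pd


def complete_sum(df, keys, value):
    """Hypothetical helper: sum `value` per combination of `keys`,
    adding a 0 row for every combination absent from `df`."""
    sums = df.groupby(keys)[value].sum()
    # Full grid: Cartesian product of each index level's distinct values
    full_index = pd.MultiIndex.from_product(sums.index.levels, names=keys)
    # Combinations missing from `sums` are inserted with fill_value
    return sums.reindex(full_index, fill_value=0).reset_index(name=value)


example_df = pd.DataFrame(
    [['a', 1, 1], ['a', 2, 1], ['b', 1, 1], ['b', 2, 1], ['c', 1, 1], ['c', 1, 1]],
    columns=['User', 'Code', 'Subtotal'],
)
result = complete_sum(example_df, ['User', 'Code'], 'Subtotal')
print(result)
```

Unlike the categorical route, this keeps User and Code in their original dtypes.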
