Question

I have a DataFrame with three columns:

id     order     ordernumber
1      app       1
1      pip       2
1      org       3
2      app       1
3      app       1
3      org       3

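The "order" column has only 3 unique values (app, pip and org). I want a DataFrame that shows, for each id, the number of orders of each type, plus the total number of orders.

The result should look like this: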

id     app    pip    org    total
1      1      1      1      3
2      1      0      0      1
3      1      0      1      2

Answer 1

You can use pivot_table to get the counts:

>>> df2 = df.pivot_table(index='id', columns='order', aggfunc='size', fill_value=0)
>>> df2
order  app  org  pip
id
1        1    1    1
2        1    0    0
3        1    1    0

You can then add a "total" column by summing across each row, for example with df2.sum(axis=1).
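
The two steps in this answer can be put together as a minimal, self-contained sketch. The DataFrame is rebuilt from the sample data in the question, and the column name total and the column reordering are assumptions taken from the desired output rather than part of the original answer:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 3, 3],
    'order': ['app', 'pip', 'org', 'app', 'app', 'org'],
    'ordernumber': [1, 2, 3, 1, 1, 3],
})

# Count rows per (id, order) pair; missing combinations become 0.
df2 = df.pivot_table(index='id', columns='order', aggfunc='size', fill_value=0)

# Reorder the columns to match the layout requested in the question.
df2 = df2[['app', 'pip', 'org']]

# Sum across each row to get the total number of orders per id.
df2['total'] = df2.sum(axis=1)

print(df2)
```

The total is computed after the reordering so that it only ever sums the three count columns.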

Answer 2

An alternative to ajcr's answer:

df2 = df.pivot_table(index='id', columns='order', aggfunc=lambda x: len(x.unique()), margins=True)

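This uses a different aggfunc, which counts the unique values of the remaining column (ordernumber) in each group instead of counting rows.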

In [4]: df2 = df.pivot_table(index='id', columns='order', aggfunc=lambda x: len(x.unique()), margins=True)

In [5]: df2
Out[5]:
      ordernum
order      app org pip All
id
1            1   1   1   3
2            1 NaN NaN   1
3            1   1 NaN   2
All          1   1   1   3

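In addition, you can use the margins parameter to have pivot_table compute the row/column subtotals automatically.

If you need to replace the NaN values afterwards, you can use: df2.fillna(0, inplace=True)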

In [6]: df2.fillna(0, inplace=True)

In [7]: df2
Out[7]:
      ordernum
order      app org pip All
id
1            1   1   1   3
2            1   0   0   1
3            1   1   0   2
All          1   1   1   3
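
Counting unique values and counting rows give the same numbers on the question's data only because no id repeats an order type. The sketch below adds one hypothetical duplicate row (not part of the original data) and uses groupby with size and nunique, rather than pivot_table, to show where the two approaches would diverge:

```python
import pandas as pd

# Sample data from the question plus one HYPOTHETICAL extra row:
# id 2 has a second 'app' row with the same ordernumber.
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 3, 3],
    'order': ['app', 'pip', 'org', 'app', 'app', 'app', 'org'],
    'ordernumber': [1, 2, 3, 1, 1, 1, 3],
})

grouped = df.groupby(['id', 'order'])

# Number of rows per (id, order) pair.
row_counts = grouped.size().unstack(fill_value=0)

# Number of distinct ordernumber values per (id, order) pair.
unique_counts = grouped['ordernumber'].nunique().unstack(fill_value=0)

print(row_counts.loc[2, 'app'])     # 2
print(unique_counts.loc[2, 'app'])  # 1
```

Which of the two is appropriate depends on whether repeated rows should count as separate orders.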
