Error with groupby and agg by multiple columns

Time: 2020-07-13 07:59:53

Tags: python pandas

I'm trying to group by monthtly_purchases and region to get the number of customers and the sum of monthly_spending, but I get the error shown below.

Main dataframe:

customer_id   monthly_spending      month             monthtly_purchases       region     
32324         342                   Feb-2019          5                        A
34345         293                   Feb-2019          5                        A
45453         212                   Feb-2019          3                        A
34343         453                   Feb-2019          3                        A
53533         112                   Feb-2019          5                        B
12334         511                   Feb-2019          5                        B
99934         123                   Feb-2019          3                        B
21213         534                   Feb-2019          3                        B
32324         143                   March-2019        5                        A
34345         453                   March-2019        5                        A
45453         234                   March-2019        3                        A
34343         432                   March-2019        3                        A
53533         124                   March-2019        5                        B
12334         453                   March-2019        5                        B
99934         224                   March-2019        3                        B
21213         634                   March-2019        3                        B
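For anyone who wants to reproduce this, the main dataframe above can be built in code roughly as follows. This is a minimal sketch: the variable names rows and df are illustrative, the column names (including the "monthtly_purchases" spelling) are copied from the table, and unlike the first answer's example it includes both months.

```python
import pandas as pd

# Every row of the main dataframe from the question, Feb-2019 and March-2019.
rows = [
    (32324, 342, "Feb-2019", 5, "A"),
    (34345, 293, "Feb-2019", 5, "A"),
    (45453, 212, "Feb-2019", 3, "A"),
    (34343, 453, "Feb-2019", 3, "A"),
    (53533, 112, "Feb-2019", 5, "B"),
    (12334, 511, "Feb-2019", 5, "B"),
    (99934, 123, "Feb-2019", 3, "B"),
    (21213, 534, "Feb-2019", 3, "B"),
    (32324, 143, "March-2019", 5, "A"),
    (34345, 453, "March-2019", 5, "A"),
    (45453, 234, "March-2019", 3, "A"),
    (34343, 432, "March-2019", 3, "A"),
    (53533, 124, "March-2019", 5, "B"),
    (12334, 453, "March-2019", 5, "B"),
    (99934, 224, "March-2019", 3, "B"),
    (21213, 634, "March-2019", 3, "B"),
]

# Column names copied verbatim from the question's table.
df = pd.DataFrame(
    rows,
    columns=["customer_id", "monthly_spending", "month", "monthtly_purchases", "region"],
)
print(df.shape)  # (16, 5)
```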

Desired output dataframe:

monthly_purchases region    monthly_spending    count_customers         month
5                 A         635               2                       Feb-2019
3                 A         665               2                       Feb-2019
5                 B         623               2                       Feb-2019
3                 B         657               2                       Feb-2019

5                 A         596               2                       March-2019
3                 A         666               2                       March-2019
5                 B         577               2                       March-2019
3                 B         858               2                       March-2019

This is what I've tried so far, but I get the following error:

d = {'customer_id': ['count'], 'monthly_spending': ['sum']}

agg_df = df.groupby('monthtly_purchases', 'region').agg(d)
agg_df

Error msg: No numeric types to aggregate

2 answers:

Answer 0 (score: 1)

When you group by two or more columns, remember to pass the column names as a single list:

import pandas as pd

df = pd.DataFrame(
    [
        [32324, 342, "Feb-2019", 5, "A"],
        [34345, 293, "Feb-2019", 5, "A"],
        [45453, 212, "Feb-2019", 3, "A"],
        [34343, 453, "Feb-2019", 3, "A"],
        [53533, 112, "Feb-2019", 5, "B"],
        [12334, 511, "Feb-2019", 5, "B"],
        [99934, 123, "Feb-2019", 3, "B"],
        [21213, 534, "Feb-2019", 3, "B"],
    ],
    columns=["customer_id", "monthly_spending", "month", "monthtly_purchases", "region"],
)

d = {'customer_id': ['count'], 'monthly_spending': ['sum']}
agg_df = df.groupby(["monthtly_purchases", "region"]).agg(d)
print(agg_df)

Output:

                          customer_id monthly_spending
                                count              sum
monthtly_purchases region                             
3                  A                2              665
                   B                2              657
5                  A                2              635
                   B                2              623
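If you would rather get flat column names like count_customers directly (as in the desired output), named aggregation is one option. A minimal sketch, assuming pandas 0.25 or later (where `agg` accepts `output_name=(column, func)` keyword arguments); the output name count_customers is taken from the question's desired output:

```python
import pandas as pd

# Feb-2019 rows from the question.
df = pd.DataFrame(
    [
        [32324, 342, "Feb-2019", 5, "A"],
        [34345, 293, "Feb-2019", 5, "A"],
        [45453, 212, "Feb-2019", 3, "A"],
        [34343, 453, "Feb-2019", 3, "A"],
        [53533, 112, "Feb-2019", 5, "B"],
        [12334, 511, "Feb-2019", 5, "B"],
        [99934, 123, "Feb-2019", 3, "B"],
        [21213, 534, "Feb-2019", 3, "B"],
    ],
    columns=["customer_id", "monthly_spending", "month", "monthtly_purchases", "region"],
)

# Each keyword becomes one flat output column, so there is no
# two-level column header to clean up afterwards.
agg_df = (
    df.groupby(["monthtly_purchases", "region"])
    .agg(
        count_customers=("customer_id", "count"),
        monthly_spending=("monthly_spending", "sum"),
    )
    .reset_index()
)
print(agg_df)
```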

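As requested in the comments, to turn the MultiIndex into ordinary columns (reset_index moves the index levels into columns and creates a new default integer index):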

agg_df.reset_index(inplace=True)
print(agg_df)

Output:

  monthtly_purchases region customer_id monthly_spending
                                  count              sum
0                  3      A           2              665
1                  3      B           2              657
2                  5      A           2              635
3                  5      B           2              623
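Note that the columns are still two-level after reset_index (hence the two header rows). One way to flatten them is to join the levels of each column tuple. A minimal sketch, assuming the "name_func" naming scheme is acceptable (the names customer_id_count and monthly_spending_sum are an illustrative choice, not something pandas produces on its own):

```python
import pandas as pd

# Feb-2019 rows from the question.
df = pd.DataFrame(
    [
        [32324, 342, "Feb-2019", 5, "A"],
        [34345, 293, "Feb-2019", 5, "A"],
        [45453, 212, "Feb-2019", 3, "A"],
        [34343, 453, "Feb-2019", 3, "A"],
        [53533, 112, "Feb-2019", 5, "B"],
        [12334, 511, "Feb-2019", 5, "B"],
        [99934, 123, "Feb-2019", 3, "B"],
        [21213, 534, "Feb-2019", 3, "B"],
    ],
    columns=["customer_id", "monthly_spending", "month", "monthtly_purchases", "region"],
)

d = {"customer_id": ["count"], "monthly_spending": ["sum"]}
agg_df = df.groupby(["monthtly_purchases", "region"]).agg(d).reset_index()

# Columns are tuples such as ("customer_id", "count") and ("region", "").
# Join each tuple with "_" and strip the trailing "_" that the empty
# second level leaves on the former index columns.
agg_df.columns = ["_".join(col).rstrip("_") for col in agg_df.columns]
print(agg_df.columns.tolist())
# ['monthtly_purchases', 'region', 'customer_id_count', 'monthly_spending_sum']
```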

To include the month, as requested in the comments:

agg_df = df.groupby(["month", "monthtly_purchases", "region"], as_index=False).agg(d)

Output:

        month monthtly_purchases region customer_id monthly_spending
                                              count              sum
0    Feb-2019                  3      A           2              665
1    Feb-2019                  3      B           2              657
2    Feb-2019                  5      A           2              635
3    Feb-2019                  5      B           2              623
4  March-2019                  3      A           2              666
5  March-2019                  3      B           2              858
6  March-2019                  5      A           2              596
7  March-2019                  5      B           2              577
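Because month holds strings, groupby sorts it alphabetically. "Feb-2019" happens to come before "March-2019", but a month such as "April-2019" would sort before both. A minimal sketch for sorting chronologically, assuming every month string starts with at least the three-letter English month abbreviation and ends in "-YYYY" (the rows below and the helper column month_start are hypothetical):

```python
import pandas as pd

# Hypothetical aggregated result whose months are in alphabetical order.
agg_df = pd.DataFrame(
    {
        "month": ["April-2019", "Feb-2019", "March-2019"],
        "monthly_spending": [100, 200, 300],
    }
)

# "April-2019" -> "Apr-2019", "March-2019" -> "Mar-2019", then parse as dates.
abbrev = agg_df["month"].str[:3] + agg_df["month"].str[-5:]
agg_df["month_start"] = pd.to_datetime(abbrev, format="%b-%Y")

# Sort by the real date, then drop the temporary helper column.
agg_df = (
    agg_df.sort_values("month_start")
    .drop(columns="month_start")
    .reset_index(drop=True)
)
print(agg_df["month"].tolist())
# ['Feb-2019', 'March-2019', 'April-2019']
```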

Answer 1 (score: 0)

The column order differs from your desired output, but you can get the same result with the following code:

df = df.groupby(['monthtly_purchases','region','month']).agg({'customer_id': 'size', 'monthly_spending': 'sum'}).reset_index()
df
    monthtly_purchases  region  month   customer_id monthly_spending
0   3   A   Feb-2019    2   665
1   3   B   Feb-2019    2   657
2   5   A   Feb-2019    2   635
3   5   B   Feb-2019    2   623
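The two answers count customers differently: the first uses 'count' and the second 'size'. These agree only when customer_id has no missing values, and neither counts distinct customers. In the question's full data every customer_id appears once per month, so grouping without month would count each customer twice; 'nunique' counts distinct IDs instead. A small sketch with hypothetical data showing the three behaviours:

```python
import numpy as np
import pandas as pd

# Hypothetical data: region A repeats one customer and has one missing ID.
df = pd.DataFrame(
    {
        "region": ["A", "A", "A", "B"],
        "customer_id": [32324, 32324, np.nan, 53533],
    }
)

g = df.groupby("region")["customer_id"]

sizes = g.size().tolist()       # all rows per group, NaN included
counts = g.count().tolist()     # non-missing values only
uniques = g.nunique().tolist()  # distinct non-missing values
print(sizes, counts, uniques)
# [3, 1] [2, 1] [1, 1]
```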