I am trying to group by monthtly_purchases and region to get the number of customers and the sum of monthly spending, but I get the error shown below.
Main dataframe:
customer_id monthly_spending month monthtly_purchases region
32324 342 Feb-2019 5 A
34345 293 Feb-2019 5 A
45453 212 Feb-2019 3 A
34343 453 Feb-2019 3 A
53533 112 Feb-2019 5 B
12334 511 Feb-2019 5 B
99934 123 Feb-2019 3 B
21213 534 Feb-2019 3 B
32324 143 March-2019 5 A
34345 453 March-2019 5 A
45453 234 March-2019 3 A
34343 432 March-2019 3 A
53533 124 March-2019 5 B
12334 453 March-2019 5 B
99934 224 March-2019 3 B
21213 634 March-2019 3 B
Desired output dataframe:
monthly_purchases region monthly_spending count_customers month
5 A 635 2 Feb-2019
3 A 665 2 Feb-2019
5 B 623 2 Feb-2019
3 B 657 2 Feb-2019
5 A 596 2 March-2019
3 A 666 2 March-2019
5 B 577 2 March-2019
3 B 858 2 March-2019
This is what I have tried so far, but it raises the following error:
d = {'customer_id': ['count'], 'monthly_spending': ['sum']}
agg_df = df.groupby('monthtly_purchases', 'region').agg(d)
agg_df
Error msg: No numeric types to aggregate
Answer 0: (score: 1)
When you group by two or more columns, remember to pass the column names as a list:
import pandas as pd
df = pd.DataFrame([
[32324, 342, "Feb-2019", 5, "A"],
[34345, 293, "Feb-2019", 5, "A"],
[45453, 212, "Feb-2019", 3, "A"],
[34343, 453, "Feb-2019", 3, "A"],
[53533, 112, "Feb-2019", 5, "B"],
[12334, 511, "Feb-2019", 5, "B"],
[99934, 123, "Feb-2019", 3, "B"],
[21213, 534, "Feb-2019", 3, "B"]
],
columns=["customer_id", "monthly_spending", "month", "monthtly_purchases", "region"]
)
d = {'customer_id': ['count'], 'monthly_spending': ['sum']}
agg_df = df.groupby(["monthtly_purchases", "region"]).agg(d)
print(agg_df)
Returns:
customer_id monthly_spending
count sum
monthtly_purchases region
3 A 2 665
B 2 657
5 A 2 635
B 2 623
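As requested in the comments, to make the multi-index explicit (reset_index turns the index levels into ordinary columns and creates a new integer index):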
agg_df.reset_index(inplace=True)
print(agg_df)
Returns:
monthtly_purchases region customer_id monthly_spending
count sum
0 3 A 2 665
1 3 B 2 657
2 5 A 2 635
3 5 B 2 623
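The output above still has two header rows, because the dict-of-lists form of agg produces MultiIndex columns. One possible way to collapse them into single-level names is sketched below. The joined names customer_id_count and monthly_spending_sum are just an illustrative naming choice, not something the original answer prescribes.

```python
import pandas as pd

# Same Feb-2019 sample rows used in the answer above.
df = pd.DataFrame(
    [
        [32324, 342, "Feb-2019", 5, "A"],
        [34345, 293, "Feb-2019", 5, "A"],
        [45453, 212, "Feb-2019", 3, "A"],
        [34343, 453, "Feb-2019", 3, "A"],
        [53533, 112, "Feb-2019", 5, "B"],
        [12334, 511, "Feb-2019", 5, "B"],
        [99934, 123, "Feb-2019", 3, "B"],
        [21213, 534, "Feb-2019", 3, "B"],
    ],
    columns=["customer_id", "monthly_spending", "month", "monthtly_purchases", "region"],
)

d = {"customer_id": ["count"], "monthly_spending": ["sum"]}
agg_df = df.groupby(["monthtly_purchases", "region"]).agg(d).reset_index()

# Each column label is a tuple such as ("customer_id", "count") or ("region", "").
# Join the parts with "_" and strip the trailing "_" left by an empty second part.
agg_df.columns = ["_".join(parts).strip("_") for parts in agg_df.columns]
print(agg_df)
```

After this step the frame has ordinary string column labels, so a column can be selected with agg_df["monthly_spending_sum"] instead of a tuple key.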
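To include the month, as requested in the comments (the output below assumes df also contains the March-2019 rows from the question, not only the Feb-2019 rows built above):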
agg_df = df.groupby(["month", "monthtly_purchases", "region"], as_index=False).agg(d)
Returns:
month monthtly_purchases region customer_id monthly_spending
count sum
0 Feb-2019 3 A 2 665
1 Feb-2019 3 B 2 657
2 Feb-2019 5 A 2 635
3 Feb-2019 5 B 2 623
4 March-2019 3 A 2 666
5 March-2019 3 B 2 858
6 March-2019 5 A 2 596
7 March-2019 5 B 2 577
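To get closer to the layout asked for in the question (flat column names, with a count_customers column), named aggregation may be a convenient alternative. The sketch below assumes pandas 0.25 or later, where the keyword form of agg is available, and rebuilds the full 16-row dataframe from the question so that it runs on its own.

```python
import pandas as pd

# Full sample data from the question: both Feb-2019 and March-2019.
rows = [
    [32324, 342, "Feb-2019", 5, "A"],
    [34345, 293, "Feb-2019", 5, "A"],
    [45453, 212, "Feb-2019", 3, "A"],
    [34343, 453, "Feb-2019", 3, "A"],
    [53533, 112, "Feb-2019", 5, "B"],
    [12334, 511, "Feb-2019", 5, "B"],
    [99934, 123, "Feb-2019", 3, "B"],
    [21213, 534, "Feb-2019", 3, "B"],
    [32324, 143, "March-2019", 5, "A"],
    [34345, 453, "March-2019", 5, "A"],
    [45453, 234, "March-2019", 3, "A"],
    [34343, 432, "March-2019", 3, "A"],
    [53533, 124, "March-2019", 5, "B"],
    [12334, 453, "March-2019", 5, "B"],
    [99934, 224, "March-2019", 3, "B"],
    [21213, 634, "March-2019", 3, "B"],
]
df = pd.DataFrame(
    rows,
    columns=["customer_id", "monthly_spending", "month", "monthtly_purchases", "region"],
)

# Named aggregation: output_name=(input_column, function).
# The result has single-level columns, so no flattening step is needed.
result = df.groupby(
    ["month", "monthtly_purchases", "region"], as_index=False
).agg(
    monthly_spending=("monthly_spending", "sum"),
    count_customers=("customer_id", "count"),
)
print(result)
```

The rows come out sorted by the grouping keys (month, then monthtly_purchases, then region), which is the same order as the output shown above.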
Answer 1: (score: 0)
The column order differs from your desired output, but you can get the result with the following code.
df = df.groupby(['monthtly_purchases','region','month']).agg({'customer_id': 'size', 'monthly_spending': 'sum'}).reset_index()
df
monthtly_purchases region month customer_id monthly_spending
0 3 A Feb-2019 2 665
1 3 B Feb-2019 2 657
2 5 A Feb-2019 2 635
3 5 B Feb-2019 2 623
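This answer uses 'size' where the other answer uses 'count'. On the sample data both give 2 per group, but they can differ when customer_id has missing values: 'size' counts every row in the group, while 'count' skips nulls. A small sketch with hypothetical toy data (not taken from the question) illustrates the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one customer_id is missing in region "A".
toy = pd.DataFrame(
    {
        "region": ["A", "A", "A", "B"],
        "customer_id": [32324, np.nan, 45453, 53533],
    }
)

counts = toy.groupby("region").agg(
    n_rows=("customer_id", "size"),       # every row in the group
    n_non_null=("customer_id", "count"),  # only rows where customer_id is present
)
print(counts)
```

If the same customer could appear more than once within a group, 'nunique' would be the aggregation to consider for counting distinct customers.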