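I have a DataFrame of transactions in which customers are classified as rich or poor.
I want to see whether the people who go to each shop are mostly rich or mostly poor.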
print(df.sample(10))
           Shop  Transaction_value Social Group
7           KFC                  7         Rich
22  Burger King                342         Rich
19  Burger King                  6         Rich
5           KFC                  2         Poor
14    McDonalds                245         Rich
2           KFC                  3         Poor
16    McDonalds                 56         Poor
6           KFC                  6         Poor
20  Burger King                 23         Poor
8           KFC                  5         Poor
I've gotten most of the way there:
df.groupby(['Shop', 'Social Group'])["Transaction_value"].count()
Shop         Social Group
Burger King  Poor            7
             Rich            3
KFC          Poor            6
             Rich            3
McDonalds    Poor            3
             Rich            6
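I can see that Burger King attracts mostly poor people, while McDonalds attracts mostly rich people.
But how do I extract this information, i.e. the social group that visits each shop most often?
I'm trying to get a result like this: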
Shop         Social Group
Burger King  Poor
KFC          Poor
McDonalds    Rich
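I've also looked at some other questions here that use idxmax(), but I couldn't get it to work. It returns only the single largest entry overall rather than one per shop: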
df.groupby(['Shop', 'Social Group'])["Transaction_value"].count().idxmax()
('Burger King', 'Poor')
I didn't have any luck with max() either, since it returns the counts rather than the group labels:
df.groupby(['Shop', 'Social Group'])["Transaction_value"].count().max(level=0)
Shop
Burger King    7
KFC            6
McDonalds      6
Any suggestions?
My df:
df.to_dict()
{'Shop': {0: 'KFC',
1: 'KFC',
2: 'KFC',
3: 'KFC',
4: 'KFC',
5: 'KFC',
6: 'KFC',
7: 'KFC',
8: 'KFC',
9: 'McDonalds',
10: 'McDonalds',
11: 'McDonalds',
12: 'McDonalds',
13: 'McDonalds',
14: 'McDonalds',
15: 'McDonalds',
16: 'McDonalds',
17: 'McDonalds',
18: 'Burger King',
19: 'Burger King',
20: 'Burger King',
21: 'Burger King',
22: 'Burger King',
23: 'Burger King',
24: 'Burger King',
25: 'Burger King',
26: 'Burger King',
27: 'Burger King'},
'Transaction_value': {0: 1,
1: 2,
2: 3,
3: 34,
4: 2,
5: 2,
6: 6,
7: 7,
8: 5,
9: 4,
10: 3,
11: 2,
12: 12,
13: 31,
14: 245,
15: 123,
16: 56,
17: 67,
18: 68,
19: 6,
20: 23,
21: 44,
22: 342,
23: 234,
24: 3,
25: 234,
26: 666,
27: 88},
'Social Group': {0: 'Poor',
1: 'Rich',
2: 'Poor',
3: 'Poor',
4: 'Rich',
5: 'Poor',
6: 'Poor',
7: 'Rich',
8: 'Poor',
9: 'Rich',
10: 'Rich',
11: 'Rich',
12: 'Rich',
13: 'Rich',
14: 'Rich',
15: 'Poor',
16: 'Poor',
17: 'Poor',
18: 'Poor',
19: 'Rich',
20: 'Poor',
21: 'Poor',
22: 'Rich',
23: 'Poor',
24: 'Poor',
25: 'Rich',
26: 'Poor',
27: 'Poor'}}
Answer 0 (score: 2)
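Starting from your solution, you can chain another groupby on the first index level and call SeriesGroupBy.idxmax. Because the counts have a MultiIndex, this returns one (Shop, Social Group) tuple per shop, so you can select the second value of each tuple by indexing with str[1]: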
df1 = (df.groupby(['Shop', 'Social Group'])["Transaction_value"]
         .count()
         .groupby(level=0)
         .idxmax()
         .str[1]
         .reset_index(name='Social Group'))
print(df1)
          Shop Social Group
0  Burger King         Poor
1          KFC         Poor
2    McDonalds         Rich
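To make the intermediate step visible, here is a minimal sketch of the same chain split into pieces. The six-row frame is a hypothetical stand-in, not the df from the question, and the names counts, winners and result are chosen only for illustration.

```python
import pandas as pd

# Small stand-in frame (hypothetical data, not the asker's full df).
df = pd.DataFrame({
    "Shop": ["KFC", "KFC", "KFC", "McDonalds", "McDonalds", "McDonalds"],
    "Social Group": ["Poor", "Poor", "Rich", "Rich", "Rich", "Poor"],
    "Transaction_value": [1, 2, 3, 4, 5, 6],
})

# Number of transactions per (Shop, Social Group) pair; the result has a MultiIndex.
counts = df.groupby(["Shop", "Social Group"])["Transaction_value"].count()

# Per-shop idxmax: each value is a full (Shop, Social Group) index tuple.
winners = counts.groupby(level=0).idxmax()
print(winners.tolist())  # [('KFC', 'Poor'), ('McDonalds', 'Rich')]

# .str[1] picks the second element of each tuple, i.e. the social group.
result = winners.str[1].reset_index(name="Social Group")
print(result)
```

Calling idxmax() directly on counts, without the second groupby, would give only the single largest pair across all shops, which is what the attempt in the question ran into.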
Another idea is to use Series.value_counts, which sorts by count in descending order by default, so the first index value is the most frequent group:
df1 = (df.groupby('Shop')["Social Group"]
         .agg(lambda x: x.value_counts().index[0])
         .reset_index(name='Social Group'))
print(df1)
          Shop Social Group
0  Burger King         Poor
1          KFC         Poor
2    McDonalds         Rich
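As a rough illustration of what the lambda sees for one shop, the sketch below applies value_counts to the nine KFC rows taken from the df in the question. The variable names kfc and vc are introduced here only for the example.

```python
import pandas as pd

# The 'Social Group' values of the nine KFC rows (index 0-8) from the question's df.
kfc = pd.Series(
    ["Poor", "Rich", "Poor", "Poor", "Rich", "Poor", "Poor", "Rich", "Poor"],
    name="Social Group",
)

# value_counts sorts by frequency, most common first.
vc = kfc.value_counts()
print(vc.to_dict())  # {'Poor': 6, 'Rich': 3}

# The first index label is therefore the most frequent group.
print(vc.index[0])   # Poor
```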
Or use a solution with Series.mode, selecting the first value with Series.iat:
df1 = (df.groupby('Shop')["Social Group"]
         .agg(lambda x: x.mode().iat[0])
         .reset_index(name='Social Group'))
print(df1)
          Shop Social Group
0  Burger King         Poor
1          KFC         Poor
2    McDonalds         Rich
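The .iat[0] is needed because Series.mode always returns a Series, and that Series contains every tied value when there is no single winner. A small sketch with a hypothetical shop that has an exact tie (this data is made up and is not in the question's df):

```python
import pandas as pd

# Hypothetical shop with an exact tie between the two groups.
tied = pd.Series(["Rich", "Poor", "Poor", "Rich"], name="Social Group")

# mode() returns all tied values as a Series, in sorted order.
modes = tied.mode()
print(modes.tolist())  # ['Poor', 'Rich']

# .iat[0] keeps only the first of them.
first = modes.iat[0]
print(first)           # Poor
```

If ties matter for the analysis, it may be worth keeping the full result of mode() rather than only its first element.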