I have a DataFrame of transaction data along with each customer's social group:
print(df.sample(10))
           Shop  Transaction_value  Social Group
7           KFC                  7          Rich
22  Burger King                342          Rich
19  Burger King                  6          Rich
5           KFC                  2          Poor
14    McDonalds                245          Rich
2           KFC                  3          Poor
16    McDonalds                 56          Poor
6           KFC                  6          Poor
20  Burger King                 23          Poor
8           KFC                  5          Poor
I did a groupby that counts how many transactions each social group makes at each shop:
(df.groupby(['Shop', 'Social Group'])['Transaction_value'].count())
Shop         Social Group
Burger King  Poor            7
             Rich            3
KFC          Poor            6
             Rich            3
McDonalds    Poor            3
             Rich            6
I want to divide the numbers above by the value_counts() of each social group:
df['Social Group'].value_counts()
Poor 16
Rich 12
So in my first groupby, wherever there is Poor I want to divide by 16, and wherever there is Rich I want to divide by 12.
So I would end up with something like this:
Shop         Social Group
Burger King  Poor            0.44
             Rich            0.25
KFC          Poor            0.38
             Rich            0.25
McDonalds    Poor            0.19
             Rich            0.50
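The arithmetic described above (look up 16 for every Poor row and 12 for every Rich row) can also be written as an element-wise lookup on the "Social Group" index level. The following is a minimal sketch on a small hypothetical frame with the question's column names; it uses Index.map rather than div, so it is a different technique from the accepted answer, and the names counts, totals, denominators and shares are illustrative only.

```python
import pandas as pd

# Small hypothetical frame with the question's columns (not the question's data).
df = pd.DataFrame({
    "Shop": ["KFC", "KFC", "KFC", "McDonalds", "McDonalds"],
    "Transaction_value": [1, 2, 3, 4, 5],
    "Social Group": ["Poor", "Rich", "Poor", "Rich", "Poor"],
})

counts = df.groupby(["Shop", "Social Group"])["Transaction_value"].count()
totals = df["Social Group"].value_counts()  # Poor: 3, Rich: 2

# For every row of `counts`, look up the total for its "Social Group" label...
denominators = counts.index.get_level_values("Social Group").map(totals)
# ...then divide positionally (both have the same length and order).
shares = counts / denominators.to_numpy()
print(shares)
```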
I tried div() for this. I assumed the indexes of the two results would line up, but it doesn't work:
(df.groupby(['Shop', 'Social Group'])['Transaction_value']
.count()
.div(df['Social Group'].value_counts()))
ValueError: cannot join with no overlapping index names
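The error is raised while div aligns the two indexes: the groupby result has a two-level index, whereas value_counts() returns a one-level index. A minimal sketch for inspecting what div is trying to join, on a small hypothetical frame with the question's column names:

```python
import pandas as pd

# Small hypothetical frame with the question's columns.
df = pd.DataFrame({
    "Shop": ["KFC", "KFC", "McDonalds"],
    "Transaction_value": [1, 2, 3],
    "Social Group": ["Poor", "Rich", "Poor"],
})

counts = df.groupby(["Shop", "Social Group"])["Transaction_value"].count()
totals = df["Social Group"].value_counts()

# Two-level index whose levels are named "Shop" and "Social Group".
print(counts.index.names)
# One-level index; whether it carries a name depends on the pandas version.
print(totals.index.name)
```

In my understanding, pandas versions before 2.0 return value_counts() with an unnamed index, so it shares no level name with ['Shop', 'Social Group'], which is what the error message complains about; from 2.0 on the index is named after the column, so the second print may show 'Social Group' instead of None.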
Is this even possible with built-in pandas functions?
I suppose I could do it with a for loop, but that would take a long time.
My df:
df.to_dict()
{'Shop': {0: 'KFC',
1: 'KFC',
2: 'KFC',
3: 'KFC',
4: 'KFC',
5: 'KFC',
6: 'KFC',
7: 'KFC',
8: 'KFC',
9: 'McDonalds',
10: 'McDonalds',
11: 'McDonalds',
12: 'McDonalds',
13: 'McDonalds',
14: 'McDonalds',
15: 'McDonalds',
16: 'McDonalds',
17: 'McDonalds',
18: 'Burger King',
19: 'Burger King',
20: 'Burger King',
21: 'Burger King',
22: 'Burger King',
23: 'Burger King',
24: 'Burger King',
25: 'Burger King',
26: 'Burger King',
27: 'Burger King'},
'Transaction_value': {0: 1,
1: 2,
2: 3,
3: 34,
4: 2,
5: 2,
6: 6,
7: 7,
8: 5,
9: 4,
10: 3,
11: 2,
12: 12,
13: 31,
14: 245,
15: 123,
16: 56,
17: 67,
18: 68,
19: 6,
20: 23,
21: 44,
22: 342,
23: 234,
24: 3,
25: 234,
26: 666,
27: 88},
'Social Group': {0: 'Poor',
1: 'Rich',
2: 'Poor',
3: 'Poor',
4: 'Rich',
5: 'Poor',
6: 'Poor',
7: 'Rich',
8: 'Poor',
9: 'Rich',
10: 'Rich',
11: 'Rich',
12: 'Rich',
13: 'Rich',
14: 'Rich',
15: 'Poor',
16: 'Poor',
17: 'Poor',
18: 'Poor',
19: 'Rich',
20: 'Poor',
21: 'Poor',
22: 'Rich',
23: 'Poor',
24: 'Poor',
25: 'Rich',
26: 'Poor',
27: 'Poor'}}
Answer (score: 7)
You're close; you need level=1 so that div matches on the second level of the MultiIndex:
s = df['Social Group'].value_counts()
s1 = df.groupby(['Shop', 'Social Group'])['Transaction_value'].count().div(s, level=1)
print(s1)
Shop         Social Group
Burger King  Poor            0.4375
             Rich            0.2500
KFC          Poor            0.3750
             Rich            0.2500
McDonalds    Poor            0.1875
             Rich            0.5000
dtype: float64
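A self-contained sketch of the answer that can be run as-is: it rebuilds df from the question's to_dict() output (encoded compactly; the helper string groups is only an illustrative shorthand). It also passes the level by name (level="Social Group"), which should be equivalent to level=1 here, and cross-checks the result against pd.crosstab(..., normalize="columns"), a different built-in that divides each social-group column by its own total.

```python
import pandas as pd

# Data copied from the question's df.to_dict() output:
# rows 0-8 are KFC, rows 9-17 McDonalds, rows 18-27 Burger King.
groups = "PRPPRPPRP" + "RRRRRRPPP" + "PRPPRPPRPP"  # one letter per row: P = Poor, R = Rich
df = pd.DataFrame({
    "Shop": ["KFC"] * 9 + ["McDonalds"] * 9 + ["Burger King"] * 10,
    "Transaction_value": [1, 2, 3, 34, 2, 2, 6, 7, 5,
                          4, 3, 2, 12, 31, 245, 123, 56, 67,
                          68, 6, 23, 44, 342, 234, 3, 234, 666, 88],
    "Social Group": ["Poor" if g == "P" else "Rich" for g in groups],
})

counts = df.groupby(["Shop", "Social Group"])["Transaction_value"].count()
s = df["Social Group"].value_counts()

# The answer: broadcast the per-group totals across the second index level.
s1 = counts.div(s, level=1)
# The same level addressed by name instead of position.
s1_by_name = counts.div(s, level="Social Group")
print(s1)

# Alternative built-in: a Shop x Social Group count table in which each
# column is divided by its own total (Poor by 16, Rich by 12).
ct = pd.crosstab(df["Shop"], df["Social Group"], normalize="columns")
print(ct)
```

The crosstab version returns a Shop by Social Group table rather than a Series with a MultiIndex, which can be handier if the goal is a side-by-side comparison of the two groups.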