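How do I divide one pandas groupby result by another?

Time: 2020-09-18 11:38:28

Tags: python pandas

I have a DataFrame of transaction data together with each customer's social group: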

print(df.sample(10))


           Shop  Transaction_value Social Group
7           KFC                  7         Rich
22  Burger King                342         Rich
19  Burger King                  6         Rich
5           KFC                  2         Poor
14    McDonalds                245         Rich
2           KFC                  3         Poor
16    McDonalds                 56         Poor
6           KFC                  6         Poor
20  Burger King                 23         Poor
8           KFC                  5         Poor

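I did a groupby that counts transactions per shop and social group, which shows me the most common social group for each shop: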

(df.groupby(['Shop', 'Social Group'])['Transaction_value'].count())

Shop         Social Group
Burger King  Poor            7
             Rich            3
KFC          Poor            6
             Rich            3
McDonalds    Poor            3
             Rich            6

I want to divide the numbers above by each social group's value_counts():

df['Social Group'].value_counts()

Poor    16
Rich    12

So in my first groupby, wherever we have Poor I want to divide by 16, and wherever we have Rich I want to divide by 12.

So I would end up with a result like this:

Shop         Social Group
Burger King  Poor            0.43
             Rich            0.25
KFC          Poor            0.37
             Rich            0.25
McDonalds    Poor            0.18
             Rich            0.5

I tried div() for this. I expected the indexes of the two objects to align, but it does not work:

(df.groupby(['Shop', 'Social Group'])['Transaction_value']
 .count()
 .div(df['Social Group'].value_counts()))

ValueError: cannot join with no overlapping index names

Is this even possible using built-in pandas functions?

I suppose I could do it with a for loop, but that would take a lot of time.

My df:

df.to_dict()

{'Shop': {0: 'KFC',
  1: 'KFC',
  2: 'KFC',
  3: 'KFC',
  4: 'KFC',
  5: 'KFC',
  6: 'KFC',
  7: 'KFC',
  8: 'KFC',
  9: 'McDonalds',
  10: 'McDonalds',
  11: 'McDonalds',
  12: 'McDonalds',
  13: 'McDonalds',
  14: 'McDonalds',
  15: 'McDonalds',
  16: 'McDonalds',
  17: 'McDonalds',
  18: 'Burger King',
  19: 'Burger King',
  20: 'Burger King',
  21: 'Burger King',
  22: 'Burger King',
  23: 'Burger King',
  24: 'Burger King',
  25: 'Burger King',
  26: 'Burger King',
  27: 'Burger King'},
 'Transaction_value': {0: 1,
  1: 2,
  2: 3,
  3: 34,
  4: 2,
  5: 2,
  6: 6,
  7: 7,
  8: 5,
  9: 4,
  10: 3,
  11: 2,
  12: 12,
  13: 31,
  14: 245,
  15: 123,
  16: 56,
  17: 67,
  18: 68,
  19: 6,
  20: 23,
  21: 44,
  22: 342,
  23: 234,
  24: 3,
  25: 234,
  26: 666,
  27: 88},
 'Social Group': {0: 'Poor',
  1: 'Rich',
  2: 'Poor',
  3: 'Poor',
  4: 'Rich',
  5: 'Poor',
  6: 'Poor',
  7: 'Rich',
  8: 'Poor',
  9: 'Rich',
  10: 'Rich',
  11: 'Rich',
  12: 'Rich',
  13: 'Rich',
  14: 'Rich',
  15: 'Poor',
  16: 'Poor',
  17: 'Poor',
  18: 'Poor',
  19: 'Rich',
  20: 'Poor',
  21: 'Poor',
  22: 'Rich',
  23: 'Poor',
  24: 'Poor',
  25: 'Rich',
  26: 'Poor',
  27: 'Poor'}}

1 answer:

Answer 0: (score: 7)

You are close; you need level=1 so that the division aligns on the second level of the MultiIndex:

s = df['Social Group'].value_counts()
s1 = df.groupby(['Shop', 'Social Group'])['Transaction_value'].count().div(s, level=1)
print(s1)
Shop         Social Group
Burger King  Poor            0.4375
             Rich            0.2500
KFC          Poor            0.3750
             Rich            0.2500
McDonalds    Poor            0.1875
             Rich            0.5000
dtype: float64
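
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the question's df from the to_dict() output (the short "P"/"R" encoding is just a compact way to type the Social Group column), then applies the same div with the level given by name rather than by position. The two alternatives that follow (dividing by a groupby transform, and pd.crosstab with normalize="columns") are not from the answer above; they are other ways that should give the same ratios, assuming a reasonably recent pandas.

```python
import pandas as pd

# Rebuild the question's df from its to_dict() output.
# "P"/"R" is shorthand used here for "Poor"/"Rich", rows 0..27 in order.
labels = {"P": "Poor", "R": "Rich"}
codes = "PRPPRPPRP" + "RRRRRRPPP" + "PRPPRPPRPP"
df = pd.DataFrame({
    "Shop": ["KFC"] * 9 + ["McDonalds"] * 9 + ["Burger King"] * 10,
    "Transaction_value": [1, 2, 3, 34, 2, 2, 6, 7, 5,
                          4, 3, 2, 12, 31, 245, 123, 56, 67,
                          68, 6, 23, 44, 342, 234, 3, 234, 666, 88],
    "Social Group": [labels[c] for c in codes],
})

# Number of transactions per (Shop, Social Group) pair.
counts = df.groupby(["Shop", "Social Group"])["Transaction_value"].count()

# Number of transactions per social group (Poor: 16, Rich: 12).
totals = df["Social Group"].value_counts()

# 1) The answer's approach, with the level referenced by name.
by_level = counts.div(totals, level="Social Group")

# 2) Alternative: compute the denominators from counts itself, so both
#    sides already share the same MultiIndex and no level is needed.
by_transform = counts / counts.groupby(level="Social Group").transform("sum")

# 3) Alternative: a wide table where each Social Group column sums to 1.
wide = pd.crosstab(df["Shop"], df["Social Group"], normalize="columns")

print(by_level)
print(wide)
```

Referring to the level by name keeps working if the order of the groupby keys changes, whereas level=1 depends on 'Social Group' being the second key.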