Filter a pandas crosstab based on the sum of its columns

Time: 2021-02-04 10:19:49

Tags: python pandas

I am using Python 3 and pandas with a dataset like the following (a toy dataset):

data
      location importance    agent  count
0       London        Low  chatbot      2
1          NYC     Medium  chatbot      1
2       London       High    human      3
3       London        Low    human      4
4          NYC       High    human      1
5          NYC     Medium  chatbot      2
6    Melbourne        Low  chatbot      3
7    Melbourne        Low    human      4
8    Melbourne       High    human      5
9          NYC       High  chatbot      5
10   Melbourne        Low    human      3
11   Melbourne        Low    human      1
12   Melbourne       High  chatbot      5
13  Washington     Medium  chatbot      7
14  Washington     Medium    human      8
15  Washington       High  chatbot      5
16   Melbourne     Medium  chatbot      4
17  Washington     Medium  chatbot      5
18   Melbourne       High    human      3
19  Washington        Low  chatbot      2

A pandas crosstab is applied as follows:

pd.crosstab(data['location'], data['importance'])

importance  High  Low  Medium
location                     
London         1    2       0
Melbourne      3    4       1
NYC            2    0       2
Washington     1    1       3

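The problem is to sum the 3 columns 'High', 'Low' and 'Medium' so that the crosstab keeps only the rows whose sum is >= 4. For this example, London should be excluded, because its sum across those columns is < 4.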

Any help?

1 Answer:

Answer 0: (score: 1)

You can sum the values in each row, compare the result with 4, and use that comparison to filter with boolean indexing:

df1 = pd.crosstab(data['location'], data['importance'])


df = df1[df1.sum(axis=1).ge(4)]

This works the same as:

df = df1[df1.sum(axis=1) >= 4]
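
To check the filter end to end, here is a minimal sketch that rebuilds the toy dataset from the question and applies it. The variable names `row_totals` and `mask` are introduced here only for illustration and are not part of the original answer. The `agent` and `count` columns are left out because this crosstab only counts `location` and `importance` pairs.

```python
import pandas as pd

# Toy dataset from the question, restricted to the two columns
# that the crosstab actually uses.
data = pd.DataFrame({
    "location": ["London", "NYC", "London", "London", "NYC", "NYC",
                 "Melbourne", "Melbourne", "Melbourne", "NYC",
                 "Melbourne", "Melbourne", "Melbourne", "Washington",
                 "Washington", "Washington", "Melbourne", "Washington",
                 "Melbourne", "Washington"],
    "importance": ["Low", "Medium", "High", "Low", "High", "Medium",
                   "Low", "Low", "High", "High",
                   "Low", "Low", "High", "Medium",
                   "Medium", "High", "Medium", "Medium",
                   "High", "Low"],
})

# Count occurrences of each (location, importance) pair.
df1 = pd.crosstab(data["location"], data["importance"])

# axis=1 sums across the High/Low/Medium columns, giving one total per location.
row_totals = df1.sum(axis=1)

# Boolean Series indexed by location: True where the total is >= 4.
mask = row_totals.ge(4)

# Boolean indexing keeps only the rows where the mask is True.
df = df1[mask]

print(row_totals)
print(df)
```

With this data the row totals are 3 for London, 8 for Melbourne, 4 for NYC and 5 for Washington, so only London is dropped. Storing the mask in its own variable is optional; it just makes it easier to inspect which rows will be kept before filtering.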