I am using Python 3 and pandas with a dataset like the following (a toy dataset):
data
      location  importance    agent  count
0       London         Low  chatbot      2
1          NYC      Medium  chatbot      1
2       London        High    human      3
3       London         Low    human      4
4          NYC        High    human      1
5          NYC      Medium  chatbot      2
6    Melbourne         Low  chatbot      3
7    Melbourne         Low    human      4
8    Melbourne        High    human      5
9          NYC        High  chatbot      5
10   Melbourne         Low    human      3
11   Melbourne         Low    human      1
12   Melbourne        High  chatbot      5
13  Washington      Medium  chatbot      7
14  Washington      Medium    human      8
15  Washington        High  chatbot      5
16   Melbourne      Medium  chatbot      4
17  Washington      Medium  chatbot      5
18   Melbourne        High    human      3
19  Washington         Low  chatbot      2
Applying a pandas crosstab gives the following:
pd.crosstab(data['location'], data['importance'])
importance  High  Low  Medium
location
London         1    2       0
Melbourne      3    4       1
NYC            2    0       2
Washington     1    1       3
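The question is how to sum the three columns "High", "Low" and "Medium" for each row, so that only the crosstab rows whose total is >= 4 are kept. In this example London should be excluded, because its row total (1 + 2 + 0 = 3) is < 4.
Any help?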
Answer 0 (score: 1)
You can sum the values in each row, compare the sums against 4, and use the result to filter with boolean indexing:
df1 = pd.crosstab(data['location'], data['importance'])
df = df1[df1.sum(axis=1).ge(4)]
This works the same way as:
df = df1[df1.sum(axis=1) >= 4]
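For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds only the location and importance columns of the toy dataset from the question, and the names counts, row_totals and filtered are illustrative choices rather than anything from the original posts.

```python
import pandas as pd

# Toy data from the question; only the two columns used by the crosstab.
data = pd.DataFrame({
    "location": [
        "London", "NYC", "London", "London", "NYC",
        "NYC", "Melbourne", "Melbourne", "Melbourne", "NYC",
        "Melbourne", "Melbourne", "Melbourne", "Washington", "Washington",
        "Washington", "Melbourne", "Washington", "Melbourne", "Washington",
    ],
    "importance": [
        "Low", "Medium", "High", "Low", "High",
        "Medium", "Low", "Low", "High", "High",
        "Low", "Low", "High", "Medium", "Medium",
        "High", "Medium", "Medium", "High", "Low",
    ],
})

# Count how often each importance level occurs per location.
counts = pd.crosstab(data["location"], data["importance"])

# axis=1 sums across the High/Low/Medium columns, giving one total per location.
row_totals = counts.sum(axis=1)

# Boolean indexing: keep only the rows whose total is at least 4.
filtered = counts[row_totals >= 4]
print(filtered)
```

With this data London has a row total of 3 and is dropped, while Melbourne, NYC and Washington remain.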