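How do I filter or subset a data frame for a specific group (for example, the admitted women in the data frame below)? I'm trying to summarize admission/rejection rates by gender. This data frame is small, but what if it were much larger, say tens of thousands of rows, so that indexing individual values by hand isn't feasible?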
Admit Gender Dept Freq
0 Admitted Male A 512
1 Rejected Male A 313
2 Admitted Female A 89
3 Rejected Female A 19
4 Admitted Male B 353
5 Rejected Male B 207
6 Admitted Female B 17
7 Rejected Female B 8
8 Admitted Male C 120
9 Rejected Male C 205
10 Admitted Female C 202
11 Rejected Female C 391
12 Admitted Male D 138
13 Rejected Male D 279
14 Admitted Female D 131
15 Rejected Female D 244
16 Admitted Male E 53
17 Rejected Male E 138
18 Admitted Female E 94
19 Rejected Female E 299
20 Admitted Male F 22
21 Rejected Male F 351
22 Admitted Female F 24
23 Rejected Female F 317
Answer 0 (score: 2)
To filter the data, you can use the very versatile query method.
from pandas import DataFrame

# Test data
df = DataFrame({'Admit': ['Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted'],
'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female'],
'Freq': [512, 313, 89, 19, 353, 207, 17],
'Gender Dept': ['A', 'A', 'A', 'A', 'B', 'B', 'B']})
df.query('Admit == "Admitted" and Gender == "Female"')
Admit Freq Gender Gender Dept
2 Admitted 89 Female A
6 Admitted 17 Female B
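The same filter can also be written with a boolean mask, which some readers find easier to build up programmatically. The following is a minimal sketch using the same test data; the names `mask` and `admitted_women` are just illustrative.

```python
import pandas as pd

# Same test data as above
df = pd.DataFrame({
    'Admit': ['Admitted', 'Rejected', 'Admitted', 'Rejected',
              'Admitted', 'Rejected', 'Admitted'],
    'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female'],
    'Freq': [512, 313, 89, 19, 353, 207, 17],
    'Gender Dept': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
})

# Combine two element-wise comparisons with & (parentheses are required)
mask = (df['Admit'] == 'Admitted') & (df['Gender'] == 'Female')

# Index with the mask to keep only the matching rows
admitted_women = df[mask]
print(admitted_women)
```

Both approaches scale to large frames because the comparison is vectorized over the whole column; no row is indexed individually.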
Use groupby to aggregate the data.
# Select the numeric column explicitly so that text columns are not summed as well
group = df.groupby(['Admit', 'Gender'])[['Freq']].sum()
print(group)
                 Freq
Admit    Gender
Admitted Female   106
         Male     865
Rejected Female    19
         Male     520
You can then filter the result simply by subsetting on the resulting MultiIndex.
group.loc[('Admitted', 'Female')]
Freq 106
Name: (Admitted, Female), dtype: int64
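Since the question asks about admission and rejection rates by gender, the grouped totals can be turned into a rate. This is a rough sketch on the same test data, not necessarily the only way to do it; the `totals` variable and the `AdmitRate` column name are made up for illustration.

```python
import pandas as pd

# Same test data as above
df = pd.DataFrame({
    'Admit': ['Admitted', 'Rejected', 'Admitted', 'Rejected',
              'Admitted', 'Rejected', 'Admitted'],
    'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female'],
    'Freq': [512, 313, 89, 19, 353, 207, 17],
    'Gender Dept': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
})

# Sum applicant counts per (Gender, Admit) pair, then move the Admit
# level into columns so each gender has one row
totals = df.groupby(['Gender', 'Admit'])['Freq'].sum().unstack('Admit')

# Admission rate = admitted / (admitted + rejected)
totals['AdmitRate'] = totals['Admitted'] / (totals['Admitted'] + totals['Rejected'])
print(totals)
```

The rejection rate is simply one minus `AdmitRate`. Run against the full data set from the question, the same code would give the overall rates per gender.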