大熊猫中的数据帧过滤

时间:2016-09-24 15:10:46

标签: python pandas dataframe filtering subset

如何在数据框中过滤或子集特定组(例如,从下面的数据框中接纳的女性)? 我试图总结基于性别的入学率/拒绝率。这个数据帧很小,但是如果它的数据量要大得多,那么比如成千上万的行,那么索引单个值是不可能的呢?

      Admit  Gender Dept  Freq
0   Admitted    Male    A   512
1   Rejected    Male    A   313
2   Admitted  Female    A    89
3   Rejected  Female    A    19
4   Admitted    Male    B   353
5   Rejected    Male    B   207
6   Admitted  Female    B    17
7   Rejected  Female    B     8
8   Admitted    Male    C   120
9   Rejected    Male    C   205
10  Admitted  Female    C   202
11  Rejected  Female    C   391
12  Admitted    Male    D   138
13  Rejected    Male    D   279
14  Admitted  Female    D   131
15  Rejected  Female    D   244
16  Admitted    Male    E    53
17  Rejected    Male    E   138
18  Admitted  Female    E    94
19  Rejected  Female    E   299
20  Admitted    Male    F    22
21  Rejected    Male    F   351
22  Admitted  Female    F    24
23  Rejected  Female    F   317

1 个答案:

答案 0 :(得分:2)

要过滤数据,您可以使用非常全面的query功能。

# Test data
df = DataFrame({'Admit': ['Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted', 'Rejected', 'Admitted'],
        'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female'],
        'Freq': [512, 313, 89, 19, 353, 207, 17],
        'Gender Dept': ['A', 'A', 'A', 'A', 'B', 'B', 'B']})

df.query('Admit == "Admitted" and Gender == "Female"')

      Admit  Freq  Gender Gender Dept
2  Admitted    89  Female           A
6  Admitted    17  Female           B

使用groupby汇总数据。

group = df.groupby(['Admit', 'Gender']).sum()
print(group)

                 Freq
Admit    Gender      
Admitted Female   106
         Male     865
Rejected Female    19
         Male     520

您只需在创建的MultiIndex上进行子集化即可过滤结果。

group.loc[('Admitted', 'Female')]

Freq    106
Name: (Admitted, Female), dtype: int64