Extracting rows from a pandas DataFrame based on records

Time: 2017-08-29 16:29:05

Tags: python pandas dataframe indexing

Suppose I have a DataFrame like the following:

In [42]: df
Out[42]: 
      regiment company      name  preTestScore  postTestScore
0   Nighthawks     1st    Miller             4             25
1   Nighthawks     1st  Jacobson            24             94
2   Nighthawks     2nd       Ali            31             57
3   Nighthawks     2nd    Milner             2             62
4     Dragoons     1st     Cooze             3             70
5     Dragoons     1st     Jacon             4             25
6     Dragoons     2nd    Ryaner            24             94
7     Dragoons     2nd      Sone            31             57
8       Scouts     1st     Sloan             2             62
9       Scouts     1st     Piger             3             70
10      Scouts     2nd     Riani             2             62
11      Scouts     2nd       Ali             3             70

Here is what I did. I built a list of tuples as follows:

In [48]: s = [('Nighthawks', '1st', 'Miller'), ('Scouts', '2nd', 'Ali')]

When I run

In [40]: df.loc[s]

I get a KeyError.

I was only experimenting and got stuck here. Why can't I extract rows based on the information contained in the tuples?

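1 answer:

Answer 0: (score: 1)

The KeyError is raised because loc looks up index labels, so its first argument must be a label (or a list or slice of labels) from the index. You passed entire records instead. Those tuples are not labels in this DataFrame's integer index, so the lookup fails.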

This works, because 0 through 4 are labels in the index:

print(df.loc[:4])
     regiment company      name  preTestScore  postTestScore
0  Nighthawks     1st    Miller             4             25
1  Nighthawks     1st  Jacobson            24             94
2  Nighthawks     2nd       Ali            31             57
3  Nighthawks     2nd    Milner             2             62
4    Dragoons     1st     Cooze             3             70

This does not:

print(df.loc[s[:4]])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-624-7f654aad4cfd> in <module>()
----> 1 df.loc[s[:4]]
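
Since loc only understands index labels, one possible way to make the tuple lookup work is to move the three identifying columns into the index first. The following is a minimal sketch, assuming the (regiment, company, name) combination identifies each row and that every tuple in s actually occurs in the data; the names indexed and rows are illustrative only.

```python
import pandas as pd

# Rebuild the example DataFrame from the question.
df = pd.DataFrame({
    'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
    'company': ['1st', '1st', '2nd', '2nd'] * 3,
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
             'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
})

s = [('Nighthawks', '1st', 'Miller'), ('Scouts', '2nd', 'Ali')]

# Move the three identifying columns into a MultiIndex so that each
# tuple in s becomes a valid index label.
indexed = df.set_index(['regiment', 'company', 'name'])

# loc now accepts the list of tuples; reset_index turns the index
# levels back into ordinary columns.
rows = indexed.loc[s].reset_index()
print(rows)
```

Note that reset_index gives the result a fresh 0-based index, so the original integer labels are not preserved in rows.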

Note that if you are trying to retrieve rows by integer position rather than by label, df.iloc is the better choice.
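
The difference between the two is easier to see when the index labels do not coincide with the row positions. The small frame below is hypothetical (it is not the data from the question) and only illustrates that a loc slice is label-based and includes both endpoints, while an iloc slice is position-based and excludes the end.

```python
import pandas as pd

# A small hypothetical frame whose index labels differ from positions.
demo = pd.DataFrame({'score': [4, 24, 31, 2, 3]},
                    index=[10, 11, 12, 13, 14])

by_label = demo.loc[10:12]    # label slice, both endpoints included
by_position = demo.iloc[0:2]  # position slice, end excluded

print(by_label)
print(by_position)
```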

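To address your comment: unzip the tuples into one sequence per column, then use isin on each column and combine the masks.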

x, y, z = zip(*[('Nighthawks', '1st', 'Miller'), ('Dragoons', '2nd', 'Cooze')])
out = df[df.regiment.isin(x) & df.company.isin(y) & df.name.isin(z)]
print(out)
     regiment company    name  preTestScore  postTestScore
0  Nighthawks     1st  Miller             4             25
4    Dragoons     1st   Cooze             3             70
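
One caveat: each isin call tests its column independently, so the combined mask keeps any row whose regiment, company, and name each appear somewhere in the list, even if that exact combination was never requested. That is why row 4 (Dragoons, 1st, Cooze) is returned above although the requested record was ('Dragoons', '2nd', 'Cooze'). If only exact records should match, one possible sketch is to compare whole tuples through a MultiIndex. The names records, keys, mask, exact, and others are illustrative.

```python
import pandas as pd

# Rebuild the example DataFrame from the question.
df = pd.DataFrame({
    'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
    'company': ['1st', '1st', '2nd', '2nd'] * 3,
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon',
             'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70],
})

records = [('Nighthawks', '1st', 'Miller'), ('Dragoons', '2nd', 'Cooze')]
keys = ['regiment', 'company', 'name']

# Build a MultiIndex from the key columns and test whole tuples at once;
# the result is a boolean array aligned with the rows of df.
mask = df.set_index(keys).index.isin(records)

exact = df[mask]    # rows whose (regiment, company, name) is in records
others = df[~mask]  # every remaining row
print(exact)
```

With this mask only the Miller row is selected, because no row in the data has the combination ('Dragoons', '2nd', 'Cooze').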

And, to invert the selection, negate the mask with the ~ operator:

out = df[~(df.regiment.isin(x) & df.company.isin(y) & df.name.isin(z))]
print(out)
      regiment company      name  preTestScore  postTestScore
1   Nighthawks     1st  Jacobson            24             94
2   Nighthawks     2nd       Ali            31             57
3   Nighthawks     2nd    Milner             2             62
5     Dragoons     1st     Jacon             4             25
6     Dragoons     2nd    Ryaner            24             94
7     Dragoons     2nd      Sone            31             57
8       Scouts     1st     Sloan             2             62
9       Scouts     1st     Piger             3             70
10      Scouts     2nd     Riani             2             62
11      Scouts     2nd       Ali             3             70