Selecting rows from a pandas DataFrame that contain a specific value

Time: 2016-07-04 13:14:32

Tags: python pandas

I have a pandas DataFrame whose entries are all strings:

        A       B      C
1   apple  banana   pear
2    pear    pear  apple
3  banana    pear   pear
4   apple   apple   pear

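and so on. I want to select all rows that contain a certain string, say 'banana'. I don't know in advance which column it will appear in. Of course I could write a for loop and iterate over all the rows, but is there a simpler or faster way?

4 Answers:

Answer 0: (score: 4)

With NumPy, this can be vectorized to search for any number of strings, like so -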

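import numpy as np

def select_rows(df,search_strings):
    # Tag every cell with the ID of its value among the sorted unique values
    unq,IDs = np.unique(df,return_inverse=True)
    unqIDs = np.searchsorted(unq,search_strings)
    # Keep rows where every search string (assumed present in df) matches some cell
    return df[((IDs.reshape(df.shape) == unqIDs[:,None,None]).any(-1)).all(0)]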

Sample run -

In [393]: df
Out[393]: 
        A       B      C
0   apple  banana   pear
1    pear    pear  apple
2  banana    pear   pear
3   apple   apple   pear

In [394]: select_rows(df,['apple','banana'])
Out[394]: 
       A       B     C
0  apple  banana  pear

In [395]: select_rows(df,['apple','pear'])
Out[395]: 
       A       B      C
0  apple  banana   pear
1   pear    pear  apple
3  apple   apple   pear

In [396]: select_rows(df,['apple','banana','pear'])
Out[396]: 
       A       B     C
0  apple  banana  pear
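
For comparison, here is a minimal pandas-only sketch of the same "row must contain every search string" selection. The helper name select_rows_all is hypothetical (it is not part of the answer above), and the frame is rebuilt from the sample run. Unlike the searchsorted lookup, it makes no assumption that each search string actually occurs in df: a string that is absent simply yields an empty result.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["apple", "pear", "banana", "apple"],
    "B": ["banana", "pear", "pear", "apple"],
    "C": ["pear", "apple", "pear", "pear"],
})

def select_rows_all(df, search_strings):
    # Hypothetical helper: start with every row selected, then AND in
    # one row-level mask per search string.
    mask = pd.Series(True, index=df.index)
    for s in search_strings:
        # df.eq(s) is a cell-level mask; any(axis=1) collapses it per row
        mask = mask & df.eq(s).any(axis=1)
    return df[mask]

result = select_rows_all(df, ["apple", "banana"])
print(result)
```

This loops once per search string, so it trades some of the NumPy version's vectorization for readability.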

Answer 1: (score: 4)

For a single search value:

df[df.values == "banana"]

df[df.isin(['banana'])]

For multiple search terms:

df[(df.values == "banana") | (df.values == "apple")]

df[df.isin(['banana', "apple"])]

  #         A       B      C
  #  1   apple  banana    NaN
  #  2     NaN     NaN  apple
  #  3  banana     NaN    NaN
  #  4   apple   apple    NaN

From Divakar's answer: only the rows that contain both strings are returned.

select_rows(df,['apple','banana'])

 #         A       B     C
 #   0  apple  banana  pear
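
To make the difference between a cell-level mask and a row-level selection concrete, here is a small sketch. It assumes the question's frame with its 1-based index; the variable names cell_mask, masked and rows are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["apple", "pear", "banana", "apple"],
        "B": ["banana", "pear", "pear", "apple"],
        "C": ["pear", "apple", "pear", "pear"],
    },
    index=[1, 2, 3, 4],
)

# Cell-level boolean mask with the same shape as df
cell_mask = df.isin(["banana"])

# Indexing with the 2-D mask keeps the shape and blanks non-matching cells to NaN
masked = df[cell_mask]

# Collapsing the mask per row selects the complete matching rows instead
rows = df[cell_mask.any(axis=1)]

print(masked)
print(rows)
```

If the goal is to get the original rows back untouched, the row-level form is probably the one you want.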

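Answer 2: (score: 3)

You can build a boolean mask by comparing the whole df against your string, then call dropna with how='all' to drop the rows in which the string does not appear in any column: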

In [59]:
df[df == 'banana'].dropna(how='all')

Out[59]:
        A       B    C
1     NaN  banana  NaN
3  banana     NaN  NaN

To test for multiple values you can use multiple masks:

In [90]:
banana = df[(df=='banana')].dropna(how='all')
banana

Out[90]:
        A       B    C
1     NaN  banana  NaN
3  banana     NaN  NaN

In [91]:    
apple = df[(df=='apple')].dropna(how='all')
apple

Out[91]:
       A      B      C
1  apple    NaN    NaN
2    NaN    NaN  apple
4  apple  apple    NaN

You can then use index.intersection to select only the index values the two results have in common:

In [93]:
df.loc[apple.index.intersection(banana.index)]

Out[93]:
       A       B     C
1  apple  banana  pear
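
The same idea can be extended to any number of strings by folding the intersection over a list of indexes. This is only a sketch: rows_with_all is a hypothetical helper name, the frame is the question's data, and it assumes values is non-empty (reduce raises on an empty list).

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame(
    {
        "A": ["apple", "pear", "banana", "apple"],
        "B": ["banana", "pear", "pear", "apple"],
        "C": ["pear", "apple", "pear", "pear"],
    },
    index=[1, 2, 3, 4],
)

def rows_with_all(df, values):
    # Hypothetical helper: one index of matching rows per search value
    indexes = [df[df == v].dropna(how="all").index for v in values]
    # Keep only the labels present in every one of those indexes
    common = reduce(lambda a, b: a.intersection(b), indexes)
    return df.loc[common]

result = rows_with_all(df, ["apple", "banana"])
print(result)
```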

Answer 3: (score: 0)

If you want all rows of df that contain any of the values in values, use:

df[df.isin(values).any(axis=1)]

Example:

In [2]: df
Out[2]: 
   0  1  2
0  7  4  9
1  8  2  7
2  1  9  7
3  3  8  5
4  5  1  1

In [3]: df[df.isin({1, 9, 123}).any(axis=1)]
Out[3]: 
   0  1  2
0  7  4  9
2  1  9  7
4  5  1  1
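
For anyone who wants to run the example above, here is a self-contained sketch. The frame is reconstructed from the printed output, and the variable name values is just the one used in the answer's one-liner.

```python
import pandas as pd

df = pd.DataFrame([
    [7, 4, 9],
    [8, 2, 7],
    [1, 9, 7],
    [3, 8, 5],
    [5, 1, 1],
])

values = [1, 9, 123]

# isin gives a cell-level mask; any(axis=1) is True for a row
# as soon as one of its cells matches one of the values
result = df[df.isin(values).any(axis=1)]
print(result)
```

A value such as 123 that occurs nowhere in df is simply ignored.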