Question

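I have a bunch of data frames, and I want to find the ones that contain both of two words I specify. For example, I want to find all data frames that contain the words hello and world. A and B qualify; C does not.

I tried: df[(df[column].str.contains('hello')) & (df[column].str.contains('world'))], which only picks up B, while df[(df[column].str.contains('hello')) | (df[column].str.contains('world'))] picks up all three.

I need something that selects only A and B.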

A =

    Name    Data   
0   Mike    hello    
1   Mike    world    
2   Mike    hello   
3   Fred    world
4   Fred    hello
5   Ted     world

B =

    Name    Data   
0   Mike    helloworld
1   Mike    world    
2   Mike    hello   
3   Fred    world
4   Fred    hello
5   Ted     world

C =

    Name    Data   
0   Mike    hello
1   Mike    hello    
2   Mike    hello   
3   Fred    hello
4   Fred    hello
5   Ted     hello

Answer 1

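If you want to find 'hello' anywhere in a column and 'world' anywhere in that column, you need a single Boolean for the whole column rather than a row-by-row mask: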

df.Data.str.contains('hello').any() & df.Data.str.contains('world').any()
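
To apply this check to the three frames from the question, a minimal sketch might look like the following. The helper name has_all_words and the frames dictionary are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

def has_all_words(df, column, words):
    """Return True if every word occurs somewhere in df[column] (substring match)."""
    # regex=False treats each word as a literal string rather than a pattern
    return all(df[column].str.contains(w, regex=False).any() for w in words)

names = ["Mike", "Mike", "Mike", "Fred", "Fred", "Ted"]
frames = {
    "A": pd.DataFrame({"Name": names, "Data": ["hello", "world", "hello", "world", "hello", "world"]}),
    "B": pd.DataFrame({"Name": names, "Data": ["helloworld", "world", "hello", "world", "hello", "world"]}),
    "C": pd.DataFrame({"Name": names, "Data": ["hello"] * 6}),
}

# Keep the names of the frames whose Data column contains both words
selected = [name for name, df in frames.items() if has_all_words(df, "Data", ["hello", "world"])]
print(selected)  # ['A', 'B']
```

Because each word is reduced with .any() before the results are combined, the two words may sit in different rows, which is what lets A qualify.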

If you have a list of words and need to check the entire DataFrame, try:

import numpy as np

lst = ['hello', 'world']
np.logical_and.reduce([any(word in x for x in df.values.ravel()) for word in lst])

Sample data

print(df)
   Name   Data   Data2
0  Mike  hello  orange
1  Mike  world  banana
2  Mike  hello  banana
3  Fred  world  apples
4  Fred  hello   mango
5   Ted  world    pear

lst = ['apple', 'hello', 'world']
np.logical_and.reduce([any(word in x for x in df.values.ravel()) for word in lst])
# True

lst = ['apple', 'hello', 'world', 'bear']
np.logical_and.reduce([any(word in x for x in df.values.ravel()) for word in lst])
# False
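
The expression above assumes every cell is a string; `word in x` raises a TypeError when x is a number. A hedged variant that casts each cell to str first could look like this. The function name frame_contains_all and the Count column are assumptions made for illustration.

```python
import pandas as pd

def frame_contains_all(df, words):
    """True if each word is a substring of at least one cell anywhere in df."""
    # Cast every cell to str so numeric values do not raise TypeError
    cells = [str(v) for v in df.to_numpy().ravel()]
    return all(any(word in cell for cell in cells) for word in words)

df = pd.DataFrame({
    "Name": ["Mike", "Fred", "Ted"],
    "Data": ["hello", "world", "hello"],
    "Count": [1, 2, 3],  # numeric column, added here only to show the cast
})

print(frame_contains_all(df, ["hello", "world"]))  # True
print(frame_contains_all(df, ["hello", "bear"]))   # False
```

One caveat of this sketch: missing values become the text 'nan' after the cast, so a search for a word such as 'nan' would match them.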

Answer 2

Use:

import re 

bool(re.search(r'^(?=.*hello)(?=.*world)', df.sum().sum()))
Out[461]: True
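
The two lookaheads each scan the whole string from the start, so the order of the words does not matter. One thing to watch is that concatenating cells removes the boundaries between them. The sketch below uses a made-up single-column frame, and "".join stands in for df.sum().sum(); both choices are assumptions for illustration.

```python
import re
import pandas as pd

# Hypothetical frame in which no single cell contains 'hello'
df = pd.DataFrame({"Data": ["hel", "lo", "world"]})

pattern = re.compile(r"^(?=.*hello)(?=.*world)")

# Plain concatenation glues neighbouring cells together: 'hel' + 'lo' reads as 'hello'
glued = "".join(df.to_numpy().ravel())
print(bool(pattern.search(glued)))      # True, a false positive

# Joining with a separator keeps cell boundaries visible to the regex
separated = "|".join(df.to_numpy().ravel())
print(bool(pattern.search(separated)))  # False
```

The separator only helps if it cannot occur inside the words being searched for.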

Answer 3

If hello and world are standalone strings in your data, df.eq() should do the job and you don't need str.contains. It is not a string method, and it works on the entire DataFrame.

(((df == 'hello').any()) & ((df == 'world').any())).any()

True
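
The difference between exact equality and substring matching shows up when a word only appears inside a longer string. The frame below is a made-up variant of B in which 'hello' never stands alone, used here as an assumption to illustrate the point.

```python
import pandas as pd

# Hypothetical data: 'hello' occurs only as part of 'helloworld'
df = pd.DataFrame({
    "Name": ["Mike", "Mike", "Mike"],
    "Data": ["helloworld", "world", "world"],
})

# Exact match: no cell equals 'hello', so the check fails
exact = bool((((df == "hello").any()) & ((df == "world").any())).any())

# Substring match: 'helloworld' contains 'hello', so the check passes
substring = bool(df["Data"].str.contains("hello").any()
                 and df["Data"].str.contains("world").any())

print(exact, substring)  # False True
```

Note also that the equality expression combines the per-column results with & before the final .any(), so both words have to appear in the same column for it to return True.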

Pandas: filter by more than one "contains" across an entire column rather than a single cell
