Question

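I have a DataFrame with two columns: foo, which contains a text string, and bar, which contains a search-term string. For each row in my DataFrame, I want to check whether the search term occurs in the text string, delimited by word boundaries.

For example: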

import pandas as pd
import numpy as np
import re

df = pd.DataFrame({'foo': ["the dog is blue", "the cat isn't orange"], 'bar': ['dog', 'cat is']})
df

      bar                   foo
0     dog       the dog is blue
1  cat is  the cat isn't orange

Basically, I want to vectorize the following operations:

re.search(r"\bdog\b", "the dog is blue") is not None  # True
re.search(r"\bcat is\b", "the cat isn't orange") is not None  # False

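Given that I'm working with a few hundred thousand rows, what is a fast way to do this? I tried the str.contains method but couldn't quite get it to work.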

Answer 1

You can apply a function to each row:

df.apply(lambda x: re.search(r'\b' + x.bar + r'\b', x.foo) is not None, axis=1)

Result:

0     True
1    False
dtype: bool
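
As a possible variation (not part of the original answer): the pattern above inserts the search term into the regex unescaped, so a term containing characters such as "." or "(" would be interpreted as regex syntax. Wrapping the term in re.escape makes it match literally, and a list comprehension over the two columns may avoid some of the per-row overhead of apply with axis=1. This is a minimal sketch, assuming both columns hold plain strings with no missing values; the helper name contains_whole_term is hypothetical.

```python
import re

import pandas as pd


def contains_whole_term(text, term):
    """Return True if `term` occurs in `text` delimited by word boundaries."""
    # re.escape makes regex metacharacters in the term match literally.
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text) is not None


df = pd.DataFrame({
    'foo': ["the dog is blue", "the cat isn't orange"],
    'bar': ['dog', 'cat is'],
})

# Iterate over the two columns in lockstep instead of building a row object
# for every record; keep the original index on the resulting Series.
matches = pd.Series(
    [contains_whole_term(text, term) for text, term in zip(df['foo'], df['bar'])],
    index=df.index,
)
print(matches.tolist())  # [True, False]
```

Whether this is actually faster on a few hundred thousand rows depends on the data, so it is worth timing both versions on a sample.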

Answer 2

df.apply(lambda x: re.search(r'\b{0}\b'.format(x.bar), x.foo) is not None, axis='columns')

df.apply applies a generic function along the rows or columns of a pandas DataFrame. See: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html
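
To illustrate the axis argument used above, here is a small sketch on a made-up numeric frame (the column names a and b are purely illustrative). With axis='columns' (equivalent to axis=1) the function receives one row at a time; with axis=0, the default, it receives one column at a time.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [10, 20]})

# axis='columns' and axis=1 are synonyms: the lambda is called once per row.
row_sums = df.apply(lambda row: row['a'] + row['b'], axis='columns')
row_sums_alt = df.apply(lambda row: row['a'] + row['b'], axis=1)

# axis=0 (the default): the lambda is called once per column.
col_sums = df.apply(lambda col: col.sum(), axis=0)

print(row_sums.tolist())      # [11, 22]
print(row_sums_alt.tolist())  # [11, 22]
print(col_sums.tolist())      # [3, 30]
```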

Pandas DataFrame - check whether a string in column A contains the full string in column B
