Question

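I have a pandas Series of sentences and a list of words. I want to return every entry in the Series that contains all of the words in the list.

For example: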

sample_list = ['dog', 'cat', 'rat']

Series
0 "I have a dog, a cat, and a rat."
1 "I only have a dog."
2 "I only have a cat."

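In this example, only the first row would be returned.

I am currently calling .str.contains() once for each word in the list. Is there a more efficient way to do this?

Thanks.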

Answer 1

Setup

In [1413]: s
Out[1413]: 
0    I have a dog, a cat, and a rat.
1                 I only have a dog.
2                 I only have a cat.
Name: 1, dtype: object

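A bulletproof approach is to iterate over sample_list and build a new DataFrame with the pd.DataFrame constructor. You can then call df.min to get the final mask (the row-wise minimum of booleans is True only when every column is True):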

In [1426]: pd.DataFrame([s.str.contains(x) for x in sample_list]).T.min(axis=1)
Out[1426]: 
0     True
1    False
2    False
dtype: bool

Then apply boolean indexing to the Series:

In [1427]: idx = pd.DataFrame([s.str.contains(x) for x in sample_list]).T.min(axis=1); s[idx]
Out[1427]: 
0    I have a dog, a cat, and a rat.
Name: 1, dtype: object
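The same mask can also be built without the intermediate DataFrame. The following is a minimal sketch, not part of the original answer: it assumes literal substring matching is what is wanted (str.contains treats its pattern as a regular expression by default, so regex=False is passed here), and the names masks, mask, and result are illustrative only.

```python
from functools import reduce
import operator

import pandas as pd

s = pd.Series([
    "I have a dog, a cat, and a rat.",
    "I only have a dog.",
    "I only have a cat.",
])
sample_list = ['dog', 'cat', 'rat']

# One boolean Series per word; regex=False treats each word as a literal substring.
masks = [s.str.contains(word, regex=False) for word in sample_list]

# Combine the per-word masks element-wise with logical AND.
mask = reduce(operator.and_, masks)

# Boolean indexing keeps only the entries that contain every word.
result = s[mask]
print(result)
```

This still calls str.contains once per word, so it is mainly a tidier way to combine the masks rather than a different algorithm.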

If you can guarantee that the words in sample_list appear in the same order in each entry, a single str.contains call with regex=True is enough:

In [1414]: idx = s.str.contains('.*'.join(sample_list), regex=True); s[idx]
Out[1414]: 
0    I have a dog, a cat, and a rat.
Name: 1, dtype: object
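To illustrate why the ordering guarantee matters, here is a hedged sketch that compares the '.*'-joined pattern with an order-independent alternative built from regex lookaheads. The lookahead variant is a different technique from the one in this answer, and the variable names (ordered, unordered, and the two masks) are assumptions made for the example. re.escape is used so that words containing regex metacharacters are matched literally.

```python
import re

import pandas as pd

s = pd.Series([
    "I have a dog, a cat, and a rat.",
    "I only have a dog.",
    "I only have a cat.",
])
# Deliberately listed in a different order than they appear in the sentence.
sample_list = ['rat', 'dog', 'cat']

# Ordered pattern: matches only when the words occur in list order.
ordered = '.*'.join(re.escape(word) for word in sample_list)
ordered_mask = s.str.contains(ordered, regex=True)

# Lookahead pattern: each (?=.*word) must succeed from the same position,
# so the words may occur in any order (within a single line of text).
unordered = ''.join(f'(?=.*{re.escape(word)})' for word in sample_list)
unordered_mask = s.str.contains(unordered, regex=True)

print(ordered_mask.tolist())    # [False, False, False]
print(unordered_mask.tolist())  # [True, False, False]
```

Note that '.' does not match newlines by default, so both patterns assume each entry is a single line.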


Answer 2

Two things make this quick and easy:

pd.Series.apply()

and

all()

Like so:

# Apply a function to each entry in the Series x.
# The function returns True iff every word in sample_list occurs in the entry.
# Boolean indexing then keeps only the entries where it returned True.
x[x.apply(lambda text: all(word in text for word in sample_list))]

Returns:

0     I have a dog, a cat, and a rat.
Name: 0, dtype: object

As desired.
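One caveat with the in operator is that it tests substrings, so 'cat' would also be found inside 'catalog'. If whole-word matching is needed, a possible variation is sketched below. It is an assumption layered on top of this answer rather than part of it: the helper has_all_words, the extra fourth sentence, and the choice to lowercase and split on \w+ are all illustrative.

```python
import re

import pandas as pd

s = pd.Series([
    "I have a dog, a cat, and a rat.",
    "I only have a dog.",
    "I only have a cat.",
    "My dog ate a catalog in Saratoga.",  # extra row added for illustration
])
sample_list = ['dog', 'cat', 'rat']
required = set(sample_list)

def has_all_words(text):
    """Return True if every required word appears as a whole word in text."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return required.issubset(tokens)

# Substring test, as in the answer: 'catalog' and 'Saratoga' count as hits.
substring_mask = s.apply(lambda text: all(word in text for word in sample_list))

# Whole-word test: only standalone 'dog', 'cat', and 'rat' count.
word_mask = s.apply(has_all_words)

print(substring_mask.tolist())  # [True, False, False, True]
print(word_mask.tolist())       # [True, False, False, False]
```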

Return the strings in a pandas Series that contain the strings in a list
