Question

My list is as follows:

lst = ['for Sam', 'Just in', 'Mark Rich']

I am trying to remove from a list of strings (each string contains one or more words) every element that contains a stopword.

Since the first and second elements of the list contain for and in, which are stopwords, it should return

new_lst = ['Mark Rich']

What I tried:

from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))

lst = ['for Sam', 'Just in', 'Mark Rich']
new_lst = [i.split(" ") for i in lst]
new_lst = [" ".join(i) for i in new_lst for j in i if j not in stop_words]

Which gives me the following output:

Answer 1

You need an if condition instead of the extra nested loop:

new_lst = [' '.join(i) for i in new_lst if not any(j in i for j in stop_words)]
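To make the difference concrete, here is a small self-contained sketch that runs the question's nested comprehension next to the fixed one. It assumes a hand-written stopword set {'for', 'in'} in place of NLTK's English list, so it runs without downloading any corpus, and it uses a hypothetical name split_lst for the pre-split words.

```python
# Assumption: a small hand-written stopword set stands in for NLTK's
# English stopword list, so this sketch needs no corpus download.
stop_words = {'for', 'in'}

lst = ['for Sam', 'Just in', 'Mark Rich']
split_lst = [s.split() for s in lst]  # hypothetical name for the split words

# The question's attempt: the extra "for j in i" clause emits the joined
# string once per word that is NOT a stopword, so strings are kept
# (and sometimes duplicated) instead of being filtered out.
nested = [' '.join(i) for i in split_lst for j in i if j not in stop_words]
print(nested)  # ['for Sam', 'Just in', 'Mark Rich', 'Mark Rich']

# The fix: a single condition per string that rejects it
# when any of its words is a stopword.
filtered = [' '.join(i) for i in split_lst if not any(j in i for j in stop_words)]
print(filtered)  # ['Mark Rich']
```

The nested version yields one output item per non-stopword, which is why 'Mark Rich' appears twice; the filtered version yields at most one item per input string.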

If you want to take advantage of the set, you can use set.isdisjoint:

new_lst = [' '.join(i) for i in new_lst if stop_words.isdisjoint(i)]

Here is a demonstration:

stop_words = {'for', 'in'}

lst = ['for Sam', 'Just in', 'Mark Rich']
new_lst = [i.split() for i in lst]
new_lst = [' '.join(i) for i in new_lst if stop_words.isdisjoint(i)]

print(new_lst)

# ['Mark Rich']

Answer 2

You can use a list comprehension and use sets to check whether any words in the two lists intersect:

If a string contains a stopword, it is removed.
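The code for this answer did not survive, so the following is only a minimal sketch of the set-intersection idea it describes, not the answer's original code. It assumes a hand-written stopword set {'for', 'in'} instead of NLTK's list.

```python
# Assumption: a hand-written stopword set instead of NLTK's English list.
stop_words = {'for', 'in'}

lst = ['for Sam', 'Just in', 'Mark Rich']

# Keep a string only if the set of its words shares nothing with stop_words.
# "&" is set intersection; an empty set is falsy, so "not ..." keeps the
# strings whose intersection with stop_words is empty.
new_lst = [s for s in lst if not set(s.split()) & stop_words]
print(new_lst)  # ['Mark Rich']
```

Building set(s.split()) per string makes the intersection test cheap; it is equivalent to stop_words.isdisjoint(s.split()).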
