Question

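stopwords is a list of strings, and tokentext is a list of lists of strings. (Each inner list is a sentence, and the list of lists is a text document.) I am simply trying to remove every string in tokentext that also appears in stopwords.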

for element in tokentext:
    for word in element:
        if word.lower() in stopwords:
            element.remove(word)

print(tokentext)

I hope someone can point out what is fundamentally wrong with the way I am iterating over the lists.
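The failure can be reproduced with a small input. The data below is hypothetical (it is not the pastebin dataset); it runs the loop from the question unchanged and shows that some stopwords survive.

```python
# Hypothetical sample data, not the dataset from the question.
stopwords = ["the", "a", "of"]
tokentext = [["The", "a", "cat"], ["of", "the", "dog"]]

for element in tokentext:
    for word in element:
        if word.lower() in stopwords:
            # Removing an item shifts the later items left by one,
            # so the loop skips the word that followed the removed one.
            element.remove(word)

print(tokentext)  # [['a', 'cat'], ['the', 'dog']]
```

"a" and "the" are left behind because each one moved into the position the loop had just processed.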

Here is the dataset it fails on: http://pastebin.com/p9ezh2nA

Answer 1

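Modifying a list while you iterate over it causes problems: each removal shifts the remaining elements left by one, so the loop skips the element right after every one it removes. Try this instead: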

stopwords = ["some", "strings"]
tokentext = [ ["some", "lists"], ["of", "strings"] ]

new_tokentext = [[word for word in lst if word not in stopwords] for lst in tokentext]
# builds new lists that keep only the words that are not in stopwords
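The comprehension above compares words exactly, while the question compares `word.lower()`. A sketch that keeps the question's case-insensitive matching could look like this. The function name `remove_stopwords` is hypothetical, and lowercasing the stopwords themselves is an assumption. Converting them to a set also makes each membership check constant time instead of a scan of the list.

```python
def remove_stopwords(tokentext, stopwords):
    """Return new lists of words, dropping any word whose lowercase form is a stopword."""
    # Set for fast lookups; lowercased so it can be compared against word.lower().
    stop = {s.lower() for s in stopwords}
    return [[word for word in sentence if word.lower() not in stop]
            for sentence in tokentext]

tokentext = [["The", "a", "cat"], ["of", "the", "dog"]]
print(remove_stopwords(tokentext, ["the", "a", "of"]))  # [['cat'], ['dog']]
```

The original `tokentext` is left untouched; the function returns new lists.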

Or use filter:

new_tokentext = [list(filter(lambda x: x not in stopwords, lst)) for lst in tokentext]
# in Python 3 filter returns an iterator, so `list` is needed; in Python 2 filter already returns a list
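If the inner lists must be changed in place (for example, because other code holds references to them), slice assignment avoids the iteration problem. It builds the filtered list first and then replaces the old contents. This is a sketch with hypothetical data, using the question's case-insensitive check:

```python
stopwords = {"the", "a", "of"}  # hypothetical data; a set for fast lookups
tokentext = [["The", "a", "cat"], ["of", "the", "dog"]]
first_sentence = tokentext[0]  # a second reference to the first inner list

for sentence in tokentext:
    # Slice assignment replaces the contents of the existing list object,
    # so every reference to that inner list sees the filtered words.
    sentence[:] = [word for word in sentence if word.lower() not in stopwords]

print(tokentext)       # [['cat'], ['dog']]
print(first_sentence)  # ['cat']
```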

Answer 2

You could do something simple:

for element in tokentext:
    # element is a whole sentence (an inner list); this removes it from stopwords, not from tokentext
    if element in stopwords:
        stopwords.remove(element)

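It is a bit like yours, but without the extra loop. I am not sure whether this works, or whether it is what you are trying to achieve, but it is an idea and I hope it helps!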

Removing items from lists within a list
