Question

libOfSentences = ["Get help with the display",
                 "Display is not working properly", "I need some help"]
#removing stopwords

for i in libOfSentences:
     sentence = word_tokenize(i) #tokenize each individual word
     sentence = filter(lambda x: x not in string.punctuation, sentence) 
     cleaned_text = filter(lambda x: x not in stop_words, sentence) 

     removedStopwordsList = " ".join(cleaned_text)

removedStopwordsList now joins the sentence back together, but I want to keep the results in a list. The desired output is this:

["Get help display", "Display not working properly", "I need some help"]

I want removedStopwordsList to still be a list that I can loop through.

removedStopwordsList[0]

gives me

"G D I"

right now, but I want removedStopwordsList[0]

to output

"Get help display"

The join call is what prevents this from happening, but I can't find a better workaround.

Answer 1

> I want removedStopwordsList to still be a list

Then just make it a list instead of joining it into a string:

removedStopwordsList = list(cleaned_text)

Although you can do this more simply with a list comprehension instead of calling filter:

removedStopwordsList = [x for x in sentence if x not in stop_words]

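map and filter are great when you already have a function to call on each element. But when you have an arbitrary expression that you must wrap in a lambda just to turn it into a function call, a list comprehension or generator expression is simpler and more readable.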
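To make the comparison concrete, here is a minimal sketch of both styles side by side. The small `stop_words` set and the pre-split `tokens` list are hypothetical stand-ins, used only so the snippet runs without NLTK; they are not the asker's actual data.

```python
import string

# Hypothetical stand-ins (assumptions, not from the original post):
# a tiny stop-word set and an already tokenized sentence.
stop_words = {"with", "the", "is"}
tokens = ["Get", "help", "with", "the", "display", "!"]

# filter + lambda: in Python 3 filter returns a lazy iterator,
# so list() is needed to get an actual list back.
via_filter = list(
    filter(lambda x: x not in stop_words,
           filter(lambda x: x not in string.punctuation, tokens))
)

# The same two conditions written as one list comprehension.
via_comprehension = [
    x for x in tokens
    if x not in string.punctuation and x not in stop_words
]

print(via_filter)
print(via_comprehension)
```

Both expressions keep the same tokens; the comprehension simply avoids the lambda wrappers and the explicit list() call.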

You can simplify the punctuation-filtering line the same way. So:

for i in libOfSentences:
    sentence = word_tokenize(i) #tokenize each individual word
    sentence = (x for x in sentence if x not in string.punctuation)
    removedStopwordsList = [x for x in sentence if x not in stop_words]

If you also need the joined string, that's fine; you can keep it in a second variable:

removedStopwordsString = " ".join(removedStopwordsList)

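If you really wanted a single object that could act as both, it wouldn't be hard to write such a class, but it would be ugly. And under the covers, it would just hold a self.list_of_words and a self.joined_string and delegate to them. So, what would be the point?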

In any case, I doubt you need to keep the string around. If you want to print it, you can always join it on the fly:

print(" ".join(removedStopwordsList))

…or even unpack the list into separate arguments to print:

print(*removedStopwordsList)
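As a quick check, the sketch below captures what each print call writes and compares the two. The sample word list is assumed for illustration; the point is only that print separates its arguments with a single space by default, which matches " ".join here.

```python
import io

# Assumed sample data standing in for one cleaned sentence.
removedStopwordsList = ["Get", "help", "display"]

# Capture the output of the join-based print.
joined = io.StringIO()
print(" ".join(removedStopwordsList), file=joined)

# Capture the output of the unpacking-based print.
unpacked = io.StringIO()
print(*removedStopwordsList, file=unpacked)

same_output = joined.getvalue() == unpacked.getvalue()
print(same_output)
```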

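If you're trying to collect all of these lists into one big list, you have to actually write code to do that. If you assign removedStopwordsList = <anything> on every pass through the loop, each assignment replaces the previous value. To keep all of them, you need to append each one to a bigger list. For example: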

listOfLists = []
for i in libOfSentences:
    sentence = word_tokenize(i) #tokenize each individual word
    sentence = (x for x in sentence if x not in string.punctuation)
    removedStopwordsList = [x for x in sentence if x not in stop_words]
    listOfLists.append(removedStopwordsList)

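Now, if you print out listOfLists, it will be a list of word lists, one per sentence (three here); listOfLists[0] will be the first list; listOfLists[0][0] will be the first word of the first list; and so on.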
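Putting the pieces together, here is a self-contained sketch that builds both the list of word lists and the list of joined strings the question asks for. It makes two loud assumptions so it can run without NLTK: `simple_tokenize` is a hypothetical whitespace splitter standing in for word_tokenize, and `stop_words` is a small hand-picked set chosen to reproduce the desired output, not NLTK's English stop-word list.

```python
import string

# Assumption: a tiny hand-picked stop-word set, not NLTK's list.
stop_words = {"with", "the", "is"}

def simple_tokenize(text):
    """Hypothetical stand-in for nltk.word_tokenize: split on whitespace."""
    return text.split()

libOfSentences = ["Get help with the display",
                  "Display is not working properly", "I need some help"]

listOfLists = []    # one list of words per sentence
listOfStrings = []  # one joined string per sentence

for text in libOfSentences:
    words = [w for w in simple_tokenize(text)
             if w not in string.punctuation and w not in stop_words]
    listOfLists.append(words)
    listOfStrings.append(" ".join(words))

print(listOfLists[0][0])
print(listOfStrings)
```

With this in place, listOfStrings[0] is the whole cleaned sentence rather than a single character, because each element is appended once per sentence instead of the variable being overwritten.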
