import string
from nltk.tokenize import word_tokenize
# stop_words is assumed to be a set of stop words defined earlier

libOfSentences = ["Get help with the display",
                  "Display is not working properly", "I need some help"]
# removing stopwords
for i in libOfSentences:
    sentence = word_tokenize(i)  # tokenize each individual word
    sentence = filter(lambda x: x not in string.punctuation, sentence)
    cleaned_text = filter(lambda x: x not in stop_words, sentence)
    removedStopwordsList = " ".join(cleaned_text)
removedStopwordsList
This puts the sentences back together, but I want to keep them in a list. The desired output looks like this:
["Get help display", "Display not working properly", "I need some help"]
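I want removedStopwordsList to still be a list I can loop through. Right now removedStopwordsList[0] gives me
"G D I"
but I want removedStopwordsList[0] to output
"Get help display"
The join function is what prevents this, but I can't find a better way around it.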
Answer 0 (score: 1)
If you want removedStopwordsList to still be a list, then just make it a list instead of turning it into a string:
removedStopwordsList = list(cleaned_text)
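To see the difference concretely, here is a minimal sketch on a single pre-tokenized sentence. The stop-word set is a made-up assumption for illustration. It shows why indexing the joined string gives one letter, why `list()` fixes that, and one extra pitfall: a `filter` object is a one-shot iterator, so once it has been consumed (by `join` or `list`) it is empty.

```python
# Hypothetical stop-word set and pre-tokenized sentence, for illustration only.
stop_words = {"with", "the"}
sentence = ["Get", "help", "with", "the", "display"]

# Joining produces a single string, so [0] is just its first character.
joined = " ".join(filter(lambda x: x not in stop_words, sentence))
print(joined[0])  # G

# list() keeps the filtered words as separate elements instead.
cleaned_text = filter(lambda x: x not in stop_words, sentence)
removedStopwordsList = list(cleaned_text)
print(removedStopwordsList)     # ['Get', 'help', 'display']
print(removedStopwordsList[0])  # Get

# A filter object can only be consumed once; afterwards it yields nothing.
print(list(cleaned_text))  # []
```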
Although you can do this more simply by using a list comprehension instead of calling filter:
removedStopwordsList = [x for x in sentence if x not in stop_words]
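A quick check that the two forms produce the same list, again assuming a small hypothetical stop-word set and an already-tokenized sentence:

```python
stop_words = {"with", "the"}  # hypothetical stop-word set
sentence = ["Get", "help", "with", "the", "display"]

via_filter = list(filter(lambda x: x not in stop_words, sentence))
via_comprehension = [x for x in sentence if x not in stop_words]

print(via_filter == via_comprehension)  # True
print(via_comprehension)                # ['Get', 'help', 'display']
```

The comprehension builds the list directly, so there is no separate `list(...)` call and no lambda.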
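map and filter are great when you already have a function you want to call on each element. But when you have an arbitrary expression that you'd have to wrap in a lambda just to turn it into a function call, it's simpler and more readable to use a list comprehension or a generator expression.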
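A small illustration of that rule of thumb. The word list is made up. With an existing function such as `str.lower` or `str.isalnum`, `map`/`filter` read naturally; with an ad-hoc expression like `len(w) > 3`, the lambda version is noisier than the comprehension that does the same thing.

```python
words = ["Get", "help", ",", "display", "!"]

# Existing functions: map/filter read naturally.
lowered = list(map(str.lower, words))
alnum_only = list(filter(str.isalnum, words))

# Arbitrary expression: the lambda wrapper adds noise...
long_via_filter = list(filter(lambda w: len(w) > 3, words))
# ...while the comprehension states the condition directly.
long_via_comprehension = [w for w in words if len(w) > 3]

print(lowered)                 # ['get', 'help', ',', 'display', '!']
print(alnum_only)              # ['Get', 'help', 'display']
print(long_via_comprehension)  # ['help', 'display']
```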
You can simplify the previous line the same way. So:
for i in libOfSentences:
    sentence = word_tokenize(i)  # tokenize each individual word
    sentence = (x for x in sentence if x not in string.punctuation)
    removedStopwordsList = [x for x in sentence if x not in stop_words]
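Here is a runnable sketch of that loop. It makes two assumptions so that it runs without NLTK: a hypothetical regex-based `simple_tokenize` stands in for `word_tokenize`, and the stop-word set is made up. Because removedStopwordsList is now a list, indexing it inside the loop gives whole words ("Get", "Display", "I") instead of the single letters "G", "D", "I".

```python
import re
import string

def simple_tokenize(text):
    # Hypothetical stand-in for nltk's word_tokenize: words and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

stop_words = {"with", "the", "is"}  # hypothetical stop-word set
libOfSentences = ["Get help with the display",
                  "Display is not working properly", "I need some help"]

firsts = []
for i in libOfSentences:
    sentence = simple_tokenize(i)
    # Lazy generator: punctuation is dropped as the comprehension below consumes it.
    sentence = (x for x in sentence if x not in string.punctuation)
    removedStopwordsList = [x for x in sentence if x not in stop_words]
    firsts.append(removedStopwordsList[0])
    print(removedStopwordsList[0])  # Get, then Display, then I
```

After the loop, removedStopwordsList holds only the last sentence's words; the other two lists were overwritten.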
If you also need the joined string, that's fine; just use a second variable:
removedStopwordsString = " ".join(removedStopwordsList)
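If you really want a single object that works both ways, it wouldn't be hard to write a class like that, but it would be ugly. And under the hood, it would just hold a self.list_of_words and a self.joined_string and delegate to them. So what's the point?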
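For completeness, here is roughly what such a class might look like. `WordsAndString` is a hypothetical name, and this is only a sketch of the delegation idea, not a recommendation: indexing, iteration, and `len` go to the list, while `str()`/`print` go to the joined string.

```python
class WordsAndString:
    """Hypothetical wrapper: acts like a list of words, prints as the joined string."""

    def __init__(self, words):
        self.list_of_words = list(words)
        self.joined_string = " ".join(self.list_of_words)

    def __getitem__(self, index):
        # Indexing and slicing delegate to the list.
        return self.list_of_words[index]

    def __len__(self):
        return len(self.list_of_words)

    def __iter__(self):
        return iter(self.list_of_words)

    def __str__(self):
        # str() and print() delegate to the joined string.
        return self.joined_string

w = WordsAndString(["Get", "help", "display"])
print(w[0])  # Get
print(w)     # Get help display
```

All it does is keep the two variables side by side, which is why two plain variables are usually the better choice.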
Anyway, I doubt you even need to keep the string around. If you want to print it, you can always join it at print time:
print(" ".join(removedStopwordsList))
…or even unpack it into separate arguments to print:
print(*removedStopwordsList)
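The two print calls write the same text, because print separates its arguments with a single space by default (`sep=" "`). A small check that captures both outputs with the standard-library `contextlib.redirect_stdout`:

```python
import io
from contextlib import redirect_stdout

removedStopwordsList = ["Get", "help", "display"]

# Capture what each print call writes so the two can be compared.
buf1, buf2 = io.StringIO(), io.StringIO()
with redirect_stdout(buf1):
    print(" ".join(removedStopwordsList))
with redirect_stdout(buf2):
    print(*removedStopwordsList)  # default sep=" " does the joining

print(buf1.getvalue() == buf2.getvalue())  # True
```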
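If you're trying to collect all of these lists into one big list, you have to actually write code to do that. Obviously, if you do removedStopwordsList = <anything> each time through the loop, you replace it each time. If you want to keep all of them, you need to append them to a bigger list. For example: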
listOfLists = []
for i in libOfSentences:
    sentence = word_tokenize(i)  # tokenize each individual word
    sentence = (x for x in sentence if x not in string.punctuation)
    removedStopwordsList = [x for x in sentence if x not in stop_words]
    listOfLists.append(removedStopwordsList)
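A self-contained version of that loop, under the same assumptions as before: a hypothetical regex-based `simple_tokenize` in place of `word_tokenize`, and a made-up stop-word set chosen so the result matches the question's desired output. The last step joins each inner list, which yields the list of strings the question asked for while still keeping listOfLists.

```python
import re
import string

def simple_tokenize(text):
    # Hypothetical stand-in for nltk's word_tokenize.
    return re.findall(r"\w+|[^\w\s]", text)

stop_words = {"with", "the", "is"}  # hypothetical; chosen to match the desired output
libOfSentences = ["Get help with the display",
                  "Display is not working properly", "I need some help"]

listOfLists = []
for i in libOfSentences:
    sentence = simple_tokenize(i)
    sentence = (x for x in sentence if x not in string.punctuation)
    removedStopwordsList = [x for x in sentence if x not in stop_words]
    listOfLists.append(removedStopwordsList)

# Joining each inner list gives one cleaned string per sentence.
joinedSentences = [" ".join(words) for words in listOfLists]
print(joinedSentences)
# ['Get help display', 'Display not working properly', 'I need some help']
```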
Now if you print out listOfLists, it will be a list of lists of words; listOfLists[0] will be the first list; listOfLists[0][0] will be the first word of the first list; and so on.
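A short illustration of that indexing. The data below is what the loop would produce assuming "with", "the", and "is" are the stop words removed (a hypothetical choice):

```python
listOfLists = [["Get", "help", "display"],
               ["Display", "not", "working", "properly"],
               ["I", "need", "some", "help"]]

print(listOfLists[0])      # ['Get', 'help', 'display']
print(listOfLists[0][0])   # Get
print(listOfLists[1][-1])  # properly

# Looping over the outer list visits one word list per sentence.
first_words = [words[0] for words in listOfLists]
print(first_words)  # ['Get', 'Display', 'I']
```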