How to de-tokenize words in a Python list back to their original form

Asked: 2018-06-21 21:42:09

Tags: python

import string
from nltk.tokenize import word_tokenize

libOfSentences = ["Get help with the display",
                  "Display is not working properly", "I need some help"]
# removing stopwords (stop_words is defined elsewhere)

for i in libOfSentences:
    sentence = word_tokenize(i)  # split the sentence into word tokens
    sentence = filter(lambda x: x not in string.punctuation, sentence)
    cleaned_text = filter(lambda x: x not in stop_words, sentence)

    removedStopwordsList = " ".join(cleaned_text)

removedStopwordsList now joins the sentence back together, but I want it to remain a list. The desired output looks like this:

["Get help display", "Display not working properly", "I need some help"]

I want removedStopwordsList to still be a list that I can loop through.

removedStopwordsList[0] 

gives me

"G D I" 

right now, but I want removedStopwordsList[0]

to output

"Get help display"

The join function is what keeps this from happening, but I can't find a better workaround.

1 Answer:

Answer 0 (score: 1)

"I want removedStopwordsList to still be a list"

Then just make it a list instead of making it a string:

removedStopwordsList = list(cleaned_text)
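To see why indexing behaves differently in the two cases, here is a minimal sketch. It assumes str.split() as a stand-in for word_tokenize and a small hypothetical stop_words set (the question does not show the real one), so it runs without NLTK.

```python
# Assumption: a tiny hypothetical stop word set, not the asker's real one.
stop_words = {"with", "the", "is"}

# Assumption: str.split() stands in for nltk's word_tokenize.
tokens = "Get help with the display".split()
cleaned_text = filter(lambda x: x not in stop_words, tokens)

as_list = list(cleaned_text)    # a list of words
as_string = " ".join(as_list)   # a single string built from that list

print(as_list[0])    # indexing the list yields its first word
print(as_string[0])  # indexing the string yields its first character
```

Note that filter returns a one-shot iterator, so the sketch converts it to a list once and builds the string from that list rather than consuming cleaned_text twice.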

Although you can do this more simply with a list comprehension instead of calling filter:

removedStopwordsList = [x for x in sentence if x not in stop_words]

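map and filter are great when you already have a function that you want to call on each element. But when you have an arbitrary expression that you would have to wrap in a lambda to turn it into a function call, it is simpler and more readable to just use a list comprehension or a generator expression.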
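A small side-by-side comparison may make the point concrete. The word list and stop_words set below are hypothetical values chosen only for illustration.

```python
words = ["Get", "help", "with", "the", "display"]
stop_words = {"with", "the"}  # hypothetical stop word set

# map reads well when an existing function can be passed directly.
upper_words = list(map(str.upper, words))

# filter needs a lambda here, because the test is an arbitrary expression.
kept_with_filter = list(filter(lambda x: x not in stop_words, words))

# The list comprehension states the same test without the lambda wrapper.
kept_with_comprehension = [x for x in words if x not in stop_words]
```

Both filtering forms produce the same list; the comprehension just avoids the extra function call syntax.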

You can simplify the previous line in the same way. So:

for i in libOfSentences:
    sentence = word_tokenize(i)  # split the sentence into word tokens
    sentence = (x for x in sentence if x not in string.punctuation)
    removedStopwordsList = [x for x in sentence if x not in stop_words]

If you also need the joined string, that's fine; you can use a second variable:

removedStopwordsString = " ".join(removedStopwordsList)

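If you really want a single object that can act as both, it wouldn't be hard to write such a class, but it would be ugly. And under the covers, it would just hold a self.list_of_words and a self.joined_string and delegate to them. So what's the point?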
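For illustration only, such a class might look like the sketch below. The class name WordsAndString is hypothetical; the two attribute names follow the ones mentioned above. It mainly shows how little the wrapper adds over keeping two variables.

```python
class WordsAndString:
    """Hypothetical wrapper holding a word list and its joined string."""

    def __init__(self, words):
        self.list_of_words = list(words)
        self.joined_string = " ".join(self.list_of_words)

    # List-like behaviour is delegated to list_of_words.
    def __getitem__(self, index):
        return self.list_of_words[index]

    def __iter__(self):
        return iter(self.list_of_words)

    def __len__(self):
        return len(self.list_of_words)

    # String conversion is delegated to joined_string.
    def __str__(self):
        return self.joined_string


both = WordsAndString(["Get", "help", "display"])
print(both[0])  # behaves like the list when indexed
print(both)     # behaves like the string when printed
```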

In any case, I doubt you need to keep the string around. If you want to print it, you can always join it on the fly:

print(" ".join(removedStopwordsList))

… or even unpack it into separate arguments to print:

print(*removedStopwordsList)

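If you're trying to collect all of these lists into one big list, you have to actually write code to do that. Obviously, if you do removedStopwordsList = <anything> each time through the loop, it gets replaced each time. If you want to keep all of them, you need to append each one to a bigger list. For example: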

listOfLists = []
for i in libOfSentences:
    sentence = word_tokenize(i)  # split the sentence into word tokens
    sentence = (x for x in sentence if x not in string.punctuation)
    removedStopwordsList = [x for x in sentence if x not in stop_words]
    listOfLists.append(removedStopwordsList)

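Now, if you print out listOfLists, it will be a list of word lists, one per sentence; listOfLists[0] will be the first list; listOfLists[0][0] will be the first word of the first list; and so on.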
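Putting the pieces together, the following sketch collects one word list per sentence and then, as one possible way to get the list of cleaned sentences shown in the question, joins each word list back into a string. It assumes a hypothetical tokenize helper based on str.split() in place of word_tokenize, and a hypothetical stop_words set picked to match the question's desired output.

```python
import string

libOfSentences = ["Get help with the display",
                  "Display is not working properly", "I need some help"]

# Assumption: hypothetical stop words; the asker's real set is not shown.
stop_words = {"with", "the", "is"}


def tokenize(sentence):
    # Assumption: whitespace splitting stands in for nltk's word_tokenize,
    # so punctuation attached to a word would not be split off here.
    return sentence.split()


listOfLists = []
for i in libOfSentences:
    sentence = (x for x in tokenize(i) if x not in string.punctuation)
    removedStopwordsList = [x for x in sentence if x not in stop_words]
    listOfLists.append(removedStopwordsList)

# One joined string per sentence, still held in a list that can be indexed.
listOfSentences = [" ".join(words) for words in listOfLists]

print(listOfLists[0][0])   # first word of the first sentence
print(listOfSentences[0])  # first cleaned sentence
```

Keeping listOfLists as the primary result and deriving listOfSentences from it means both the word-level and the sentence-level views stay available.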