
Python NLTK gives me multiple instances of the same sentence in the results

Time: 2014-06-19 14:48:33

Tags: python nltk

Here is my code:

>>> from nltk.corpus import PlaintextCorpusReader
>>> corpus_root = 'C:/Python27/'
>>> wordlists = PlaintextCorpusReader(corpus_root,'amazonshoes.txt')
>>> sentences = wordlists.sents('amazonshoes.txt')
>>> words_to_find = 'amazon service'.split()
>>> for sentence in sentences:
...     if all(word in sentence for word in words_to_find):
...         print sentence

and the result is:

['review', '/', 'text', ':', 'It', "'", 's', 'the', 'first', 'time', 'that', 'I', 'buy', 'something', 'over', 'amazon', 'and', 'I', 'have', 'to', 'say', 'I', 'am', 'very', 'impressed', 'with', 'the', 'service', 'and', 'the', 'quality', 'of', 'the', 'product', '.']
['review', '/', 'text', ':', 'It', "'", 's', 'the', 'first', 'time', 'that', 'I', 'buy', 'something', 'over', 'amazon', 'and', 'I', 'have', 'to', 'say', 'I', 'am', 'very', 'impressed', 'with', 'the', 'service', 'and', 'the', 'quality', 'of', 'the', 'product', '.']
['review', '/', 'text', ':', 'It', "'", 's', 'the', 'first', 'time', 'that', 'I', 'buy', 'something', 'over', 'amazon', 'and', 'I', 'have', 'to', 'say', 'I', 'am', 'very', 'impressed', 'with', 'the', 'service', 'and', 'the', 'quality', 'of', 'the', 'product', '.']
['review', '/', 'text', ':', 'It', "'", 's', 'the', 'first', 'time', 'that', 'I', 'buy', 'something', 'over', 'amazon', 'and', 'I', 'have', 'to', 'say', 'I', 'am', 'very', 'impressed', 'with', 'the', 'service', 'and', 'the', 'quality', 'of', 'the', 'product', '.']
['review', '/', 'text', ':', 'It', "'", 's', 'the', 'first', 'time', 'that', 'I', 'buy', 'something', 'over', 'amazon', 'and', 'I', 'have', 'to', 'say', 'I', 'am', 'very', 'impressed', 'with', 'the', 'service', 'and', 'the', 'quality', 'of', 'the', 'product', '.']

What should I change in my code?

1 answer:

Answer 0: (score: 0)

Check whether your sentences list already contains duplicates. Alternatively, change your code to:

>>> for sentence in set([tuple(s) for s in sentences]):
...     if all(word in sentence for word in words_to_find):
...         print sentence

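This change ensures that you iterate over a set of unique sentences. Each sentence is converted to a tuple first because lists are not hashable and therefore cannot be stored in a set. Note that a set does not preserve the original order of the sentences, and each printed sentence will be a tuple rather than a list.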
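If the order of the sentences matters, one possible variation is to track the sentences already seen and skip repeats. The sketch below is only an illustration: unique_sentences is a hypothetical helper, and sample_sentences is made-up data standing in for the result of wordlists.sents('amazonshoes.txt'), since the original file is not available here. It also counts how often each sentence occurs, which is one way to check whether the list contains duplicates in the first place.

```python
from collections import Counter


def unique_sentences(sentences):
    """Yield each sentence once, in first-seen order (hypothetical helper)."""
    seen = set()
    for sentence in sentences:
        key = tuple(sentence)  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            yield sentence


# Made-up sample data; in the question this would be wordlists.sents(...)
sample_sentences = [
    ['I', 'buy', 'over', 'amazon', 'great', 'service', '.'],
    ['The', 'shoes', 'fit', 'well', '.'],
    ['I', 'buy', 'over', 'amazon', 'great', 'service', '.'],
]
words_to_find = 'amazon service'.split()

# Count occurrences of each sentence to see whether duplicates exist.
duplicate_counts = Counter(tuple(s) for s in sample_sentences)

# Keep only unique sentences that contain every search word.
matches = [s for s in unique_sentences(sample_sentences)
           if all(word in s for word in words_to_find)]

for sentence in matches:
    print(sentence)
```

Unlike the set-based loop, this keeps the first occurrence of each sentence in its original position and leaves the sentences as lists.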