How to use nltk to find synonyms of words, or related sentences, from a given text file

Time: 2014-08-04 11:22:24

Tags: compare nltk synonym

from __future__ import division
import nltk
from nltk.corpus import wordnet as wn


tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
fp = open("inpsyn.txt")
data = fp.read()

#to tokenize input text into sentences

print '\n-----\n'.join(tokenizer.tokenize(data))# splits text into sentences

#to tokenize the tokenized sentences into words

tokens = nltk.wordpunct_tokenize(data)
text = nltk.Text(tokens)
words = [w.lower() for w in text]  
print words     #to print the tokens

for a in words:
    print a

syns = wn.synsets(a)
print "synsets:", syns

for s in syns:
    for l in s.lemmas:
        print l.name
    print s.definition
    print s.examples

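I could not find any code related to my problem. If there is any, please send me a link. This is my code; it does not find synonyms or related sentences from the given text file.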

1 Answer:

Answer 0 (score: 0)

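It is as simple as a coding error. Look at where you define a in the loop (for a in words), then look further down at where you call syns = wn.synsets(a). That call sits outside the loop, so it runs only once, after the loop has finished, when a holds just the last word. What you want is to put all the synonym code inside the for a in words loop. This is what you want: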

...

words = [w.lower() for w in nltk.wordpunct_tokenize(data)]   # other lines in your code are just excessive

for a in words:
    syns = wn.synsets(a)
    print "synsets:", syns

    for s in syns:
        for l in s.lemmas:
            print l.name
        print s.definition
        print s.examples
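
To see why the placement matters, here is a minimal sketch that needs neither NLTK nor WordNet. FAKE_SYNSETS and both function names are hypothetical stand-ins invented for this illustration; the table holds placeholder strings, not real WordNet output.

```python
# Hypothetical stand-in for wn.synsets: placeholder strings, not real WordNet data.
FAKE_SYNSETS = {
    "happy": ["synset_for_happy"],
    "big": ["synset_for_big"],
    "dog": ["synset_for_dog"],
}


def lookup_after_loop(words):
    """Mimic the question's code: the lookup sits after the loop."""
    results = {}
    for a in words:
        pass  # the question's loop body only printed a
    # a still exists here, but it holds only the last word of the list
    results[a] = FAKE_SYNSETS.get(a, [])
    return results


def lookup_inside_loop(words):
    """Mimic the corrected code: the lookup runs once per word."""
    results = {}
    for a in words:
        results[a] = FAKE_SYNSETS.get(a, [])
    return results


words = ["happy", "big", "dog"]
print(lookup_after_loop(words))   # one entry, for the last word only
print(lookup_inside_loop(words))  # one entry per word
```

The first function returns a single entry because the lookup executes once, with a bound to whatever the final iteration left behind.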

This is a somewhat silly mistake. Also, please learn to write cleaner code; at the moment it looks very sloppy and is painful to read.
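
The code above is written for Python 2 and an older NLTK release. As a rough sketch only, the same loop could look like this on Python 3 with NLTK 3 or later, where lemmas() and name() are methods rather than attributes. The names collect_synonyms and print_synonyms_for_file are made up for this example, and the synsets parameter is an assumption added so the loop can be tried without WordNet installed.

```python
def collect_synonyms(words, synsets=None):
    """Map each distinct word to the sorted lemma names found in its synsets.

    synsets is any callable taking a word and returning synset objects.
    It defaults to nltk.corpus.wordnet.synsets, which requires NLTK and
    its 'wordnet' corpus to be installed.
    """
    if synsets is None:
        from nltk.corpus import wordnet
        synsets = wordnet.synsets

    result = {}
    for word in words:
        if word in result:
            continue  # skip repeated words
        names = set()
        for s in synsets(word):        # the lookup happens inside the loop
            for lemma in s.lemmas():   # a method call in NLTK 3
                names.add(lemma.name())
        names.discard(word)            # the word itself is not a synonym
        result[word] = sorted(names)
    return result


def print_synonyms_for_file(path):
    """Tokenize a text file and print the synonyms found for each word."""
    from nltk.tokenize import wordpunct_tokenize

    with open(path) as fp:
        data = fp.read()
    # keep alphabetic tokens only, so punctuation is not looked up
    words = [w.lower() for w in wordpunct_tokenize(data) if w.isalpha()]
    for word, names in collect_synonyms(words).items():
        print(word, "->", names)
```

Calling print_synonyms_for_file("inpsyn.txt") would process the file from the question. Collecting lemma names into a set removes the duplicates that appear when several synsets share a lemma.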