from __future__ import division
import nltk
from nltk.corpus import wordnet as wn
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
fp = open("inpsyn.txt")
data = fp.read()
#to tokenize input text into sentences
print '\n-----\n'.join(tokenizer.tokenize(data))# splits text into sentences
#to tokenize the tokenized sentences into words
tokens = nltk.wordpunct_tokenize(data)
text = nltk.Text(tokens)
words = [w.lower() for w in text]
print words #to print the tokens
for a in words:
    print a
syns = wn.synsets(a)
print "synsets:", syns
for s in syns:
    for l in s.lemmas:
        print l.name
    print s.definition
    print s.examples
I could not find any code related to my problem; if there is any, please send me links. The code above does not find synonyms from the given text file or the related sentences.
Answer 0 (score: 0)
It is as simple as an indentation mistake — look at where you define `a` in the loop (`for a in words`), then look further down to where you try `syns = wn.synsets(a)`. That line sits outside the loop, so it runs only once, after the loop has finished, using whatever word `a` was left holding. What you want is to put all of the synonym code inside the `for a in words` loop. This is what you want:
...
words = [w.lower() for w in nltk.wordpunct_tokenize(data)]  # the other lines in your code are redundant
for a in words:
    syns = wn.synsets(a)
    print "synsets:", syns
    for s in syns:
        for l in s.lemmas:
            print l.name
        print s.definition
        print s.examples
It is a somewhat silly mistake. Also, please work on writing cleaner code — right now it looks very sloppy and is painful to read.