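I'm trying to use Python with NLTK to get synonyms for several words (two for now). I can get it to work for the first word, but not for the second. I guess I still have a lot to learn about NLTK. Some simplified example code is below. I'm basically trying to build two lists of synonyms, one for each word. The first for loop works fine. When it reaches the second word, I get: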
syn2 = wn.synsets(word)[0].lemmas[y]
IndexError: list index out of range
Hopefully someone can help me understand why this happens.
import nltk
from nltk.corpus import wordnet as wn
import string
from array import *
syn1 = ''
syn2 = ''
mylist = []
mylist2 = []
mylist3 = []
Web_Keywd = 'car loan'
wuser_words = Web_Keywd.split()
i = 0  # word counter; must be initialized before the loop increments it
for word in wuser_words:
    i=i+1
    #first
    if (i == 1) :
        synset1 = wn.synsets(word)
        y = 0
        for synset in synset1:
            syn1 = wn.synsets(word)[0].lemmas[y]
            syn1 = syn1.name
            mylist2.append(syn1)
            y=y+1
    if (i == 2) :
        y = 0
        for synset2 in wn.synsets(word):
            syn2 = wn.synsets(word)[0].lemmas[y]
            syn2 = syn2.name
            mylist3.append(syn2)
            y=y+1
Answer 0 (score: 1)
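I may have misled you in my previous answer by using wn.synsets(word)[0].lemmas[y]. That expression always indexes into the lemmas of the first synset, while y counts synsets, so it raises IndexError as soon as a word has more synsets than its first synset has lemmas. You need to loop over the lemmas explicitly, because you don't know in advance how many there are. Example usage: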
Web_Keywd = 'car loan cheap'
results = {}
for word in Web_Keywd.split():
    for synset in wn.synsets(word):
        # On NLTK 3.x these are methods: use synset.lemmas() and lemma.name()
        for lemma in synset.lemmas:
            results.setdefault(word, []).append(lemma.name)
results
results now looks like this:
{'car': ['car', 'auto', 'automobile', 'machine'...],
'loan': ['loan', 'loanword', 'loan', 'lend', 'loan'...],
'cheap': ['cheap', 'inexpensive', 'brassy', 'cheap'...]}
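To see why the indexed version from the question fails while the nested loops do not, here is a minimal sketch that mimics both with plain lists. The data in `fake_synsets` is invented for illustration and is not real WordNet output; each inner list stands in for one synset's lemma names, and both function names are hypothetical.

```python
# Stand-in data (hypothetical, NOT real WordNet output):
# each inner list plays the role of one synset's lemma names.
fake_synsets = [
    ["loan"],              # the first synset has only one lemma
    ["loanword", "loan"],
    ["lend", "loan"],
]


def indexed_lookup(synsets):
    """Mimic the question's loop: count synsets, but index the FIRST synset's lemmas."""
    names = []
    y = 0
    for _ in synsets:
        names.append(synsets[0][y])  # fails once y >= len(synsets[0])
        y += 1
    return names


def nested_lookup(synsets):
    """Visit every lemma of every synset, so no index can run out of range."""
    return [name for synset in synsets for name in synset]


try:
    indexed_lookup(fake_synsets)
except IndexError as exc:
    error_message = str(exc)

all_names = nested_lookup(fake_synsets)
```

With this stand-in data the indexed version fails on the second pass through the loop, because the first "synset" holds a single lemma, whereas the nested version simply collects every name.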
To get unique results for each submitted word, independently of the other words:
....  # same as above
            results.setdefault(word, set()).add(lemma.name)
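Spelled out in full, the per-word version might look like the sketch below. To keep it runnable without the WordNet corpus, it takes a hypothetical `synsets_for` callable (word to list of lemma-name lists) and uses invented stand-in data in `FAKE_WORDNET`; with NLTK you would walk `wn.synsets(word)` and each synset's lemmas instead.

```python
def unique_lemmas_per_word(words, synsets_for):
    """Map each word to the set of lemma names found across all of its synsets.

    synsets_for is a hypothetical stand-in for the WordNet lookup:
    it takes a word and returns a list of lemma-name lists.
    """
    results = {}
    for word in words:
        for lemma_names in synsets_for(word):
            for name in lemma_names:
                # A set silently drops names that were already added.
                results.setdefault(word, set()).add(name)
    return results


# Invented stand-in data, NOT real WordNet output.
FAKE_WORDNET = {
    "car": [["car", "auto", "automobile"], ["car", "railcar"]],
    "loan": [["loan"], ["loanword", "loan"], ["lend", "loan"]],
}

results = unique_lemmas_per_word(
    "car loan".split(),
    lambda word: FAKE_WORDNET.get(word, []),
)
```

Note that a word with no synsets never reaches `setdefault`, so it is simply absent from `results`. Sets are unordered, so wrap a value in `sorted()` if you need a stable display order.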
To get a single collection of unique words across all submitted words:
Web_Keywd = 'car loan cheap'
words = set(Web_Keywd.split())
results = set(
    lemma.name
    for word in words
    for synset in wn.synsets(word)
    for lemma in synset.lemmas
)
# results -> {'loanword', 'tatty', 'automobile', 'cheap', 'chinchy',...
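One caveat about versions: the snippets here access `lemmas` and `name` as plain attributes, which matches older NLTK releases. In NLTK 3.x they are methods (`synset.lemmas()`, `lemma.name()`). A small helper such as the hypothetical `value_of` below can smooth over the difference. The `OldStyle*` and `NewStyle*` classes are invented stand-ins so that the sketch runs without WordNet installed.

```python
def value_of(attr):
    """Return attr() if it is callable (NLTK 3.x style), else attr itself (older style)."""
    return attr() if callable(attr) else attr


class OldStyleLemma:
    """Stand-in for a lemma whose name is a plain attribute."""
    def __init__(self, name):
        self.name = name


class OldStyleSynset:
    """Stand-in for a synset whose lemmas is a plain list attribute."""
    def __init__(self, names):
        self.lemmas = [OldStyleLemma(n) for n in names]


class NewStyleLemma:
    """Stand-in for a lemma whose name is a method."""
    def __init__(self, name):
        self._name = name

    def name(self):
        return self._name


class NewStyleSynset:
    """Stand-in for a synset whose lemmas is a method."""
    def __init__(self, names):
        self._lemmas = [NewStyleLemma(n) for n in names]

    def lemmas(self):
        return self._lemmas


def lemma_names(synsets):
    """Collect the unique lemma names from synsets of either style."""
    return {
        value_of(lemma.name)
        for synset in synsets
        for lemma in value_of(synset.lemmas)
    }


old = lemma_names([OldStyleSynset(["car", "auto"]), OldStyleSynset(["car"])])
new = lemma_names([NewStyleSynset(["car", "auto"]), NewStyleSynset(["car"])])
```

If you only target one NLTK version, calling the attribute or method directly is simpler than going through a helper like this.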