Python index out of range?

Time: 2016-02-13 18:10:03

Tags: python python-2.7 nlp arabic wordnet

I am using Arabic WordNet to find synonyms. With the code below it works fine and outputs the correct synonyms:

import unicodedata
from nltk.corpus import wordnet as wn
yxz = 'work'
jan = wn.synsets(yxz)[0]
abc = jan.lemma_names(lang='arb')
for bca in abc:  # strip diacritics (combining marks) from each Arabic lemma
    nfkd_form = unicodedata.normalize('NFKD', bca)
    encoded = u"".join([c for c in nfkd_form if not unicodedata.combining(c)])
    print encoded
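The loop above strips the Arabic diacritics (harakat) by decomposing each lemma with NFKD and then dropping every combining character. A minimal standalone sketch of the same idea, as a hypothetical helper `strip_diacritics` (not part of NLTK), written so it runs on Python 2.7 or 3:

```python
# -*- coding: utf-8 -*-
import unicodedata


def strip_diacritics(text):
    """Remove combining marks (e.g. Arabic harakat) from a unicode string.

    Hypothetical helper: NFKD splits precomposed characters into a base
    character plus combining marks, and unicodedata.combining() returns a
    non-zero value for those marks, so they are filtered out.
    """
    decomposed = unicodedata.normalize('NFKD', text)
    return u"".join(c for c in decomposed if not unicodedata.combining(c))


# "عَمَلٌ" (work) written with fatha and dammatan; the result is "عمل".
result = strip_diacritics(u"\u0639\u064e\u0645\u064e\u0644\u064c")
```

Because NFKD also decomposes letters such as أ (alef with hamza above) into ا plus a combining hamza, this approach removes the hamza as well: `strip_diacritics(u"أحمد")` gives `u"احمد"`. If that matters for your data, remove only the specific marks you want instead of every combining character.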

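But I want to run the part above in a loop, changing the word (yxz) each time, and it does not work because the index is out of range :( I have an XML document and I want to get synonyms only for the verbs in certain sentences; the verbs appear in the XML document between <v> tags: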
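To see which verbs the loop below will actually visit, the XML walk can be tested on its own. A minimal sketch, assuming a document shape inferred from the `findall()` calls in the question (`PHRASE` elements containing `en` elements with an `x` attribute, plus verb elements); the sample XML and the helper `verbs_by_phrase` are hypothetical. Note that ElementTree tag matching is case-sensitive, so `phrase.findall('V')` finds `<V>` elements but not `<v>` ones.

```python
import xml.etree.ElementTree as ET

# Hypothetical document shape, inferred from the question's findall() calls.
SAMPLE = """<DOC>
  <PHRASE>
    <en x="ORG">ORG name</en>
    <en x="PERS">person name</en>
    <V>verb one</V>
    <V>verb two</V>
  </PHRASE>
</DOC>"""


def verbs_by_phrase(root, verb_tag='V'):
    """Yield (entities, verbs) for every PHRASE element under root.

    Tag matching is case-sensitive: verb_tag='V' does not match <v>.
    """
    for phrase in root.findall('./PHRASE'):
        ens = {en.get('x'): en.text for en in phrase.findall('en')}
        verbs = [v.text for v in phrase.findall(verb_tag)]
        yield ens, verbs


root = ET.fromstring(SAMPLE)
for ens, verbs in verbs_by_phrase(root):
    print("%s: %s" % (ens.get('ORG'), verbs))
```

If this prints an empty verb list for phrases you expect to match, the tag name (or its case) in the file differs from the one passed to `findall()`.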
import unicodedata
import goslate
from nltk.corpus import wordnet as wn

# root is the XML document's root element (the loading code is not shown)
Synonyms = []

for phrase in root.findall('./PHRASE'):
    ens = {en.get('x'): en.text for en in phrase.findall('en')}
    if 'ORG' in ens and 'PERS' in ens:
        if ((ens["ORG"] == u"جامعة بيت لحم" and ens["PERS"] == u" ه أحمد") or
                (ens["ORG"] == u"جامعة كولومبيا." and ens["PERS"] == u"رئيس الجمهورية السيد محمد المنصف المرزوقي") or
                (ens["ORG"] == u"معمل باريكادي " and ens["PERS"] == u"رئيس فنزويلا") or
                (ens["ORG"] == u"شركة جوجل" and ens["PERS"] == u"لاري بيدج وسيرغي برين") or
                (ens["ORG"] == u"محترفه الباريسي" and ens["PERS"] == u"بول")):
            for v in phrase.findall('V'):
                # ---------- English synonym trial ----------
                print("------ English Synonym Trial----------")
                # Step 8.3.1: translate the Arabic verb to English (Google Translate via goslate)
                gs = goslate.Goslate()
                engVerb = gs.translate(unicode(v.text), 'en')  # English word is the output
                print("---EngVerb---")
                print(engVerb)
                # Step 8.3.2: look up the English word in WordNet and take its Arabic lemmas
                jan = wn.synsets(engVerb)[0]
                abc = jan.lemma_names(lang='arb')
                for bca in abc:  # strip diacritics (combining marks) from each Arabic lemma
                    nfkd_form = unicodedata.normalize('NFKD', bca)
                    encoded = u"".join([c for c in nfkd_form if not unicodedata.combining(c)])
                    Synonyms.append(encoded)

print("----------------------------PRINTING SYNONYMS---------------------------")
print Synonyms
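The long `if` condition in the loop above checks whether the phrase's (ORG, PERS) pair is one of five specific combinations. One way to make it easier to read and extend is to keep the wanted pairs in a set and test membership. This is a sketch with hypothetical names (`WANTED_PAIRS`, `is_wanted`). The comparison is exact, so stray spaces in the original literals (the leading space in `u" ه أحمد"`, the trailing one in `u"معمل باريكادي "`) must match the XML text character for character; stripping whitespace on both sides, as done here, is an assumption about the data.

```python
# -*- coding: utf-8 -*-

# (ORG, PERS) combinations taken from the question's if-condition,
# with surrounding whitespace removed (an assumption about the data).
WANTED_PAIRS = {
    (u"جامعة بيت لحم", u"ه أحمد"),
    (u"جامعة كولومبيا.", u"رئيس الجمهورية السيد محمد المنصف المرزوقي"),
    (u"معمل باريكادي", u"رئيس فنزويلا"),
    (u"شركة جوجل", u"لاري بيدج وسيرغي برين"),
    (u"محترفه الباريسي", u"بول"),
}


def is_wanted(ens):
    """Return True if the phrase's ORG/PERS entities form one of WANTED_PAIRS."""
    org, pers = ens.get("ORG"), ens.get("PERS")
    if org is None or pers is None:  # entity missing, or element had no text
        return False
    return (org.strip(), pers.strip()) in WANTED_PAIRS
```

Inside the loop, the nested `if` statements would then reduce to `if is_wanted(ens):`.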

But I always get this error:

 jan = wn2.synsets(engVerb)[0] 

IndexError: list index out of range

1 answer:

Answer 0 (score: 1)

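The error means that wn2.synsets(engVerb) returned an empty list, i.e. WordNet has no synsets for that word (debugging with print helps a lot here), and you are trying to access its first element, which does not exist.

Try this instead: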

x = wn2.synsets(engVerb)
if not x:  # no synsets for this word: skip it instead of indexing an empty list
    continue
jan = x[0]
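The same guard can be wrapped in a small function that collects synonyms for several words and records the ones WordNet does not know, which makes the print-debugging suggested above easier. In this sketch `collect_synonyms` and its `lookup` parameter are hypothetical; in the real script `lookup` could be `wn.synsets` or `lambda w: wn.synsets(w, pos=wn.VERB)` (NLTK's `synsets` accepts an optional `pos` argument), and each synset's `lemma_names(lang='arb')` supplies the Arabic lemmas. A stub lookup is used here so the example runs without the WordNet data.

```python
def collect_synonyms(words, lookup):
    """Return (synonyms, skipped) for an iterable of English words.

    lookup(word) must return a list of objects with a lemma_names(lang=...)
    method, like NLTK's wn.synsets(word).
    """
    synonyms, skipped = [], []
    for word in words:
        # Assumed normalisation: translations may come back capitalised or as
        # several words, while WordNet joins multi-word lemmas with "_".
        key = word.strip().lower().replace(u" ", u"_")
        synsets = lookup(key)
        if not synsets:  # empty list: synsets[0] would raise IndexError
            skipped.append(word)
            continue
        synonyms.extend(synsets[0].lemma_names(lang='arb'))
    return synonyms, skipped


class FakeSynset(object):
    """Stand-in for an NLTK Synset, for demonstration only."""

    def __init__(self, arabic_lemmas):
        self._arabic = arabic_lemmas

    def lemma_names(self, lang='eng'):
        return list(self._arabic) if lang == 'arb' else []


FAKE_WORDNET = {u"work": [FakeSynset([u"\u0639\u0645\u0644"])]}


def fake_lookup(word):
    return FAKE_WORDNET.get(word, [])


synonyms, skipped = collect_synonyms([u"Work", u"xyzzy"], fake_lookup)
```

Here `synonyms` holds the single Arabic lemma for "work", and `skipped` holds `u"xyzzy"`; printing `skipped` in the real script shows exactly which translated verbs had no WordNet entry.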