Python KeyError: '' in automatic language detection

Time: 2013-04-24 08:35:02

Tags: python

I am using stopwords for automatic language detection in Python,

but I get a KeyError when I try to test the code. Here is the code:

import nltk
from nltk.corpus import stopwords

def scoreFunction(wholetext):
    dictiolist={}
    scorelist={}
    NLTKlanguages = ["dutch","finnish","german","italian","portuguese","spanish","turkish","danish","english","french","hungarian","norwegian","russian","swedish"]
    FREElanguages = [""]
    languages= NLTKlanguages + FREElanguages
    for lang in NLTKlanguages:
        dictiolist[lang]=stopwords.words(lang)
        tokens=nltk.tokenize.word_tokenize(wholetext)
        tokens=[t.lower() for t in tokens]
        freq_dist=nltk.FreqDist(tokens)
    for lang in languages:
        scorelist[lang]=0
    for word in freq_dist.keys()[0:20]:
        if word in dictiolist[lang]:
            scorelist[lang]+=1
    return scorelist

def whichLanguage(scorelist):
    maximum=0
    for item in scorelist:
        value = scorelist[item]
        if maximum<value:
            maximum = value
            lang = item
    return lang

I ran scoreFunction on a test sentence ("my name is Osfar and I'm a genius") and got this error:

Traceback (most recent call last):
File "", line 1, in

scoreFunction("hello my name is osfar and i'm very genius") 
File "C:/Users/osama1/Desktop/fun-test", line 17, in scoreFunction
if word in dictiolist[lang]:
KeyError: ''

1 answer:

Answer 0 (score: 1)

Your problem is in the following block of code:

for word in freq_dist.keys()[0:20]:
    if word in dictiolist[lang]:
        scorelist[lang]+=1

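You use the variable lang in this for loop, but the loop itself never assigns it. Python does not reset it either: lang keeps the last value it had in your previous for loop, which is "" (the empty string) because that is the last item in languages. dictiolist has no entry for "", so the lookup raises KeyError: ''.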
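To see this in isolation, here is a minimal sketch with made-up data. The names languages and dictiolist mirror the question, but their contents are illustrative only. It shows that a for loop's variable stays bound to its last value after the loop ends, so a later lookup silently uses that value:

```python
# Illustrative data only: "" has no stopword list, just like in the question.
languages = ["english", "german", ""]
dictiolist = {"english": ["the", "is"], "german": ["der", "ist"]}

for lang in languages:
    pass  # the loop body does not matter here

# After the loop, lang still holds the last item of languages.
leftover = lang

try:
    dictiolist[leftover]
    error = None
except KeyError as exc:
    error = exc  # the same kind of error as in the traceback

print(repr(leftover), type(error).__name__)
```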

What you apparently meant to do is:

for word in freq_dist.keys()[0:20]:
    # dictiolist holds only the languages that have a stopword list (no "" key)
    for lang in dictiolist:
        if word in dictiolist[lang]:
            scorelist[lang]+=1
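
Putting the fix together, a self-contained sketch might look like the following. It is written for Python 3 and swaps the NLTK pieces for stand-ins so that it runs without any corpus download: STOPWORDS is a tiny hand-made table (an assumption, not NLTK's lists), str.split stands in for word_tokenize, and score_languages and which_language are hypothetical names. The inner loop runs over the keys of the stopword table itself, so every lookup is guaranteed to succeed.

```python
# Tiny hand-made stopword table (an assumption for illustration, not NLTK's lists).
STOPWORDS = {
    "english": {"my", "is", "and", "i", "the", "a"},
    "german": {"ich", "und", "ist", "der", "die", "das"},
    "spanish": {"y", "es", "el", "la", "mi", "un"},
}

def score_languages(text, top_n=20):
    # str.split() stands in for nltk.tokenize.word_tokenize here.
    tokens = [t.lower() for t in text.split()]

    # Count tokens, then keep the top_n most frequent distinct words.
    freq = {}
    for token in tokens:
        freq[token] = freq.get(token, 0) + 1
    top_words = sorted(freq, key=freq.get, reverse=True)[:top_n]

    # Every language starts at 0; iterating over STOPWORDS means every key exists.
    scores = {lang: 0 for lang in STOPWORDS}
    for word in top_words:
        for lang in STOPWORDS:
            if word in STOPWORDS[lang]:
                scores[lang] += 1
    return scores

def which_language(scores):
    # Return the highest-scoring language, or None if nothing matched.
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

scores = score_languages("hello my name is osfar and i'm very genius")
print(scores, which_language(scores))
```

Returning None when no stopword matches is a design choice of this sketch: the whichLanguage function in the question only assigns lang when some score is above zero, so it would fail on a text with no matches at all.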

By the way, there is a simpler way to do what you are trying to do: use a Counter. For details, see http://docs.python.org/2.7/library/collections.html#counter-objects
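
As a rough illustration of that suggestion, collections.Counter can take over both the frequency table and the manually initialised score dictionary. The stopword table below is again a made-up stand-in for NLTK's lists, str.split stands in for word_tokenize, and count_stopword_hits is a hypothetical name.

```python
from collections import Counter

# Made-up stand-in for NLTK's stopword lists.
STOPWORDS = {
    "english": {"my", "is", "and", "the"},
    "german": {"ich", "und", "ist", "der"},
}

def count_stopword_hits(text, top_n=20):
    # Frequency of each lowercased token; most_common gives the top_n words.
    word_freq = Counter(t.lower() for t in text.split())
    top_words = [word for word, _ in word_freq.most_common(top_n)]

    # A Counter returns 0 for missing keys, so no initialisation loop is needed.
    hits = Counter()
    for word in top_words:
        for lang, words in STOPWORDS.items():
            if word in words:
                hits[lang] += 1
    return hits

hits = count_stopword_hits("my name is osfar and the weather is fine")
best = hits.most_common(1)  # list with the single best (language, score) pair
print(hits, best)
```

Because a Counter treats missing keys as zero, a language that never matched can still be queried with hits[lang] without raising KeyError.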