I'm doing automatic language detection in Python using stopwords, but I get a KeyError when I try to test the code. Here is the code:
import nltk
from nltk.corpus import stopwords

def scoreFunction(wholetext):
    dictiolist={}
    scorelist={}
    NLTKlanguages = ["dutch","finnish","german","italian","portuguese","spanish","turkish","danish","english","french","hungarian","norwegian","russian","swedish"]
    FREElanguages = [""]
    languages= NLTKlanguages + FREElanguages
    for lang in NLTKlanguages:
        dictiolist[lang]=stopwords.words(lang)
    tokens=nltk.tokenize.word_tokenize(wholetext)
    tokens=[t.lower() for t in tokens]
    freq_dist=nltk.FreqDist(tokens)
    for lang in languages:
        scorelist[lang]=0
    for word in freq_dist.keys()[0:20]:
        if word in dictiolist[lang]:
            scorelist[lang]+=1
    return scorelist

def whichLanguage(scorelist):
    maximum=0
    for item in scorelist:
        value = scorelist[item]
        if maximum<value:
            maximum = value
            lang = item
    return lang
I ran it with scoreFunction("hello my name is osfar and i'm very genius") and got this error:
Traceback (most recent call last):
  File "", line 1,
    scoreFunction("hello my name is osfar and i'm very genius")
  File "C:/Users/osama1/Desktop/fun-test", line 17, in scoreFunction
    if word in dictiolist[lang]:
KeyError: ''
Answer 0 (score: 1)
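Your problem is in the following block of code:
for word in freq_dist.keys()[0:20]:
    if word in dictiolist[lang]:
        scorelist[lang]+=1
You use the variable lang in this for loop, but you never assign it there. It is not really undefined: it still holds '' (the empty string), because that was its last value in your previous for loop. dictiolist has no entry for '', since you only filled it from NLTKlanguages, hence the KeyError.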
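To make the leftover loop variable concrete, here is a minimal sketch of the same situation without NLTK. The two stopword lists are made-up placeholders, not the real NLTK data, and the name leftover is only there for illustration.

```python
# Hypothetical stand-ins for the question's data; not real NLTK stopword lists.
languages = ["english", "german", ""]
dictiolist = {"english": ["the", "is"], "german": ["der", "ist"]}

scorelist = {}
for lang in languages:
    scorelist[lang] = 0

# The loop has finished, but its variable still holds the last item it saw.
leftover = lang

try:
    dictiolist[leftover]  # same lookup as dictiolist[lang] in the question
    error = None
except KeyError as exc:
    error = exc

print(repr(leftover))  # ''
print(type(error).__name__)  # KeyError
```

Any later use of lang outside the loop sees only that final value, which is why every lookup in the question hits the missing '' key.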
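What you evidently meant to do is:
for word in freq_dist.keys()[0:20]:
    for lang in languages:
        if word in dictiolist[lang]:
            scorelist[lang]+=1
Note that languages also contains '' (from FREElanguages), for which dictiolist has no entry, so this loop still raises the KeyError until you remove '' from languages or iterate over dictiolist instead.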
By the way, there is a much simpler way to do what you're trying to do: use a Counter. See http://docs.python.org/2.7/library/collections.html#counter-objects for details.
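A minimal sketch of the Counter approach might look like the following. It assumes the stopword lists are passed in as a dict mapping language name to words (for example, one built from stopwords.words(lang)). The tiny lists in the usage example are hypothetical placeholders, the text is split on whitespace rather than with nltk.tokenize.word_tokenize so the sketch runs without NLTK, and the names score_languages and which_language are made up for illustration.

```python
from collections import Counter


def score_languages(text, stopword_lists, top_n=20):
    """Count how many of the top_n most frequent tokens are stopwords, per language."""
    # Assumption: a whitespace split stands in for nltk.tokenize.word_tokenize.
    tokens = text.lower().split()
    most_frequent = [word for word, _ in Counter(tokens).most_common(top_n)]
    scores = Counter()
    # Iterate only over languages that actually have a stopword list,
    # so there is no lookup that could raise KeyError.
    for lang, words in stopword_lists.items():
        stopset = set(words)
        scores[lang] = sum(1 for word in most_frequent if word in stopset)
    return scores


def which_language(scores):
    """Return the best-scoring language, or None if nothing matched."""
    if not scores:
        return None
    lang, best = scores.most_common(1)[0]
    return lang if best > 0 else None


# Hypothetical, heavily truncated stopword lists for demonstration only.
demo_lists = {
    "english": ["my", "is", "and", "very", "i"],
    "german": ["mein", "ist", "und", "sehr", "ich"],
}
scores = score_languages("hello my name is osfar and i'm very genius", demo_lists)
print(which_language(scores))  # english
```

Using most_common(top_n) instead of slicing keys() also avoids relying on the key order of the frequency table, and returning None when no stopword matched sidesteps the unassigned lang in the question's whichLanguage.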