This piece of code works fine and builds the dictionary correctly.
#!/usr/bin/env python
#-*- coding: utf-8 -*-
import collections
from operator import itemgetter
S_eng = "Hindi constitutional form in India first"
S_hindi = "हिन्दी संवैधानिक रूप से भारत की प्रथम "
word_count = collections.defaultdict(dict)
for st in S_eng.split(" "):
    for st_1 in S_hindi.split(" "):
        print type(st), type(st_1)
        word_count[st][st_1] = 1
print word_count
But when I try to read a file containing English and Hindi sentences and build a dictionary from it, the following happens:
#!/usr/bin/env python
#-*- coding: utf-8 -*-
from collections import defaultdict
P = defaultdict(dict)
i = "your"
j = "अपने"
if(P[i][j] >= 0):
    P[i][j] += 1
else:
    P[i][j] = 0
print P
This produces the error:
Traceback (most recent call last):
  File "lerxical_probab.py", line 31, in <module>
    if(P[i][j] >= 0):
KeyError: '\xe0\xa4\x85\xe0\xa4\xaa\xe0\xa4\xa8\xe0\xa5\x87'
I checked the types of i and j, and both are just 'str'.
Can someone help me solve this problem?
And why does one version work while the other doesn't?
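For reference, here is a minimal sketch of what seems to separate the two snippets. It assumes the cause is the read of a missing inner key rather than anything about the Hindi text: `defaultdict(dict)` only auto-creates the outer level, and the inner value is a plain `dict`. The names `Q`, `raised`, and `counts` are made up for illustration, and the nested `defaultdict(int)` is one possible workaround, not necessarily the intended design.

```python
# -*- coding: utf-8 -*-
from collections import defaultdict

i = "your"
j = "अपने"

# Assigning, as the first snippet does, never looks up a missing inner key.
P = defaultdict(dict)
P[i][j] = 1

# Reading a missing inner key hits a plain dict and raises KeyError,
# which matches the traceback from the second snippet.
Q = defaultdict(dict)
try:
    Q[i][j] >= 0
    raised = False
except KeyError:
    raised = True

# Hypothetical workaround: let the inner level default to 0 as well.
counts = defaultdict(lambda: defaultdict(int))
counts[i][j] += 1
counts[i][j] += 1
```

With the nested version, the `if`/`else` check is no longer needed, since a missing pair starts at 0 before the increment. If the first occurrence should be stored as 0 rather than 1, the starting value would need adjusting.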