Unicode string key error in a Python dict

Time: 2016-10-04 10:09:25

Tags: python dictionary unicode

I have code like this:

import codecs

corpus_file = codecs.open("corpus_en-tr.txt", encoding="utf-8").readlines()

corpus = []
for a in range(0, len(corpus_file), 2):
     corpus.append({'src': corpus_file[a].rstrip(), 'tgt': corpus_file[a+1].rstrip()})

params = {}

for sentencePair in corpus:
     for tgtWord in sentencePair['tgt']:
          for srcWord in sentencePair['src']:
               params[srcWord][tgtWord] = 1.0

Basically, I am trying to create a dictionary of dictionaries of floats, but I get the following error:

Traceback (most recent call last):
  File "initial_guess.py", line 15, in <module>
    params[srcWord][tgtWord] = 1.0
KeyError: u'A'

UTF-8 string as key in dictionary causes KeyError

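I looked at the question above, but it did not help.

Basically, I don't understand why the single-character string 'A' is not allowed as a key in Python. Is there any way to fix this?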

2 Answers:

Answer 0: (score: 1)

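Your params dictionary is empty, so the lookup params[srcWord] raises KeyError before the inner assignment can happen. The key 'A' itself is perfectly valid.

You can use a tree: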
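To see why the error occurs, it may help to note that a statement like `params[srcWord][tgtWord] = 1.0` first looks up `params[srcWord]` and only then assigns into whatever that lookup returns. A minimal sketch, using hypothetical keys `'A'` and `'b'` on an empty dict:

```python
params = {}

try:
    # The lookup params['A'] runs first and fails, so the
    # assignment of 1.0 is never reached.
    params['A']['b'] = 1.0
except KeyError as exc:
    missing_key = exc.args[0]

# Creating the inner dict first makes the nested assignment valid.
params['A'] = {}
params['A']['b'] = 1.0
```

The key type is not the issue here: the same KeyError would appear with any key that has not been inserted yet.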

from collections import defaultdict

def tree():
    return defaultdict(tree)

params = tree()
params['any']['keys']['you']['want'] = 1.0
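One thing to keep in mind with this tree approach is that merely reading a missing key also creates it. The sketch below illustrates that side effect and shows one possible way to convert the result back into plain dicts; `to_dict` is a hypothetical helper, not part of `collections`.

```python
from collections import defaultdict

def tree():
    # Each missing key produces another tree, so any depth can be assigned.
    return defaultdict(tree)

params = tree()
params['a']['b'] = 1.0

# Side effect: a plain lookup of a missing key also inserts it.
_ = params['typo']
created_by_lookup = 'typo' in params

def to_dict(node):
    # Hypothetical helper: recursively convert a tree into plain dicts.
    if isinstance(node, defaultdict):
        return {key: to_dict(value) for key, value in node.items()}
    return node

plain = to_dict(params)
```

If accidental key creation is a concern, the explicit `setdefault` variants further down avoid it, since they only insert keys at the point of assignment.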

Or, without tree, the simple defaultdict case:

from collections import defaultdict

params = defaultdict(dict)

for sentencePair in corpus:
    for tgtWord in sentencePair['tgt']:
        for srcWord in sentencePair['src']:
            params[srcWord][tgtWord] = 1.0
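As a runnable sketch of the defaultdict(dict) approach: the corpus below is a hypothetical in-memory stand-in for corpus_en-tr.txt. It also assumes that word-level keys are intended, which the names srcWord and tgtWord suggest but the question does not state. Iterating over a string directly yields single characters, which is why the traceback shows the key u'A'; `split()` is used here to get whitespace-separated words instead.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the real corpus file.
corpus = [
    {'src': u'a house', 'tgt': u'bir ev'},
    {'src': u'a book', 'tgt': u'bir kitap'},
]

# Missing source words automatically get an empty inner dict.
params = defaultdict(dict)

for sentence_pair in corpus:
    # split() yields words; looping over the string itself
    # would yield one character at a time.
    for tgt_word in sentence_pair['tgt'].split():
        for src_word in sentence_pair['src'].split():
            params[src_word][tgt_word] = 1.0
```

If character-level keys are actually wanted, dropping the two `split()` calls gives the behavior of the original loops.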

If you don't want to add anything like that, just add a dict to params on each iteration:

params = {}

for sentencePair in corpus:
    for srcWord in sentencePair['src']:
        params.setdefault(srcWord, {})
        for tgtWord in sentencePair['tgt']:
            params[srcWord][tgtWord] = 1.0

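Note that I changed the order of the for loops, because you need to know srcWord first.

Otherwise you need to check that the key exists on every iteration: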

params = {}

for sentencePair in corpus:
    for tgtWord in sentencePair['tgt']:
        for srcWord in sentencePair['src']:
            params.setdefault(srcWord, {})[tgtWord] = 1.0
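The two setdefault variants above differ only in loop order and in how often setdefault is called. The sketch below wraps each in a function (the names `src_outer` and `tgt_outer`, and the tiny corpus, are hypothetical) to check that they build the same dictionary:

```python
# Hypothetical one-pair corpus; keys are single characters, as in the question.
corpus = [{'src': u'ab', 'tgt': u'xy'}]

def src_outer(corpus):
    params = {}
    for pair in corpus:
        for src in pair['src']:
            # setdefault returns the existing inner dict, or inserts a new one.
            inner = params.setdefault(src, {})
            for tgt in pair['tgt']:
                inner[tgt] = 1.0
    return params

def tgt_outer(corpus):
    params = {}
    for pair in corpus:
        for tgt in pair['tgt']:
            for src in pair['src']:
                # Here setdefault runs once per (tgt, src) combination.
                params.setdefault(src, {})[tgt] = 1.0
    return params

same = src_outer(corpus) == tgt_outer(corpus)
```

The first form calls setdefault once per source key per sentence pair, while the second calls it for every combination, so the first does slightly less work for the same result.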

Answer 1: (score: 1)

You can use setdefault.

Replace

params[srcWord][tgtWord] = 1.0

with

params.setdefault(srcWord, {})[tgtWord] = 1.0