I have code like this:
import codecs

corpus_file = codecs.open("corpus_en-tr.txt", encoding="utf-8").readlines()
corpus = []
for a in range(0, len(corpus_file), 2):
    corpus.append({'src': corpus_file[a].rstrip(), 'tgt': corpus_file[a+1].rstrip()})
params = {}
for sentencePair in corpus:
    for tgtWord in sentencePair['tgt']:
        for srcWord in sentencePair['src']:
            params[srcWord][tgtWord] = 1.0
Basically, I am trying to create a dictionary of dictionaries of floats. But I get the following error:
Traceback (most recent call last):
  File "initial_guess.py", line 15, in <module>
    params[srcWord][tgtWord] = 1.0
KeyError: u'A'
UTF-8 string as key in dictionary causes KeyError
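I looked at the question above, but it did not help.
Basically, I don't understand why the single-character string 'A' is not allowed as a key in Python. Is there any way to fix this?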
Answer 0 (score: 1)
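Your params dictionary is empty, so the lookup params[srcWord] raises KeyError before the inner assignment is ever attempted. The key 'A' itself is perfectly valid.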
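A minimal sketch of why the error occurs, using made-up keys 'A' and 'B' rather than the real corpus: the statement is evaluated as a lookup of params['A'] followed by an item assignment on the result, and the lookup is the part that fails.

```python
params = {}

try:
    # Python evaluates params['A'] first; the dict is empty, so this lookup
    # raises KeyError before the assignment to ['B'] is attempted.
    params['A']['B'] = 1.0
except KeyError as exc:
    missing_key = exc.args[0]
    print("KeyError for key: %r" % missing_key)

# Creating the inner dict first makes the same assignment succeed.
params['A'] = {}
params['A']['B'] = 1.0
print(params)
```

So the fix is to make sure an inner dict exists for each outer key before assigning into it.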
You can use a tree:
from collections import defaultdict

def tree():
    return defaultdict(tree)

params = tree()
params['any']['keys']['you']['want'] = 1.0
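One thing to keep in mind with this approach is that a defaultdict creates a missing key as soon as it is indexed, even when only reading. The sketch below shows that behaviour and one possible way to turn the tree back into plain dicts; to_dict is a hypothetical helper written for this illustration, not part of collections.

```python
from collections import defaultdict

def tree():
    return defaultdict(tree)

def to_dict(node):
    """Recursively convert a tree into plain dicts (hypothetical helper)."""
    if isinstance(node, defaultdict):
        return {key: to_dict(value) for key, value in node.items()}
    return node

params = tree()
params['a']['b'] = 1.0

# Use `in` to test membership; indexing would create the key.
has_x_before = 'x' in params
_ = params['x']                # merely reading creates an empty subtree
has_x_after = 'x' in params

plain = to_dict(params)
print(has_x_before, has_x_after)
print(plain)
```

If accidental key creation is a concern, converting to plain dicts once the structure has been built makes later typos in keys raise KeyError again.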
Or, without tree, the simple defaultdict case:
from collections import defaultdict

params = defaultdict(dict)
for sentencePair in corpus:
    for tgtWord in sentencePair['tgt']:
        for srcWord in sentencePair['src']:
            params[srcWord][tgtWord] = 1.0
If you would rather not add anything like that, just add a dict to params on each iteration:
params = {}
for sentencePair in corpus:
    for srcWord in sentencePair['src']:
        params.setdefault(srcWord, {})
        for tgtWord in sentencePair['tgt']:
            params[srcWord][tgtWord] = 1.0
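Note that I changed the order of the for loops, because srcWord has to be known before its inner dict can be created.
Otherwise you need to make sure the key exists on every iteration: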
params = {}
for sentencePair in corpus:
    for tgtWord in sentencePair['tgt']:
        for srcWord in sentencePair['src']:
            params.setdefault(srcWord, {})[tgtWord] = 1.0
Answer 1 (score: 1)
You can use setdefault. Replace
params[srcWord][tgtWord] = 1.0
with
params.setdefault(srcWord, {})[tgtWord] = 1.0
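As a side note, iterating directly over a string yields individual characters, which is why the missing key in the traceback is u'A' rather than a whole word. If word-level pairs were intended (an assumption; the question does not say so), a sketch with whitespace tokenization might look like this. The two-sentence corpus is made-up sample data standing in for corpus_en-tr.txt.

```python
# Made-up sample data standing in for the contents of corpus_en-tr.txt.
corpus = [
    {'src': 'A house', 'tgt': 'Bir ev'},
    {'src': 'A book', 'tgt': 'Bir kitap'},
]

params = {}
for sentencePair in corpus:
    # split() with no arguments tokenizes on runs of whitespace.
    for srcWord in sentencePair['src'].split():
        for tgtWord in sentencePair['tgt'].split():
            # setdefault creates the inner dict the first time srcWord is seen.
            params.setdefault(srcWord, {})[tgtWord] = 1.0

print(sorted(params['A']))
```

Real text would likely need a proper tokenizer to deal with punctuation; split() is only the simplest option.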