Here is the error message:
RuntimeError: dictionary changed size during iteration
Here is my code snippet (`<=` marks the line with the error):
```python
import math
from nltk.probability import ConditionalFreqDist

# Probability distribution from a sequence of tuple tokens
def probdist_from_tokens(tokens, N, V=0, addone=False):
    cfd = ConditionalFreqDist(tokens)
    pdist = {}
    for a in cfd:  # <= line with the error
        pdist[a] = {}
        S = 1 + sum(1 for b in cfd[a] if cfd[a][b] == 1)
        A = sum(cfd[a][b] for b in cfd[a])
        # Add the log probs.
        for b in cfd[a]:
            B = sum(cfd[b][c] for c in cfd[b])
            boff = ((B + 1) / (N + V)) if addone else (B / N)
            pdist[a][b] = math.log((cfd[a][b] + (S * boff)) / (A + S))
        # Add OOV for tag if relevant
        if addone:
            boff = 1 / (N + V)
            pdist[a]["<OOV>"] = math.log((S * boff) / (A + S))
    return pdist
```
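I'm basically just using `cfd` as a reference to fill `pdist` with the correct values. I'm not trying to change `cfd`; I only want to iterate over its keys and the keys of its sub-dictionaries.
I think the problem is caused by the lines where I set the variables `A` and `B`. When I had different code on those lines I got the same error, but when I replaced them with constant values the error went away.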
Answer 0 (score: 1)
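`nltk.probability.ConditionalFreqDist` inherits from `defaultdict`, which means that if you read an entry that does not exist, such as `cfd[b]`, a new entry `(b, FreqDist())` is inserted into the dictionary, changing its size. Demonstrating the problem: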
```python
import collections

d = collections.defaultdict(int, {'a': 1})
for k in d:
    print(d['b'])
```
Output:
```
0
Traceback (most recent call last):
  File "1.py", line 4, in <module>
    for k in d:
RuntimeError: dictionary changed size during iteration
```
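To see which operations trigger that insertion, here is a small sketch using a plain `collections.defaultdict` (the same mechanism `ConditionalFreqDist` inherits). It assumes nothing beyond the standard library: a subscript read of a missing key stores a new entry, while `in` and `.get()` only look.

```python
import collections

d = collections.defaultdict(int, {'a': 1})

# Membership tests and .get() never call the default factory,
# so they leave the dictionary unchanged.
print('b' in d)        # False
print(d.get('b', 0))   # 0
print(len(d))          # 1

# A subscript read of a missing key calls int() and stores the result.
print(d['b'])          # 0
print(len(d))          # 2, 'b' is now a real key

# Safe pattern inside a loop: guard the read with a membership test.
for k in d:
    value = d['c'] if 'c' in d else 0  # d['c'] is never evaluated, so 'c' is never inserted
print(sorted(d))       # ['a', 'b']
```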
So you should check these lines:
```python
for b in cfd[a]:
    B = sum(cfd[b][c] for c in cfd[b])
```
Are you sure that the key `b` actually exists in `cfd`? You might want to change it to
```python
B = sum(cfd[b].values()) if b in cfd else 0
#                        ^~~~~~~~~~~~~~~~~
```
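Putting that guard back into the question's function gives the sketch below. So that it runs without NLTK installed, it assumes `collections.defaultdict(collections.Counter)` as a stand-in for `ConditionalFreqDist` (both hand back, and store, an empty counter for a missing condition); the helper `build_cfd` is hypothetical, and with NLTK you would simply use `cfd = ConditionalFreqDist(tokens)`.

```python
import collections
import math


def build_cfd(tokens):
    # Hypothetical stand-in for nltk's ConditionalFreqDist(tokens):
    # a defaultdict of Counters keyed by condition, counting samples.
    cfd = collections.defaultdict(collections.Counter)
    for condition, sample in tokens:
        cfd[condition][sample] += 1
    return cfd


def probdist_from_tokens(tokens, N, V=0, addone=False):
    cfd = build_cfd(tokens)
    pdist = {}
    for a in cfd:
        pdist[a] = {}
        # One plus the number of samples seen exactly once under condition a.
        S = 1 + sum(1 for b in cfd[a] if cfd[a][b] == 1)
        A = sum(cfd[a].values())
        for b in cfd[a]:
            # Guarded read: never index cfd with a key it does not have,
            # so the dictionary the outer loop iterates over keeps its size.
            B = sum(cfd[b].values()) if b in cfd else 0
            boff = ((B + 1) / (N + V)) if addone else (B / N)
            pdist[a][b] = math.log((cfd[a][b] + S * boff) / (A + S))
        if addone:
            boff = 1 / (N + V)
            pdist[a]["<OOV>"] = math.log((S * boff) / (A + S))
    return pdist


# "END" appears only as a sample, never as a condition, so the
# unguarded version would insert it into cfd mid-iteration and raise.
tokens = [("DT", "NN"), ("NN", "VB"), ("VB", "DT"), ("DT", "NN"), ("NN", "END")]
pdist = probdist_from_tokens(tokens, N=5, V=4, addone=True)
print(sorted(pdist))  # ['DT', 'NN', 'VB']
```

Iterating over `list(cfd)` instead would also make the error go away, but every missing `b` would still be silently added to `cfd` as an empty entry, so the membership check is the cleaner fix.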