I am trying to code a bigram counter. I have the code below, but it keeps giving me:

    counts[given][char] += 1
    IndexError: list index out of range

I don't know how to handle it. Can anyone help me?

    import itertools

    def pairwise(s):
        # Yield consecutive pairs: (s0, s1), (s1, s2), ...
        a, b = itertools.tee(s)
        next(b)
        return zip(a, b)

    counts = [[0 for _ in range(52)] for _ in range(52)]

    with open('path/to/open') as file:
        for a, b in pairwise(char for line in file for word in line.split() for char in word):
            given = ord(a) - ord('a')
            char = ord(b) - ord('a')
            counts[given][char] += 1

I get this error:

    Traceback: counts[given][char] += 1 IndexError: list index out of range

Answer 0 (score: 1)

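Your counts variable is local to the pairwise() function. Trying to access counts as a global in the for loop therefore raises a NameError, but you silence that exception with a blanket except. Don't do that; see, for example, Why is "except: pass" a bad programming practice?. If you want to ignore index errors, catch that exception explicitly:
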
    except IndexError:
        print('failed')

and let every other exception reach you, so that you can correct the underlying error.
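
To illustrate why the blanket except is a problem, here is a minimal sketch. The function names and the undefined name undefined_counts are hypothetical and stand in for a counts variable that is not visible in the current scope.

```python
def count_with_blanket_except():
    try:
        # undefined_counts is deliberately not defined: this raises NameError
        undefined_counts[0][0] += 1
    except:  # blanket except: swallows the NameError, hiding the real bug
        return "silenced"


def count_with_narrow_except():
    try:
        undefined_counts[0][0] += 1
    except IndexError:  # only index errors are ignored
        return "silenced"
```

Calling count_with_blanket_except() returns "silenced" as if nothing were wrong, while count_with_narrow_except() lets the NameError propagate, which points straight at the missing variable.
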
Unindent the counts line; it is not part of the pairwise() function:

    import itertools

    def pairwise(s):
        a, b = itertools.tee(s)
        next(b)
        return zip(a, b)

    counts = [[0 for _ in range(52)] for _ in range(52)]

    with open('path/to/open') as file:
        for a, b in pairwise(char for line in file for word in line.split() for char in word):
            given = ord(a) - ord('a')
            char = ord(b) - ord('a')
            try:
                counts[given][char] += 1
            except IndexError:
                # unknown character, ignore this one
                pass

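Note that for anything other than lowercase ASCII letters (a-z), you will generate indices that are too high or too low. ord('a') is 97, but the uppercase letters range from 65 to 90, so you end up with integers from -32 to -7. Python does not raise an IndexError for those; a negative index counts from the end of the list, so uppercase letters silently increment the wrong cells. That is probably not what you want.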
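
One possible way around this, sketched under the assumption that the 52 slots are meant for a-z followed by A-Z, is to look indices up in a table instead of computing them from ord(). The names INDEX and count_bigrams are made up for this example, and pairs containing any other character are simply skipped.

```python
import itertools
import string

# Assumed layout: 'a'..'z' -> 0..25, 'A'..'Z' -> 26..51
INDEX = {ch: i for i, ch in enumerate(string.ascii_lowercase + string.ascii_uppercase)}


def pairwise(s):
    a, b = itertools.tee(s)
    next(b, None)  # default avoids StopIteration on empty input
    return zip(a, b)


def count_bigrams(chars):
    """Count letter bigrams in an iterable of single characters."""
    counts = [[0] * 52 for _ in range(52)]
    for a, b in pairwise(chars):
        # Skip any pair that contains a character outside a-z / A-Z
        if a in INDEX and b in INDEX:
            counts[INDEX[a]][INDEX[b]] += 1
    return counts
```

For the file in the question, the same generator expression can be passed in: count_bigrams(char for line in file for word in line.split() for char in word). No try/except is needed, because every index comes from the lookup table.
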