Question

I'm indexing a huge text file into a dictionary that holds the line number of every word in the file. I have the following code:

i = {}                               # The dictionary

with open("infl2.txt", "r") as f:
    for index, line in enumerate(f): # step through each line
        line = line.lower()          # for case insensitive key matching later on
        if index == 21:              # Print part of the dictionary for debug
            print(i)
        for w in line.split():       # Split line into words and iterate
            i[w] = index             # Add word to dictionary with line number as value


# TESTING
s = 'aa'
index = i[s]
print(s + " -> " + str(index))
print(len(i))

Output:

{'aa': 1, 'adhs': 12, 'ac': 9, 'ab': 4, 'ad': 11, 'afaik': 17, 'ai': 19, 'afps': 18, 'adrs': 15, 'as': 0, 'abcs': 5, 'aases': 3, 'aids': 20, 'abc': 5, 'abd': 6, 'ads': 11, 'adp': 13, 'aarp': 2, 'abm': 8, 'acth': 10, 'abs': 4, 'abls': 7, 'afp': 18, 'adh': 12, 'abds': 6, 'aec': 16, 'aidses': 20, 'adps': 14, 'adr': 15, 'a': 0, 'aecs': 16, 'adpses': 14, 'acths': 10, 'ais': 19, 'acs': 9, 'ablses': 7, 'aarps': 2, 'afaiks': 17, 'aas': 3, 'abms': 8}
aa -> 112505
252362

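As you can see, 'aa' should return the value 1 (as shown in the dump on the first line of the output). Instead, it returns 112505, which is the file length (in lines) minus 1. No matter which key I test, it always returns 112505.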

I have no idea why this happens, so I'd appreciate any help.
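Note that i[w] = index replaces whatever value the word already had, so even on a well-formed file each key ends up holding only the last line the word appears on. If every occurrence is wanted, one option is a minimal sketch like the following, which collects line numbers in a collections.defaultdict(list). The helper name index_words is hypothetical, and the short list of sample lines stands in for infl2.txt.

```python
from collections import defaultdict


def index_words(lines):
    """Map each lower-cased word to the line numbers it occurs on.

    One entry is appended per occurrence, so a word repeated on the
    same line appears twice. index_words is a hypothetical helper.
    """
    index = defaultdict(list)
    for lineno, line in enumerate(lines):
        for word in line.lower().split():
            index[word].append(lineno)
    return dict(index)


# Sample lines standing in for the real file.
lines = ["A b", "b c", "a"]
idx = index_words(lines)
print(idx)  # {'a': [0, 2], 'b': [0, 1], 'c': [1]}
```

Because a file object iterates line by line, index_words(f) could be called directly inside the original with open(...) block.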

Answer 1

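Well, derp. The file I was reading was faulty: its last line contained a copy of the entire file, minus the newline characters. So after that last line was processed, every value pointed to that same line. The "minus 1" comes from the fact that the first line has index 0.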
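The failure can be reproduced on a tiny scale. This sketch assumes a made-up four-line file held in memory with io.StringIO instead of infl2.txt: three normal lines followed by a last line that repeats all of them without newlines. Running the question's loop over it, every word is overwritten with the index of that last line.

```python
import io

# Hypothetical stand-in for infl2.txt: three normal lines, then a final
# line repeating the whole contents with the newlines replaced by spaces.
body = "as a\naa\naarp aarps\n"
text = body + body.replace("\n", " ")

i = {}
with io.StringIO(text) as f:
    for index, line in enumerate(f):
        for w in line.lower().split():
            i[w] = index  # a later line overwrites any earlier value

print(i["aa"])  # 3, not 1: the last line (index 3) contains every word
last = len(text.splitlines()) - 1
print(all(v == last for v in i.values()))  # True
```

Every key reports last, i.e. the line count minus 1, which is exactly the 112505 symptom from the question.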

Python dictionary always returns the same value for any key
