Question

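I can't get my program to output the number of times a word occurs in an imported .txt file. For my assignment I may only use dictionaries (no Counter), and I must remove all punctuation and capitalization from the file. We are using Shakespeare's "Hamlet" from Project Gutenberg as the example (link). I have read other posts hoping to fix my problem, but with no luck. This answer by inspectorG4dget seems to show the code I want, but when I run the program, a KeyError is raised for the chosen word. Here is my edited program (I still get the error with this code):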

def word_dictionary(x):
    wordDict = {}
    filename = open(x, "r").read()
    filename = filename.lower()
    for ch in '"''!@#$%^&*()-_=+,<.>/?;:[{]}~`\|':
        filename = filename.replace(ch, " ")
    for line in filename:
        for word in line.strip().split():
            if word not in wordDict:
                wordDict[word] = wordDict.get(word, 0) + 1
    return wordDict

Here is the required sample session:

>>> import shakespeare
>>> words_with_counts = shakespeare.word_dictionary("/Users/username/Desktop/hamlet.txt")
>>> words_with_counts['the']
993
>>> words_with_counts['laugh']
6

This is what I get:

>>> import HOPE
>>> words_with_counts = HOPE.word_dictionary("hamlet.txt")
>>> words_with_counts["the"]
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in <module>
    words_with_counts["the"]
KeyError: 'the'

Can anyone spot what is wrong with my code? Any help is greatly appreciated!

Answer 1

You are using the wrong keys. The loop should look like this:

for word in filename.strip().split():
    if word not in wordDict:
        wordDict[word] = 0
    wordDict[word] += 1
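As a quick check, this loop can be wrapped in a small function and run on an in-memory string instead of hamlet.txt. This is a minimal sketch: `count_words` is a hypothetical name, and the input is assumed to be already lowercased with punctuation removed, as the question's code does before the loop.

```python
def count_words(text):
    """Count words in an already-lowercased, punctuation-stripped string.

    `count_words` is a hypothetical name used only for this illustration;
    the loop body is the one from the answer above.
    """
    wordDict = {}
    # split() with no argument splits on any run of whitespace (spaces,
    # tabs, newlines), so the whole text is split into words in one call.
    for word in text.strip().split():
        if word not in wordDict:
            wordDict[word] = 0
        wordDict[word] += 1
    return wordDict


sample = "to be or not to be\nthat is the question"
counts = count_words(sample)
print(counts["to"], counts["be"], counts["question"])  # 2 2 1
```

Because the loop runs over `filename.split()` rather than over `filename` itself, the keys are whole words, so a lookup like `counts["the"]` no longer raises KeyError when the word is present.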

Answer 2

if word not in wordDict

and

`wordDict[1]` -> `wordDict[word]`

(two occurrences)

Why are you counting?
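Separately from which keys are used, the question's loop has a second issue worth noting: `wordDict.get(word, 0) + 1` only runs under `if word not in wordDict`, so each word is counted the first time it appears and then skipped. The sketch below contrasts that guarded loop with the usual `dict.get` counting idiom; the function names and sample words are made up for illustration.

```python
def count_guarded(words):
    # Mirrors the question's loop: the `if` guard means get() is only ever
    # called for a word not yet in the dict, so every count stays at 1.
    d = {}
    for word in words:
        if word not in d:
            d[word] = d.get(word, 0) + 1
    return d


def count_with_get(words):
    # Without the guard, get(word, 0) returns the current count (or 0 for
    # a word not seen yet), so every occurrence adds one.
    d = {}
    for word in words:
        d[word] = d.get(word, 0) + 1
    return d


words = "the cat saw the other cat and the dog".split()
print(count_guarded(words)["the"])   # 1
print(count_with_get(words)["the"])  # 3
```

Either drop the guard and keep `get`, or keep the guard and initialise to 0 before incrementing, as Answer 1 does.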

Answer 3

I think the error occurs because of this line:

for line in filename:

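Here `filename` is a string rather than a file object, because

filename = open(x, "r").read()

was used. As a result, `line` pulls out each individual character, not each line. Try replacing your code with the following function: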

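def word_dictionary(x):
    wordDict = {}
    # Read the whole file into one string; "with" closes the file afterwards.
    with open(x, "r") as f:
        filename = f.read()
    filename = filename.lower()
    # Replace each punctuation character with a space ("\\" is a literal backslash).
    for ch in '"!@#$%^&*()-_=+,<.>/?;:[{]}~`\\|':
        filename = filename.replace(ch, " ")
    # split() on the whole string yields words, not single characters.
    for word in filename.split():
        if word not in wordDict:
            wordDict[word] = 1
        else:
            wordDict[word] = wordDict[word] + 1
    return wordDict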
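To make the point about characters concrete, and to show what the original `for line in ...` loop was presumably meant to do, here is a sketch that reads the file line by line by iterating over the file object instead of over the string returned by `read()`. `word_dictionary_by_line` is a hypothetical name, and a small temporary file stands in for hamlet.txt.

```python
import os
import tempfile

text = "the rest is silence"
# Iterating over a str yields one character at a time, not lines or words.
print(list(text)[:5])  # ['t', 'h', 'e', ' ', 'r']


def word_dictionary_by_line(path):
    """Hypothetical line-by-line variant using the same punctuation list."""
    word_dict = {}
    punctuation = '"!@#$%^&*()-_=+,<.>/?;:[{]}~`\\|'
    with open(path, "r") as f:
        # Iterating over the file object (not over f.read()) yields lines.
        for line in f:
            line = line.lower()
            for ch in punctuation:
                line = line.replace(ch, " ")
            for word in line.split():
                word_dict[word] = word_dict.get(word, 0) + 1
    return word_dict


# Write a tiny sample file so the example runs without hamlet.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("The rest is silence.\nThe END!\n")
    path = tmp.name
try:
    counts = word_dictionary_by_line(path)
    print(counts["the"], counts["end"])  # 2 1
finally:
    os.remove(path)
```

Reading line by line avoids holding the whole file in memory at once; for a file the size of Hamlet, reading it all with `read()` and splitting once, as in the function above, works just as well.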

Count word frequencies in a .txt file using only dictionaries (Python 3)
