Question

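My program opens a file and can count the words it contains, but I want to create a dictionary of all the unique words in the text. For example, if the word "computer" appears three times, I want it counted as one unique word.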

def main():

    file = input('Enter the name of the input file: ')

    # The with block closes the file automatically, even if read() fails.
    with open(file, 'r') as infile:
        file_contents = infile.read()

    words = file_contents.split()

    number_of_words = len(words)

    print("There are", number_of_words, "words contained in this paragraph")

main()

Answer 1

Use a set. It will only contain unique words:

words = set(words)

If you don't care about case, you can do this:

words = set(word.lower() for word in words)

This assumes there is no punctuation. If there is, you need to strip it:

import string
words = set(word.lower().strip(string.punctuation) for word in words)
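As a rough sketch of what the strip call does: it only removes punctuation from the two ends of each word, so an apostrophe inside a word survives. The helper name unique_words and the sample sentence are made up for illustration, and the filter for empty strings is an extra step not in the line above.

```python
import string

def unique_words(text):
    """Return the set of lowercased words with edge punctuation removed."""
    cleaned = (word.lower().strip(string.punctuation) for word in text.split())
    # Drop tokens that were pure punctuation and became empty strings.
    return {word for word in cleaned if word}

sample = "The computer. A computer? Don't - the COMPUTER!"
print(sorted(unique_words(sample)))  # ['a', 'computer', "don't", 'the']
```

Without the final filter, a stand-alone "-" would end up in the set as an empty string.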

If you need to keep track of how many times each word occurs, replace set with Counter in the example above:

import string
from collections import Counter
words = Counter(word.lower().strip(string.punctuation) for word in words)

This gives you a dictionary-like object that tells you how many times each word occurs.
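A small sketch of how such a Counter might be queried, assuming a made-up sample sentence in place of the file contents:

```python
import string
from collections import Counter

# Hypothetical sample text standing in for the contents of the input file.
text = "the computer and the other computer beat the old machine"
counts = Counter(word.lower().strip(string.punctuation) for word in text.split())

print(counts["computer"])     # 2
print(counts["missing"])      # 0: absent words do not raise KeyError
print(counts.most_common(2))  # [('the', 3), ('computer', 2)]
```

Looking up a word that never occurred returns 0 rather than raising, which is one practical difference from a plain dict.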

You can also get the number of unique words from it (though this is slower than a set if the count is all you care about):

import string
from collections import Counter
words = Counter(word.lower().strip(string.punctuation) for word in words)
nword = len(words)

Answer 2

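@TheBlackCat's solution works, but it only tells you how many unique words are in the string/file. This solution also shows how many times each one appears.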

dictionaryName = {}
for word in words:
    # First occurrence: start the count at 1.
    if word not in dictionaryName:
        dictionaryName[word] = 1
    else:
        # Seen before: increment the existing count.
        dictionaryName[word] += 1
print(dictionaryName)

Test:

words = "Foo", "Bar", "Baz", "Baz"
output: {'Foo': 1, 'Bar': 1, 'Baz': 2}

Answer 3

A possibly cleaner and faster solution:

words_dict = {}
for word in words:
    word_count = words_dict.get(word, 0)
    words_dict[word] = word_count + 1
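For comparison, the same counting loop could also be sketched with collections.defaultdict. This is a variant, not part of the answer above; the sample words list is an assumption reused from the test in Answer 2.

```python
from collections import defaultdict

# Hypothetical sample input; in the question, words comes from file_contents.split().
words = ["Foo", "Bar", "Baz", "Baz"]

# defaultdict(int) supplies 0 for a missing key, so no .get() call is needed.
words_dict = defaultdict(int)
for word in words:
    words_dict[word] += 1

print(dict(words_dict))  # {'Foo': 1, 'Bar': 1, 'Baz': 2}
```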

How to create a dictionary for a text file
