How to create a dictionary for a text file

Time: 2015-03-27 12:57:42

Tags: python dictionary

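My program opens a file and counts the words it contains, but I want to create a dictionary made up of all the unique words in the text. For example, if the word "computer" appears three times, I want it to be treated as a single unique word.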

def main():

    file = input('Enter the name of the input file: ')
    infile = open(file, 'r')

    file_contents = infile.read()

    infile.close()

    words = file_contents.split()

    number_of_words = len(words)

    print("There are", number_of_words, "words contained in this paragraph")

main()

3 Answers:

Answer 0: (score: 2)

Use a set. It will contain only the unique words:

words = set(words)

If you don't care about case, you can do this:

words = set(word.lower() for word in words)

This assumes there is no punctuation. If there is, you need to strip it:

import string
words = set(word.lower().strip(string.punctuation) for word in words)
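To see what this normalization does on a small input, here is a minimal sketch. The sample sentence is made up for illustration and is not from the question's file. Note that strip() only removes punctuation from the two ends of each token, so an apostrophe inside a word such as "don't" is kept.

```python
import string

# Hypothetical sample text, used only for illustration.
text = "The computer is fast. The Computer, they say, don't stop!"
words = text.split()

# Lowercase each token and trim punctuation from both ends.
unique_words = set(word.lower().strip(string.punctuation) for word in words)

print(sorted(unique_words))
# ['computer', "don't", 'fast', 'is', 'say', 'stop', 'the', 'they']
```

A token made up entirely of punctuation (for example "--") would become an empty string here, so you may want to filter those out depending on your text.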

If you need to keep track of how many times each word occurs, replace set with Counter in the example above:

import string
from collections import Counter
words = Counter(word.lower().strip(string.punctuation) for word in words)

This gives you a dictionary-like object that tells you how many times each word occurs.

You can also get the number of unique words from it (this is slower, if that matters to you):

import string
from collections import Counter
words = Counter(word.lower().strip(string.punctuation) for word in words)
nword = len(words)   
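As a rough illustration of what the Counter holds, the sketch below runs the same expression on a made-up sentence in which "computer" appears three times, as in the question. The sample text and the variable name counts are assumptions for this example only.

```python
import string
from collections import Counter

# Hypothetical sample text; "computer" appears three times in different forms.
text = "computer science: the computer, the Computer."
counts = Counter(word.lower().strip(string.punctuation) for word in text.split())

print(counts["computer"])     # 3, occurrences of one word
print(len(counts))            # 3, number of unique words
print(counts.most_common(2))  # [('computer', 3), ('the', 2)]
```

Because Counter is a dict subclass, len(counts) is the number of unique words, and counts[word] returns 0 for a word that never appeared instead of raising KeyError.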

Answer 1: (score: 0)

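@TheBlackCat's solution works, but it only tells you how many unique words are in the string/file. This solution also shows how many times each word appears.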

dictionaryName = {}
for word in words:
    # First time we see this word: start its count at 1
    if word not in dictionaryName:
        dictionaryName[word] = 1
    else:
        # Word already seen: increment its count
        dictionaryName[word] += 1
print(dictionaryName)

Test:

words = "Foo", "Bar", "Baz", "Baz"
output: {'Foo': 1, 'Bar': 1, 'Baz': 2}

Answer 2: (score: 0)

A possibly cleaner and faster solution:

words_dict = {}
for word in words:
    word_count = words_dict.get(word, 0)
    words_dict[word] = word_count + 1
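The loop above can be wrapped in a small function so it is easy to try on a string. This is only a sketch: the function name build_word_dict is hypothetical, and the input reuses the Foo/Bar/Baz words from the test in the previous answer.

```python
def build_word_dict(text):
    """Return a dict mapping each whitespace-separated word in text to its count."""
    words_dict = {}
    for word in text.split():
        # get() returns 0 for a word that has not been seen yet
        words_dict[word] = words_dict.get(word, 0) + 1
    return words_dict

result = build_word_dict("Foo Bar Baz Baz")
print(result)       # {'Foo': 1, 'Bar': 1, 'Baz': 2}
print(len(result))  # 3 unique words
```

In the question's program, the string read from the file (file_contents) could be passed to such a function in place of the literal.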