Question

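The following program works as expected. It is designed simply to read multiple text files by bringing up a dialog that lets the user choose the text file to read, and it then writes the results to a new output file named "List_Of_Words.txt".

The problem I can't figure out is this: when the program reads multiple text files and appends to the output file, I don't know how to make the output file also include the total count of the words it found. For example, I read 3 text files and it gives me the words found in each text file, with the number of times each word appears next to it, but I also need it to tell me, at the bottom of the output file, the total count for each word across all the text files it read.

The results I get when I run the program against 3 text files: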

Document Name: C:/Python27/Abraham Lincoln - Emancipation Proclamation (January 1, 1863).txt

proclamation:   7
constitution:   1
people:   6
authority:   2
strong:   1
freedom:   3
rebellion:   7
mankind:   1
emancipation:   2
slaves:   3

Document Name: C:/Python27/Andrew Jackson - Second Inaugural Address (March 4, 1833).txt

leaders:   1
constitution:   2
liberty:   4
mankind:   1
society:   1
countrymen:   1
wisdom:   1
responsibility:   1
federal:   2
impoverished:   1
country:   3
happiness:   2
community:   1
world:   2
people:   9
citizens:   3
blessings:   1
contribute:   1
republic:   2

Document Name: C:/Python27/Gettysburg.txt

liberty:   1
nation:   5
world:   1
brave:   1
people:   4
freedom:   1

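What I'm ultimately looking for is "Total Count of all words found, based on each word: " + word + frequency of the word across all files.

Here is the code for the program: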

from sys import argv
import sys
from string import punctuation
from collections import *
import Tkinter, tkFileDialog

keyWords = ['God', 'Nation', 'nation', 'USA', 'Creater', 'creater', 'Country', 'Almighty',
             'country', 'People', 'people', 'Liberty', 'liberty', 'America', 'Independence', 
             'honor', 'brave', 'Freedom', 'freedom', 'Courage', 'courage', 'Proclamation',
             'proclamation', 'United States', 'Emancipation', 'emancipation', 'Constitution',
             'constitution', 'Government', 'Citizens', 'citizens', 'love', 'Love', 'Strong', 
             'strong', 'Happiness', 'happiness', 'Dignity', 'dignity', 'Motivation', 'motivation',
             'Strength', 'strgenth', 'authority', 'rebellion', 'slave', 'slaves', 'contribute',
             'countrymen', 'leader', 'leaders', 'impoverished', 'community', 'society', 'republic',
             'democrat', 'democracy', 'wisdom', 'world', 'mankind', 'responsibility', 'blessing',
             'blessings', 'federal']

fileDict = {}

print "Do you know the location of the file(s)?"
answer = raw_input("> ")

if answer.lower()  == "yes":
    file_path = tkFileDialog.askopenfilename()
elif answer.lower() == "no":
    print "\nPlease locate the file first before running program\n"
    print "Program will now close"
    sys.exit()

if file_path:
    print "Text file to import and read:", file_path
    print "\nReading file..."

    word_freq = {}  
    text_file = open(file_path, 'r')
    all_lines = text_file.readlines()
    text_file.close()

    print "\nFile read finished!\n"

    for line in all_lines:
        for word in line.split():
            word = word.strip(punctuation).lower()
            if word in word_freq:
                word_freq[word] += 1;
            else:
                word_freq[word] = 1;

    fileDict[file_path] = word_freq


print "Writing sum of results to: List_Of_Words.txt"

output_file = open("List_Of_Words.txt", "a")


for fileName in fileDict:
    output_file.write("\nDocument Name: %s\n\n" % (fileName))
    for word in fileDict[fileName]:
        if word in keyWords:
            output_file.write( "%s: %3d\n" % (word, word_freq[word]) )

output_file.close()

Answer 1

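Keep an extra dictionary that maps each word to its count, and update that word's count every time you encounter the word.

Something like:

wordCounts = {}
# ... code that produces wordEncountered ...
# dict.get takes its default as a positional argument, not as a keyword
wordCounts[wordEncountered] = wordCounts.get(wordEncountered, 0) + 1

Something like this?

wordCounts = {}
word = wordEncountered
wordCounts[word] = wordCounts.get(word, 0) + 1
print wordCounts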
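
To make the running-total idea concrete, here is a minimal sketch rather than a drop-in replacement for the program in the question. It assumes the text of each file has already been read into a string, and it swaps the plain dictionary for `collections.Counter`, which does the same "get or zero, plus one" bookkeeping. The names `KEYWORDS`, `count_keywords`, `format_report`, and `sample` are hypothetical, and the sample texts are made up for illustration.

```python
from collections import Counter
from string import punctuation

# Hypothetical, shortened keyword list; the question's keyWords list would go here.
KEYWORDS = set(['people', 'freedom', 'liberty', 'nation'])


def count_keywords(text, keywords=KEYWORDS):
    """Return a Counter of keyword occurrences in one document's text."""
    counts = Counter()
    for word in text.split():
        word = word.strip(punctuation).lower()
        if word in keywords:
            counts[word] += 1
    return counts


def format_report(texts_by_name, keywords=KEYWORDS):
    """Build per-document sections followed by a grand-total section."""
    lines = []
    totals = Counter()
    for name in sorted(texts_by_name):
        counts = count_keywords(texts_by_name[name], keywords)
        # Counter.update adds the new counts to the existing ones.
        totals.update(counts)
        lines.append("\nDocument Name: %s\n" % name)
        for word in sorted(counts):
            lines.append("%s: %3d" % (word, counts[word]))
    lines.append("\nTotal Count of all words found:\n")
    for word in sorted(totals):
        lines.append("%s: %3d" % (word, totals[word]))
    return "\n".join(lines) + "\n", totals


# Made-up sample input standing in for the contents of two files.
sample = {
    "a.txt": "The people want freedom. People!",
    "b.txt": "Liberty for the people; a nation of liberty.",
}
report, totals = format_report(sample)
```

The `report` string could then be written out in one call, for example `output_file.write(report)`. Because the totals live in a single `Counter` that is updated inside the loop over documents, every file has to be processed in the same run; totals cannot be recovered if the program is started once per file and only appends to the output.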
