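I'm writing code that goes through the Gettysburg Address and counts how many times each letter occurs. The letters are then stored as keys in a dictionary, and each key's value is the total number of times that letter occurs. The Gettysburg Address is split into three lines for us to read through. I wrote the code so that it loops over each line, but I can't add up the per-line counts to get a total in the dictionary. For example, if there are 5 As on line 1, 10 As on line 2, and 15 As on line 3, the total should be 30 As and the dictionary should contain a:30.
Also, the file has blank lines between lines 1 and 2 and between lines 2 and 3, and I don't know how to skip those lines in the loop.
Finally, right now I've written the program out separately for every letter, and I'd like to know whether there is a simpler way to do this.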
# Function: readFile
# Parameters: filename
# Return: dictionary
# Detail: Loop through each line of the Gettysburg Address File and count the occurrences of each letter in each line
# Detail: Sum the occurrences of each letter for each line to find the total occurrences of each letter for the entire document
# Add the letter and its occurrence to a dictionary key:value = letter:occurrence
def readFile(filename = "gettysburg.txt"):
    fileIn = open(filename, "r")
    dictionary = {}
    for line in fileIn:
        line.lower()
        letter = "a"
        aCount = line.count("a")
        dictionary[letter] = aCount
        letter = "b"
        bCount = line.count("b")
        dictionary[letter] = bCount
        letter = "c"
        cCount = line.count("c")
        dictionary[letter] = cCount
        letter = "d"
        dCount = line.count("d")
        dictionary[letter] = dCount
        letter = "e"
        eCount = line.count("e")
        dictionary[letter] = eCount
        letter = "f"
        fCount = line.count("f")
        dictionary[letter] = fCount
        letter = "g"
        gCount = line.count("g")
        dictionary[letter] = gCount
        letter = "h"
        hCount = line.count("h")
        dictionary[letter] = hCount
        letter = "i"
        iCount = line.count("i")
        dictionary[letter] = iCount
        letter = "j"
        jCount = line.count("j")
        dictionary[letter] = jCount
        letter = "k"
        kCount = line.count("k")
        dictionary[letter] = kCount
        letter = "l"
        lCount = line.count("l")
        dictionary[letter] = lCount
        letter = "m"
        mCount = line.count("m")
        dictionary[letter] = mCount
        letter = "n"
        nCount = line.count("n")
        dictionary[letter] = nCount
        letter = "o"
        oCount = line.count("o")
        dictionary[letter] = oCount
        letter = "p"
        pCount = line.count("p")
        dictionary[letter] = pCount
        letter = "q"
        qCount = line.count("q")
        dictionary[letter] = qCount
        letter = "r"
        rCount = line.count("r")
        dictionary[letter] = rCount
        letter = "s"
        sCount = line.count("s")
        dictionary[letter] = sCount
        letter = "t"
        tCount = line.count("t")
        dictionary[letter] = tCount
        letter = "u"
        uCount = line.count("u")
        dictionary[letter] = uCount
        letter = "v"
        vCount = line.count("v")
        dictionary[letter] = vCount
        letter = "w"
        wCount = line.count("w")
        dictionary[letter] = wCount
        letter = "x"
        xCount = line.count("x")
        dictionary[letter] = xCount
        letter = "y"
        yCount = line.count("y")
        dictionary[letter] = yCount
        letter = "z"
        zCount = line.count("z")
        dictionary[letter] = bCount
    print(dictionary)
    fileIn.close()
# function: sortKeys
# parameter: Dictionary
# Return: a list of the keys in alphabetical order
# Use the sort method on a list
def sortKeys(dictionary):
    sortedDictionary = sortKeys(dictionary)
    dictionaryList = [[k,v] for k,v in dictionary.items()]
# function: main
# call the readFile function to create a dictionary and store in it a variable
# call the sortKeys function to get a list of sorted keys and store it in a variable
# Loop through the sorted keys list to print each letter and its frequency (number of times it occurs) using the dictionary.
def main():
    readFile()
    sortKeys()
    print("Displaying letter frequency of the Gettysburg Address")
    for key, value in dictionaryList:
        print(key, value)
main()
Answer 0 (score: 3)
Sure:
from string import ascii_lowercase
def readFile(filename = "gettysburg.txt"):
    with open(filename) as f:
        data = f.read().lower()
    letter_counts = {letter: data.count(letter) for letter in ascii_lowercase}
    return letter_counts
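First, prefer a with statement over calling open() on its own, because with a bare open() you have to remember to close the file object yourself.
Second, what you basically want is a dictionary comprehension: a way to fill in the keys and values of a dict automatically when the two are related in some way.
What this snippet does is iterate over ascii_lowercase, a string containing the lowercase letters of the alphabet. Each letter becomes a key in the resulting dict, and the corresponding value is the count of that letter in the given text. Because the whole file is read at once, the blank lines need no special handling and the counts are already totals for the entire document.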
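As a rough sketch of how the resulting dict could then be printed in alphabetical order, which is what the question's main function was aiming for: the names count_letters and format_report and the sample string are illustrative assumptions, not part of the original answer.

```python
from string import ascii_lowercase


def count_letters(text):
    """Return a dict mapping each lowercase ASCII letter to its count in text."""
    lowered = text.lower()
    return {letter: lowered.count(letter) for letter in ascii_lowercase}


def format_report(counts):
    """Return one 'letter count' string per key, in alphabetical order."""
    return [f"{letter} {counts[letter]}" for letter in sorted(counts)]


# Hypothetical stand-in for the contents of gettysburg.txt.
sample = "Four score and seven years ago"
counts = count_letters(sample)

print("Displaying letter frequency of the sample text")
for row in format_report(counts):
    print(row)
```

Calling sorted() on a dict yields its keys in sorted order, so no separate list of key/value pairs is needed.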
Answer 1 (score: 1)
Use string.ascii_lowercase:
import string
...
for letter in string.ascii_lowercase:
    dictionary[letter] = line.count(letter)
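Note that a plain assignment inside a per-line loop keeps only the last line's count. If the file is processed line by line, as in the question, the counts have to be added to a running total. A minimal sketch under that assumption follows; the function name count_letters_by_line and the sample lines are hypothetical.

```python
import string


def count_letters_by_line(lines):
    """Accumulate letter counts over an iterable of lines, skipping blank ones."""
    totals = {letter: 0 for letter in string.ascii_lowercase}
    for line in lines:
        # str.lower() returns a new string, so the result must be kept.
        line = line.strip().lower()
        if not line:
            # Blank separator line: nothing to count.
            continue
        for letter in string.ascii_lowercase:
            # Add to the running total instead of overwriting it.
            totals[letter] += line.count(letter)
    return totals


# A file object can be passed directly, since iterating over it yields lines:
# with open("gettysburg.txt") as f:
#     totals = count_letters_by_line(f)
sample_lines = ["Aaa bb\n", "\n", "A b c\n"]
totals = count_letters_by_line(sample_lines)
print(totals["a"], totals["b"], totals["c"])
```

Skipping blank lines is optional here, since a blank line contributes zero to every count, but it shows where such a check would go.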
Answer 2 (score: 0)
You can use Counter for this:
import re
from collections import Counter
Counter(re.findall(r'[a-z]', open('gettysburg.txt').read()))
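It works like a dictionary in which each key is a letter and each value is the number of times that letter occurs. Note that the pattern [a-z] matches lowercase letters only, so call .lower() on the text first if uppercase letters should be counted too. See the documentation: https://docs.python.org/3.7/library/collections.html#counter-objects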
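To illustrate the dictionary-like behaviour, here is a small sketch that lowercases the text before matching; the helper name letter_counter and the sample string are assumptions made for the example.

```python
import re
from collections import Counter


def letter_counter(text):
    """Return a Counter of the ASCII letters in text, ignoring case."""
    return Counter(re.findall(r"[a-z]", text.lower()))


# Hypothetical stand-in for the contents of gettysburg.txt.
sample = "Four score and seven years ago"
counts = letter_counter(sample)

print(counts["e"])            # lookup by letter, like a dict
print(counts["z"])            # a missing key yields 0 instead of raising KeyError
print(counts.most_common(1))  # the most frequent letter with its count
```

Unlike the dict comprehension over ascii_lowercase, a Counter only holds the letters that actually occur, so letters with a count of zero are absent from its keys.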
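Answer 3 (score: 0)
The collections module is one way to accomplish this task with less code.
Updated answer
Here is a one-liner that accomplishes the same task as my original answer (it relies on the same imports).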
frequency_of_characters = Counter([char for char in open('gettysburg_address.txt').read().lower() if char in string.ascii_letters])
Original answer
Here is the code in a function:
import string
from pprint import pprint
from collections import Counter
def get_characters_frequency(filename):
    with open(filename, 'r') as input:
        readfile = input.read()
    filtered_text = [char.lower() for char in readfile if char in string.ascii_letters]
    frequency_of_characters = Counter(filtered_text)
    return frequency_of_characters
frequency_of_characters = get_characters_frequency('gettysburg_address.txt')
pprint(frequency_of_characters)
# outputs
Counter({'e': 167,
         't': 126,
         'a': 102,
         'o': 93,
         'h': 81,
         'r': 80,
         'n': 77,
         'i': 68,
         'd': 58,
         's': 44,
         'l': 42,
         'c': 31,
         'g': 28,
         'w': 28,
         'f': 27,
         'v': 24,
         'u': 21,
         'p': 15,
         'b': 14,
         'm': 13,
         'y': 10,
         'k': 3,
         'q': 1})