Question

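I am writing code that goes through the Gettysburg Address and counts how many times each letter occurs. The letters are then stored as keys in a dictionary, and each key's value is the total number of times that letter occurs. The Gettysburg Address is split into three lines for us to go through. The way I have written it, each line is looped through, but I can't add up the occurrences from the separate lines to get a total in the dictionary. For example, if there are 5 As in line 1, 10 As in line 2, and 15 As in line 3, the total should be 30 As and the dictionary should read a: 30.

Also, the file has blank lines between lines 1 and 2 and between lines 2 and 3, and I don't know how to remove them for looping purposes.

Finally, right now I have written out every letter in the program, but I would like to know whether there is an easier way to simplify my work.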

# Function: readFile
# Parameters: filename
# Return: dictionary
# Detail: Loop through each line of the Gettysburg Address File and count the occurrences of each letter in each line
# Detail: Sum the occurrences of each letter for each line to find the total occurrences of each letter for the entire document
# Add the letter and its occurrence to a dictionary key:value = letter:occurrence
def readFile(filename = "gettysburg.txt"):
    fileIn = open(filename, "r")
    dictionary = {}
    for line in fileIn:
        line.lower()
        letter = "a"
        aCount = line.count("a")
        dictionary[letter] = aCount
        letter = "b"
        bCount = line.count("b")
        dictionary[letter] = bCount
        letter = "c"
        cCount = line.count("c")
        dictionary[letter] = cCount
        letter = "d"
        dCount = line.count("d")
        dictionary[letter] = dCount
        letter = "e"
        eCount = line.count("e")
        dictionary[letter] = eCount
        letter = "f"
        fCount = line.count("f")
        dictionary[letter] = fCount
        letter = "g"
        gCount = line.count("g")
        dictionary[letter] = gCount
        letter = "h"
        hCount = line.count("h")
        dictionary[letter] = hCount
        letter = "i"
        iCount = line.count("i")
        dictionary[letter] = iCount
        letter = "j"
        jCount = line.count("j")
        dictionary[letter] = jCount
        letter = "k"
        kCount = line.count("k")
        dictionary[letter] = kCount
        letter = "l"
        lCount = line.count("l")
        dictionary[letter] = lCount
        letter = "m"
        mCount = line.count("m")
        dictionary[letter] = mCount
        letter = "n"
        nCount = line.count("n")
        dictionary[letter] = nCount
        letter = "o"
        oCount = line.count("o")
        dictionary[letter] = oCount
        letter = "p"
        pCount = line.count("p")
        dictionary[letter] = pCount
        letter = "q"
        qCount = line.count("q")
        dictionary[letter] = qCount
        letter = "r"
        rCount = line.count("r")
        dictionary[letter] = rCount
        letter= "s"
        sCount = line.count("s")
        dictionary[letter] = sCount
        letter = "t"
        tCount = line.count("t")
        dictionary[letter] = tCount
        letter = "u"
        uCount = line.count("u")
        dictionary[letter] = uCount
        letter = "v"
        vCount = line.count("v")
        dictionary[letter] = vCount
        letter = "w"
        wCount = line.count("w")
        dictionary[letter] = wCount
        letter = "x"
        xCount = line.count("x")
        dictionary[letter] = xCount
        letter = "y"
        yCount = line.count("y")
        dictionary[letter] = yCount
        letter = "z"
        zCount = line.count("z")
        dictionary[letter] = bCount
        print(dictionary)

    fileIn.close()

# function: sortKeys
# parameter: Dictionary
# Return: a list of the keys in alphabetical order
# Use the sort method on a list
def sortKeys(dictionary):
    sortedDictionary = sortKeys(dictionary)
    dictionaryList = [[k,v] for k,v in dictionary.items()]

# function: main
# call the readFile function to create a dictionary and store in it a variable
# call the sortKeys function to get a list of sorted keys and store it in a variable
# Loop through the sorted keys list to print each letter and its frequency (number of times it occurs) using the dictionary.
def main():
    readFile()
    sortKeys()
    print("Displaying letter frequency of the Gettysburg Address")
    for key, value in dictionaryList:
        print(key, value)

main()

Answer 1

Sure:

from string import ascii_lowercase

def readFile(filename = "gettysburg.txt"):
    with open(filename) as f:
        data = f.read().lower()
        letter_counts = {letter: data.count(letter) for letter in ascii_lowercase}

    return letter_counts

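First, prefer a with block over a bare open call, because with a bare open you have to remember to close the file object yourself.

Second, what you basically want is a dictionary comprehension: a way to fill in a dict's keys and values automatically when the values can be computed from the keys.

This snippet iterates over ascii_lowercase, a string containing the lowercase letters of the alphabet. Each letter becomes a key in the resulting dict, and the corresponding value is the count of that letter in the given text. Because the whole file is read as one string, the per-line totals and the blank lines no longer need special handling.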
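
To see the comprehension at work without a file, here is a minimal sketch that runs it on an in-memory string and then prints the result in alphabetical order, roughly what the question's main function was aiming for. The helper name count_letters and the sample sentence are illustrative only, not part of the original answer.

```python
from string import ascii_lowercase


def count_letters(text):
    """Return a dict mapping each lowercase ASCII letter to its count in text."""
    lowered = text.lower()  # str.lower() returns a new string, so keep the result
    return {letter: lowered.count(letter) for letter in ascii_lowercase}


# Hypothetical sample text standing in for the contents of the file.
sample = "Four score and seven years ago"
letter_counts = count_letters(sample)

# sorted() makes the alphabetical ordering explicit; letters that never
# occur are skipped to keep the report short.
for letter in sorted(letter_counts):
    if letter_counts[letter]:
        print(letter, letter_counts[letter])
```

The dict always has all 26 letters as keys, so a letter that never appears is stored with a count of 0 rather than being absent.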

Answer 2

Use string.ascii_lowercase

import string
...

for letter in string.ascii_lowercase:
    dictionary[letter] = line.count(letter)
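
If the line-by-line loop is kept, the assignment above would overwrite the previous line's count on each pass, which is the problem described in the question. A possible way to accumulate totals and skip the blank separator lines is sketched below; the function name count_letters_by_line and the sample lines are assumptions made for illustration.

```python
import string


def count_letters_by_line(lines):
    """Accumulate letter counts across lines, skipping blank ones."""
    totals = {letter: 0 for letter in string.ascii_lowercase}
    for line in lines:
        # strip() and lower() return new strings, so the result is reassigned.
        line = line.strip().lower()
        if not line:  # blank separator line: nothing to count
            continue
        for letter in string.ascii_lowercase:
            totals[letter] += line.count(letter)  # add to the total, do not overwrite
    return totals


# Hypothetical stand-in for iterating over an open file object.
sample_lines = ["An apple a day\n", "\n", "Banana\n"]
totals = count_letters_by_line(sample_lines)
print(totals["a"])
```

Skipping blank lines is optional here, since a blank line contributes zero to every count anyway; the check mainly makes the intent visible.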

Answer 3

You can use Counter for this

import re
from collections import Counter

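# Lowercase the text first so capital letters are counted too,
# and use a with block so the file is closed afterwards.
with open('gettysburg.txt') as f:
    counts = Counter(re.findall(r'[a-z]', f.read().lower()))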

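It works like a dictionary in which each key is a letter and each value is the number of times that letter occurs. Check out the documentation: https://docs.python.org/3.7/library/collections.html#counter-objects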
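
As a small illustration of how a Counter behaves compared with a plain dict, the sketch below counts letters in a sample sentence (chosen here only for demonstration) and shows two features that are handy for this task: most_common for ranking, and a default of 0 for letters that never occur.

```python
import string
from collections import Counter

# Hypothetical sample text; in practice this would be the file contents.
text = "Government of the people, by the people, for the people"
counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)

# The three most frequent letters, as (letter, count) pairs.
print(counts.most_common(3))

# A missing key yields 0 instead of raising KeyError.
print(counts["z"])

# Alphabetical report covering all 26 letters, including absent ones.
for letter in string.ascii_lowercase:
    print(letter, counts[letter])
```

Because missing letters read as 0, the alphabetical loop can walk over ascii_lowercase directly without checking whether each letter is present.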

Answer 4

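The collections module is one way to accomplish this task with less code.

Updated answer

Here is a one-liner that accomplishes the same task as my original answer.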

frequency_of_characters = Counter([char for char in open('gettysburg_address.txt').read().lower() if char in string.ascii_letters])

Original answer

Here is the code in a function:

import string
from pprint import pprint
from collections import Counter

def get_characters_frequency(filename):
  with open(filename, 'r') as infile:  # avoid shadowing the built-in input()
    readfile = infile.read()
    filtered_text = [char.lower() for char in readfile if char in string.ascii_letters]
    frequency_of_characters = Counter(filtered_text)
    return frequency_of_characters

frequency_of_characters = get_characters_frequency('gettysburg_address.txt')
pprint (frequency_of_characters)
# outputs
Counter({'e': 167,
 't': 126,
 'a': 102,
 'o': 93,
 'h': 81,
 'r': 80,
 'n': 77,
 'i': 68,
 'd': 58,
 's': 44,
 'l': 42,
 'c': 31,
 'g': 28,
 'w': 28,
 'f': 27,
 'v': 24,
 'u': 21,
 'p': 15,
 'b': 14,
 'm': 13,
 'y': 10,
 'k': 3,
 'q': 1})

Can I simplify my code so that I don't write out every letter?