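Printing character percentages in a text file

Time: 2017-05-01 19:24:20

Tags: python

I just wrote a function that prints the percentage of each character in a text file. However, I have a problem: my program treats uppercase characters as different characters, and it also counts spaces. That is why the results are wrong. How can I fix this?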

def count_char(text, char):
    count = 0
    for character in text:
        if character == char:
            count += 1
    return count

filename = input("Enter the file name: ")
with open(filename) as file:
    text = file.read()

for char in "abcdefghijklmnopqrstuvwxyz":
    perc = 100 * count_char(text, char) / len(text)
    print("{0} - {1}%".format(char, round(perc, 2)))
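
To see why the numbers come out wrong, here is a small worked check on an in-memory string instead of a file. The sample string is made up for illustration. The original loop finds only the lowercase a and divides by len(text), which includes the space:

```python
def count_char(text, char):
    # Same logic as the function in the question: exact, case-sensitive match.
    count = 0
    for character in text:
        if character == char:
            count += 1
    return count

sample = "Aa b"  # hypothetical sample text, not from the question

# What the original program computes: 'A' is missed and the space
# is part of the denominator.
original_perc = 100 * count_char(sample, "a") / len(sample)

# What was intended: ignore case and divide by the number of letters only.
letters = [ch for ch in sample.lower() if ch.isalpha()]
intended_perc = 100 * letters.count("a") / len(letters)

print(round(original_perc, 2), round(intended_perc, 2))
```

Two of the three letters are an a, so the intended figure is about 66.67%, while the original code reports 25.0%.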

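4 answers:

Answer 0 (score: 1)

You should make the text lowercase with text.lower(). Then, to avoid counting spaces, split the string into a list of words with text.lower().split(). This should do it: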

def count_char(text, char):
    count = 0
    for word in text.lower().split():  # this iterates returning every word in the text
        for character in word:   # this iterates returning every character in each word
            if character == char:
                count += 1
    return count

filename = input("Enter the file name: ")
with open(filename) as file:
    text = file.read()

totalChars = sum(len(word) for word in text.lower().split())  # characters excluding whitespace

for char in "abcdefghijklmnopqrstuvwxyz":
    perc = 100 * count_char(text, char) / totalChars
    print("{0} - {1}%".format(char, round(perc, 2)))

Note the change in the definition of perc: sum(len(i) for i in text.lower().split()) returns the number of characters in the list of words, whereas len(text) also counts whitespace.
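
The difference between the two denominators can be checked on a short string. This is only a sketch on a made-up sample. It also shows one caveat: split() with no arguments drops every kind of whitespace (spaces, tabs, newlines), but punctuation and digits are still counted, so a letters-only total is smaller again:

```python
sample = "Hi, Bob!\n"  # hypothetical sample text

# Every character, including the space and the trailing newline.
total_with_whitespace = len(sample)

# Characters left after splitting on whitespace: "Hi," and "Bob!".
total_without_whitespace = sum(len(word) for word in sample.split())

# Letters only, for comparison; punctuation is excluded as well.
total_letters_only = sum(1 for ch in sample if ch.isalpha())

print(total_with_whitespace, total_without_whitespace, total_letters_only)
```

If the percentages are meant to be relative to letters only, the last total is the one to divide by.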

Answer 1 (score: 1)

You can use a Counter with a generator expression to count all the letters:

from collections import Counter
with open(fn) as f:  # fn is the path of the text file
    c=Counter(c.lower() for line in f for c in line if c.isalpha())

Explanation of the generator expression:

c=Counter(c.lower() for line in f # continued below


    ^                             create a counter
          ^   ^                   each character, make lower case
                         ^        read one line from the file

 # continued
for c in line if c.isalpha())
    ^                            one character from each line of the file
           ^                     iterate over line one character at a time    
                     ^           only count it if it is a letter

Then get the total number of letters:

total_letters=float(sum(c.values()))

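The percentage for any letter is then c[letter] / total_letters * 100

Note that the counter c contains only letters, not spaces. The percentage computed for each letter is therefore its share of all letters, not of all characters.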
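
As a quick check of the formula above, here is a sketch that runs the same generator expression on an in-memory stand-in for a file. io.StringIO and the sample contents are assumptions for illustration, and the counter is named counts here to keep it apart from the loop variable:

```python
import io
from collections import Counter

# io.StringIO stands in for an open text file; the contents are made up.
f = io.StringIO("Hello\nWorld\n")

# Iterating over f yields one line at a time, just like a real file object.
counts = Counter(ch.lower() for line in f for ch in line if ch.isalpha())
total_letters = sum(counts.values())

percent_l = counts["l"] / total_letters * 100
percent_z = counts["z"] / total_letters * 100  # a missing key counts as 0

print(total_letters, round(percent_l, 2), round(percent_z, 2))
```

The sample holds ten letters, three of which are an l, so l comes out at about 30% and z at 0%.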

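The advantages here:

  1. You were reading the entire file to get both the count of one particular character and the total of all characters. You might as well count the frequency of every character as you read;
  2. You do not need to read the entire file into memory. Reading it all at once is fine for smaller files, but not for larger ones;
  3. For letters that do not occur in the file, the counter correctly returns 0;
  4. Idiomatic Python.

So your whole program becomes: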

    from collections import Counter 
    with open(fn) as f:
        c=Counter(c.lower() for line in f for c in line if c.isalpha())
    total_letters=float(sum(c.values()))
    for char in "abcdefghijklmnopqrstuvwxyz":
        print("{} - {:.2%}".format(char, c[char] / total_letters))
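
One edge case the program above does not cover is a file with no letters at all: total_letters is then 0 and the division raises ZeroDivisionError. A minimal sketch of a guard follows. letter_percentages is a hypothetical helper name, not part of the answer, and it accepts any iterable of lines, so an open file works too:

```python
import string
from collections import Counter

def letter_percentages(lines):
    """Return {letter: share of all letters} for an iterable of strings."""
    counts = Counter(ch.lower() for line in lines for ch in line if ch.isalpha())
    total = sum(counts.values())
    if total == 0:
        # No letters at all: report 0 for everything instead of dividing by zero.
        return {ch: 0.0 for ch in string.ascii_lowercase}
    return {ch: counts[ch] / total for ch in string.ascii_lowercase}

# Usage on an in-memory list of lines (made-up sample).
for char, share in letter_percentages(["Abba\n"]).items():
    if share:
        print("{} - {:.2%}".format(char, share))
```

With a real file, the call would be letter_percentages(f) inside the with block.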
    

Answer 2 (score: 0)

You want to make the text lowercase before counting char:

def count_char(text, char):
    count = 0
    for character in text.lower():
        if character == char:
            count += 1
    return count
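
Lowercasing inside count_char fixes the case problem, but the caller in the question still divides by len(text). A quick check on a made-up sample string suggests that the 26 percentages then no longer add up to 100, because whitespace stays in the denominator:

```python
import string

def count_char(text, char):
    count = 0
    for character in text.lower():
        if character == char:
            count += 1
    return count

sample = "AB ab"  # hypothetical sample: four letters and one space

# Denominator from the question: every character, including the space.
sum_over_all_chars = sum(100 * count_char(sample, ch) / len(sample)
                         for ch in string.ascii_lowercase)

# Letters-only denominator.
n_letters = sum(1 for ch in sample if ch.isalpha())
sum_over_letters = sum(100 * count_char(sample, ch) / n_letters
                       for ch in string.ascii_lowercase)

print(sum_over_all_chars, sum_over_letters)
```

Only the second total reaches 100, so this fix probably needs to be combined with a letters-only denominator.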

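Answer 3 (score: 0)

You can use the built-in .count method to count characters after converting everything to lowercase with .lower. Also, your current program does not work correctly because it does not exclude spaces and punctuation when it calls len.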

import string
filename = input("Enter the file name: ")
with open(filename) as file:
    text = file.read().lower()

chars = {char:text.count(char) for char in string.ascii_lowercase}
allLetters = float(sum(chars.values()))
for char in chars:
    print("{} - {}%".format(char, round(chars[char]/allLetters*100, 2)))
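
The same approach can be checked without a file. This is a sketch on a made-up string. Note that text.count scans the whole text once per letter, 26 passes in total, which is fine for small inputs. Because the total is summed over string.ascii_lowercase only, spaces, punctuation and non-ASCII letters are left out of the denominator:

```python
import string

text = "Hello, World!".lower()  # hypothetical sample text

# One str.count call per letter of the ASCII alphabet.
chars = {char: text.count(char) for char in string.ascii_lowercase}
all_letters = sum(chars.values())

percentages = {char: round(n / all_letters * 100, 2) for char, n in chars.items()}

for char, perc in percentages.items():
    if perc:
        print("{} - {}%".format(char, perc))
```

Dictionaries keep insertion order in Python 3.7 and later, so the loop runs from a to z.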