Read a text file to print letter frequencies in descending order - Python 3

Date: 2018-02-22 01:24:47

Tags: python python-3.x list dictionary tuples

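I'm working through some basic Python challenges, and this is one of them. All I need to do is read through a file and print the letter frequencies in descending order. I was able to do that, but I want to enhance the program by also printing the frequency percentage, in the form letter - frequency - frequency%. Something like this: o - 46 - 10.15%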
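The percentage in that format is just count / total × 100, rounded to two decimals. A minimal sketch of formatting one line; `format_row` is a hypothetical helper name, and the total of 453 letters is a made-up value chosen only so the result matches the example above:

```python
def format_row(letter: str, count: int, total: int) -> str:
    """Format one line as 'letter - count - percent%' with two decimals."""
    percent = count / total * 100  # true division in Python 3
    return f"{letter} - {count} - {percent:.2f}%"


# Hypothetical numbers: 46 occurrences of 'o' out of 453 letters in total
print(format_row("o", 46, 453))  # o - 46 - 10.15%
```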

Here is what I have so far:

def exercise11():
    import string
    while True:
        try:
            fname = input('Enter the file name -> ')
            fop = open(fname)
            break
        except OSError:
            print('This file does not exist. Please try again!')
            continue

    counts = {}
    for line in fop:
        line = line.translate(str.maketrans('', '', string.punctuation))
        line = line.translate(str.maketrans('', '', string.whitespace))
        line = line.translate(str.maketrans('', '', string.digits))
        line = line.lower()
        for ltr in line:
            if ltr in counts:
                counts[ltr] += 1
            else:
                counts[ltr] = 1
    lst = []
    countlst = []
    freqlst = []
    for ltrs, c in counts.items():
        lst.append((c, ltrs))
        countlst.append(c)
    totalcount = sum(countlst)
    for ec in countlst:
        efreq = (ec/totalcount) * 100
        freqlst.append(efreq)
    freqlst.sort(reverse=True)
    lst.sort(reverse=True)
    for ltrs, c in lst:
        print(c, '-', ltrs)

exercise11()

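As you can see, I was able to calculate and sort the freq% values in a separate list, but I couldn't get them into the tuples in the lst list alongside each letter's count. Is there a way to do this?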

Also, if you have any other suggestions about my code, please mention them.

Edit

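Applying the simple modification @wwii mentioned, I got the desired output. All I had to do was add one more argument to the print statement while iterating over the lst list. Before that, I had been trying to build a separate list for the freq%, sort it, and then insert it into the letter-count tuples in the list, which didn't work out.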

for ltrs, c in lst:
    print(c, '-', ltrs, '-', round(ltrs/totalcount*100, 2), '%')
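Because each tuple in lst is built as (c, ltrs), i.e. (count, letter), the loop variable named ltrs actually receives the count, which is why ltrs/totalcount gives the right percentage. A sketch of the same loop with variable names that match the tuple order; the counts dictionary here is made-up sample data rather than the file contents:

```python
# Made-up sample counts standing in for the ones read from the file
counts = {"a": 3, "b": 1, "c": 4}
totalcount = sum(counts.values())

# Same tuple order as the question: (count, letter), so sorting uses count first
lst = [(c, ltrs) for ltrs, c in counts.items()]
lst.sort(reverse=True)

lines = []
for count, letter in lst:  # unpack in the order the tuples were built
    lines.append(f"{letter} - {count} - {round(count / totalcount * 100, 2)} %")

print("\n".join(lines))
```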


5 Answers:

Answer 0 (score: 1)

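The items in freqlst, countlst and lst are related to each other only by their position. If any one of them is sorted on its own, that relationship is lost.

Zipping the lists together before sorting preserves the relationship.

Picking up from the list initialization lines: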

lst = []
countlst = []
freqlst = []
for ltr, c in counts.items():
    #change here, lst now only contains letters
    lst.append(ltr)
    countlst.append(c)
totalcount = sum(countlst)
for ec in countlst:
    efreq = (ec/totalcount) * 100
    freqlst.append(efreq)

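# New stuff here: zip keeps each letter, count and freq% together
# (relies on Python 3's true division in ec/totalcount above)
zipped = zip(lst, countlst, freqlst)
zipped = sorted(zipped, key=lambda x: x[1], reverse=True)  # highest count first

for ltr, c, freq in zipped:
    print("{} - {} - {:.2f}%".format(ltr, c, freq))  # love me the format method :)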

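Basically, zip combines the lists into tuples (in Python 3 it returns an iterator, which sorted() turns into a list). Then you can pass a lambda function as the sort key to sort those tuples by any field (a very common Stack Overflow question).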
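A self-contained sketch of that idea with made-up parallel lists; reverse=True is used so the highest count comes first, as the question asks:

```python
letters = ["a", "b", "c"]
counts = [3, 1, 4]
total = sum(counts)
freqs = [c / total * 100 for c in counts]

# zip keeps each letter, count and frequency together in one tuple,
# so sorting the tuples cannot break the link between them
rows = sorted(zip(letters, counts, freqs), key=lambda row: row[1], reverse=True)

for letter, count, freq in rows:
    print(f"{letter} - {count} - {freq:.2f}%")
```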

Answer 1 (score: 1)

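Tuples are immutable, which is probably the problem you ran into. Another issue is that you used the simple form of sort; the more advanced form, with a key function, will serve you well. See below:

lst is a list of tuples, but because tuples are immutable and lists are mutable, changing lst to a list of lists is a valid approach. Then, since each element of lst is a list made up of 'letter, count, frequency%', you can call sort with a lambda key to sort by whichever index you like. Insert the following after the for line in fop: loop.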

lst = []
for ltrs, c in counts.items():
    lst.append([ltrs,c])
totalcount = sum([x[1] for x in lst])       # sum all 'count' values in a list comprehension

for elem in lst:
    elem.append((elem[1]/totalcount)*100)   # now that each element in 'lst' is a mutable list, you can append the calculated frequency to the respective element in lst

lst.sort(reverse=True, key=lambda elem: elem[2])    # sort in place, in reverse order, by index 2
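If you would rather keep lst as a list of tuples, another option is to replace each tuple with a new, longer one instead of modifying it. A sketch with made-up counts, using the question's (count, letter) tuple order:

```python
counts = {"a": 3, "b": 1, "c": 4}  # made-up sample data
lst = [(c, ltr) for ltr, c in counts.items()]
totalcount = sum(c for c, _ in lst)

# Tuples are immutable: item assignment raises TypeError
try:
    lst[0][0] = 0
except TypeError as exc:
    print("cannot modify a tuple:", exc)

# ...but each tuple can be replaced by a new three-element tuple
lst = [(c, ltr, c / totalcount * 100) for c, ltr in lst]
lst.sort(key=lambda item: item[2], reverse=True)

for c, ltr, freq in lst:
    print(f"{ltr} - {c} - {freq:.2f}%")
```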

Answer 2 (score: 1)

Your count data is in a dictionary of {letter: count} pairs.

You can compute the total count from the dictionary like this:

total_count = sum(counts.values())

Then don't calculate the percentages until you iterate over the counts:

for letter, count in counts.items():
    print(f'{letter} - {count} - {100*count/total_count:.2f}%')    # Python 3.6+
    #print('{} - {} - {:.2f}%'.format(letter, count, 100*count/total_count))    # Python < 3.6
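If you only need to print the rows in descending order, sorting the dictionary items by count before the loop is enough. A sketch with made-up counts:

```python
counts = {"a": 3, "b": 1, "c": 4}  # made-up sample data
total_count = sum(counts.values())

# Sort the (letter, count) pairs by count, highest first
ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for letter, count in ordered:
    print(f"{letter} - {count} - {100 * count / total_count:.2f}%")
```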

Or, if you want to put it all into a list so you can sort it:

data = []
for letter, count in counts.items():
    data.append((letter, count, 100*count/total_count))

Using operator.itemgetter for the sort key functions can make the code more readable.

import operator
letter = operator.itemgetter(0)
count = operator.itemgetter(1)
frequency = operator.itemgetter(2)

# sort by any of the three fields; pass reverse=True for descending order
data.sort(key=letter)
data.sort(key=count)
data.sort(key=frequency)
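Because list.sort is stable (including with reverse=True), successive sorts like these combine: the last key becomes the primary one and earlier sorts break ties. A sketch that orders by frequency, highest first, with ties broken alphabetically; the sample data is made up:

```python
import operator

letter = operator.itemgetter(0)
frequency = operator.itemgetter(2)

# Made-up (letter, count, percent) rows; 'b' and 'a' tie on frequency
data = [("b", 2, 25.0), ("c", 4, 50.0), ("a", 2, 25.0)]

data.sort(key=letter)                   # secondary key: letter, a to z
data.sort(key=frequency, reverse=True)  # primary key: frequency, highest first

print(data)
```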

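Answer 3 (score: 0)

I think I was able to achieve what you want by using lists instead of tuples. Tuples can't be modified, but if you really want to know how, click here

(I also added the option to quit the program)

Important: never forget to comment your code

Code: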

def exercise11():
    import string
    while True:
        try:
            # give the user the option to quit the program easily
            print('Press 0 to quit the program')
            fname = input('Enter the file name -> ')
            if fname == '0':
                return  # return instead of break, otherwise fop is undefined below
            fop = open(fname)
            break
        except OSError:
            print('This file does not exist. Please try again!')
            continue

    counts = {}
    for line in fop:
        line = line.translate(str.maketrans('', '', string.punctuation))
        line = line.translate(str.maketrans('', '', string.whitespace))
        line = line.translate(str.maketrans('', '', string.digits))
        line = line.lower()
        for ltr in line:
            if ltr in counts:
                counts[ltr] += 1
            else:
                counts[ltr] = 1
    lst = []
    countlst = []
    freqlst = []

    for ltrs, c in counts.items():
        # add a zero as a place holder & 
        # use square brakets so you can use a list that you can modify 
        lst.append([c, ltrs, 0]) 
        countlst.append(c)
    totalcount = sum(countlst)

    for ec in countlst:
        efreq = (ec/totalcount) * 100
        freqlst.append(efreq)
    freqlst.sort(reverse=True)
    lst.sort(reverse=True)

    # count the total of the letters 
    counter = 0
    for ltrs in lst:
        counter += ltrs[0]

    # calculate the percentage for each letter 
    for letter in lst:
        percentage = (letter[0] / counter) * 100
        letter[2] += float(format(percentage, '.2f'))

    for i in lst:
        print('The letter {} is repeated {} times, which is {}% '.format(i[1], i[0], i[2]))
exercise11()

Answer 4 (score: 0)

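<?php

// Read the whole file; fgets() would only return the first line
$text = file_get_contents("text.txt") or die("File does not exist");

// Mode 1: byte value => count, only for characters that occur
$chars = count_chars($text, 1);
arsort($chars); // highest count first

foreach ($chars as $key => $value) {
    echo "The character <b>'" . chr($key) . "'</b> was found <b>$value</b> times. <br>";
}

?>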
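For comparison, a rough Python counterpart to PHP's count_chars is collections.Counter, whose most_common() already returns the counts in descending order. A sketch that reuses the question's filtering (punctuation, whitespace and digits removed, text lowercased) in a single translation table; reading from a string instead of a file, and the helper name `letter_report`, are assumptions made to keep it self-contained:

```python
import string
from collections import Counter

# One table that deletes punctuation, whitespace and digits in a single pass,
# instead of the question's three separate translate() calls
DELETE = str.maketrans("", "", string.punctuation + string.whitespace + string.digits)


def letter_report(text: str) -> list:
    cleaned = text.translate(DELETE).lower()
    counts = Counter(cleaned)
    total = sum(counts.values())
    # most_common() with no argument lists every item, highest count first
    return [f"{ltr} - {n} - {n / total * 100:.2f}%" for ltr, n in counts.most_common()]


report = letter_report("Hello, World! 123")
print("\n".join(report))
```

With a file, the same function can be applied to the result of reading it inside a with open(...) block, which also closes the file automatically.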