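How do I sort the letters in a string by their corresponding frequencies?

Date: 2017-12-08 05:26:18

Tags: python string python-3.x sorting ascii

I am trying to write a function that analyzes a string and prints the letters in it along with their frequencies. That part is not a challenge, since I can loop over ascii_lowercase and call the string's count() method for each letter. The hard part is sorting by frequency so that the most common letter comes first and the least frequent comes last. Here is what I have written so far: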

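from string import ascii_lowercase

def analyze(codebook):
    lst = []
    tmp = []
    for c in ascii_lowercase:
        count = codebook.count(c)
        print("('%s', %d)" % (c.upper(), count), end=" ")
        lst.append(count)
        tmp.append(c)
    # The counts get sorted here, but they are no longer paired with their letters
    lst.sort()
    lst.reverse()
    print(lst)
    print(tmp)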

All this does is print the letters and their frequencies in alphabetical order. Here is an example of the string I am working with:

pooiiiuuuuyyyyyttttttrrrrrrreeeeeeeewwwwwwwwwqqqqqqqqqq

Here is an example of the output I want:

('Q', 10) ('W', 9) ('E', 8) ('R', 7) ('T', 6) ('Y', 5) ('U', 4) ('I', 3)

('O', 2) ('P', 1) ('A', 0) ('B', 0) ('C', 0) ('D', 0) ('F', 0) ('G', 0)

('H', 0) ('J', 0) ('K', 0) ('L', 0) ('M', 0) ('N', 0) ('S', 0) ('V', 0)

('X', 0) ('Z', 0)

I have also tried using collections; however, that quickly got confusing. Any help would be greatly appreciated!

5 answers:

Answer 0: (score: 2)

You had the right idea with the collections module. Use a Counter; this is exactly what it is for:

from collections import Counter
s = 'pooiiiuuuuyyyyyttttttrrrrrrreeeeeeeewwwwwwwwwqqqqqqqqqq'
c = Counter(s)
c
Counter({'q': 10, 'w': 9, 'e': 8, 'r': 7, 't': 6, 'y': 5, 'u': 4, 'i': 3, 'o': 2, 'p': 1})

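The Counter's repr already lists the items from most common to least common.

If you need the result as an ordered list of tuples: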

c.most_common()

[('q', 10),
 ('w', 9),
 ('e', 8),
 ('r', 7),
 ('t', 6),
 ('y', 5),
 ('u', 4),
 ('i', 3),
 ('o', 2),
 ('p', 1)]
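
Note that the Counter above only contains letters that actually occur, while the output in the question also lists the missing letters with a count of 0, in upper case. A minimal sketch of one way to get that, assuming only ASCII letters matter and that ties should stay in alphabetical order (the names `counts` and `padded` are illustrative, not from the answer):

```python
from collections import Counter
from string import ascii_lowercase

s = 'pooiiiuuuuyyyyyttttttrrrrrrreeeeeeeewwwwwwwwwqqqqqqqqqq'

# Seed every letter with a count of 0 so absent letters still appear
counts = Counter(dict.fromkeys(ascii_lowercase, 0))
# Count only ASCII letters; anything else in the string is ignored (assumption)
counts.update(ch for ch in s.lower() if ch in ascii_lowercase)

# most_common() sorts by descending count; equal counts keep insertion order
padded = [(letter.upper(), n) for letter, n in counts.most_common()]
print(padded)
```

Because the letters are seeded in alphabetical order, the zero-count letters come out alphabetically after the letters that do occur.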

Answer 1: (score: 0)

If you want to do this without the (marginal) collections import, a map (a dict in Python) is your best friend.

from string import ascii_lowercase

s = "pooiiiuuuuyyyyyttttttrrrrrrreeeeeeeewwwwwwwwwqqqqqqqqqq"
m = {}
for char in s:
    if char not in m:
        m[char] = 0
    m[char] += 1
# Make your list of tuples
res = []
for c in ascii_lowercase:
    res.append((c.upper(), m[c] if c in m else 0))
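
As written, `res` is still in alphabetical order, so one more step is needed to get the ordering the question asks for. A possible sketch, rebuilding the same `m` and `res` with `dict.get` and then sorting by the count in each tuple:

```python
from string import ascii_lowercase

s = "pooiiiuuuuyyyyyttttttrrrrrrreeeeeeeewwwwwwwwwqqqqqqqqqq"

# Build the frequency map by hand, as in the answer
m = {}
for char in s:
    m[char] = m.get(char, 0) + 1

# One (LETTER, count) tuple per letter, alphabetical at this point
res = [(c.upper(), m.get(c, 0)) for c in ascii_lowercase]

# Sort by count, highest first; the sort is stable, so ties stay alphabetical
res.sort(key=lambda pair: pair[1], reverse=True)
print(res)
```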

Answer 2: (score: 0)

Another solution is as follows:

import collections
st = "pooiiiuuuuyyyyyttttttrrrrrrreeeeeeeewwwwwwwwwqqqqqqqqqq"
my_list = list(collections.Counter(list(st)).items())
my_list.sort(key=lambda x: x[1], reverse=True)
print(my_list)

Output:

[('q', 10), ('w', 9), ('e', 8), ('r', 7), ('t', 6), ('y', 5), ('u', 4), ('i', 3), ('o', 2), ('p', 1)]
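
Sorting on the count alone leaves letters with equal counts in the order they first appear in the string. If a predictable order for ties matters, a composite key is one option. The sketch below uses a made-up string, `"bbaac"`, chosen only because it contains a tie:

```python
import collections

st = "bbaac"  # hypothetical input: 'b' and 'a' both occur twice
items = list(collections.Counter(st).items())

# Count only: the tie keeps first-appearance order, so 'b' precedes 'a'
by_count = sorted(items, key=lambda x: x[1], reverse=True)

# Negated count first, then the letter: ties are broken alphabetically
by_count_then_letter = sorted(items, key=lambda x: (-x[1], x[0]))

print(by_count)
print(by_count_then_letter)
```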

Answer 3: (score: 0)

The solution is a bit long, but you can modify it. Here is code that produces your desired output:

import collections
import string
word = "pooiiiuuuuyyyyyttttttrrrrrrreeeeeeeewwwwwwwwwqqqqqqqqqq"
wordpair = collections.Counter(word)
# most_common() returns (character, count) pairs sorted by descending count.
# Reversing keys() and values() only works when the input happens to be
# written in ascending order of frequency, as this sample is.
for w, c in wordpair.most_common():
    print("('%s', %d)" % (w.upper(), c), end=" ")

# Append the letters that never occur, each with a count of 0
for char in string.ascii_lowercase:
    if char not in wordpair:
        print("('%s', %d)" % (char.upper(), 0), end=" ")

Output:

('Q', 10) ('W', 9) ('E', 8) ('R', 7) ('T', 6) ('Y', 5) ('U', 4) ('I', 3) 
('O', 2) ('P', 1) ('A', 0) ('B', 0) ('C', 0) ('D', 0) ('F', 0) ('G', 0) 
('H', 0) ('J', 0) ('K', 0) ('L', 0) ('M', 0) ('N', 0) ('S', 0) ('V', 0) 
('X', 0) ('Z', 0) 

Answer 4: (score: 0)

Use collections.Counter, string.ascii_lowercase, and sorted:

from collections import Counter
from string import ascii_lowercase
s = 'pooiiiuuuuyyyyyttttttrrrrrrreeeeeeeewwwwwwwwwqqqqqqqqqq'
counts = Counter(s)  # count once instead of rebuilding the Counter for every letter
sorted([(i.upper(), counts[i]) for i in ascii_lowercase], key=lambda x: x[1], reverse=True)

Or without using collections:

sorted([(i.upper(), s.count(i)) for i in ascii_lowercase], key=lambda x:x[1], reverse=True)

Output:

[('Q', 10),
 ('W', 9),
 ('E', 8),
 ('R', 7),
 ('T', 6),
 ('Y', 5),
 ('U', 4),
 ('I', 3),
 ('O', 2),
 ('P', 1),
 ('A', 0),
 ('B', 0),
 ('C', 0),
 ('D', 0),
 ('F', 0),
 ('G', 0),
 ('H', 0),
 ('J', 0),
 ('K', 0),
 ('L', 0),
 ('M', 0),
 ('N', 0),
 ('S', 0),
 ('V', 0),
 ('X', 0),
 ('Z', 0)]
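
To tie this back to the `analyze` function from the question, the sorted pairs can be formatted the same way the original `print` call does. This is only a sketch: returning the formatted string instead of printing inside the function is a design choice made here, not something the question requires.

```python
from string import ascii_lowercase

def analyze(codebook):
    # Pair each letter with its count, then sort by count, highest first
    pairs = [(c.upper(), codebook.count(c)) for c in ascii_lowercase]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    # Same "('Q', 10)" formatting as the question, joined with single spaces
    return " ".join("('%s', %d)" % pair for pair in pairs)

print(analyze('pooiiiuuuuyyyyyttttttrrrrrrreeeeeeeewwwwwwwwwqqqqqqqqqq'))
```

Returning the string keeps the function easy to check; calling `print` on the result gives the single-line output shown in the question.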