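I'm trying to find a way to count the occurrences of each letter in a text file and then display the letters from highest to lowest frequency. This is what I have so far; please help me get past this mental block.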
def me():
    info = input("what file would you like to select?")
    filehandle = open(info, "r")
    data = filehandle.read()
    case = data.upper()
    s = ('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
    for i in range(26):
        print(s[i], case.count(s[i]))

me()
Answer 0 (score: 2)
Python has a nice built-in class for this: collections.Counter.
In [8]: from collections import Counter
In [9]: with open('Makefile', 'r') as f:
   ...:     raw = Counter(f.read())
   ...:
In [10]: raw
Out[10]: Counter({' ': 61, 'e': 46, 'p': 38, 'a': 29, '\n': 27, 'c': 27, 'n': 27, 'l': 26, 'd': 25, '-': 22, 's': 22, 'y': 22, 't': 20, 'i': 18, 'o': 18, 'r': 17, '.': 16, 'u': 13, '\t': 12, 'm': 12, 'b': 11, 'x': 10, 'h': 9, '/': 8, ':': 8, '_': 7, "'": 6, ';': 5, '\\': 5, 'f': 5, '*': 3, 'v': 3, '{': 3, '}': 3, 'k': 2, 'H': 1, 'O': 1, 'N': 1, 'P': 1, 'Y': 1, 'g': 1})
This is the Makefile from the pandas library, BTW. To sort the characters by frequency in descending order, do the following:
In [22]: raw.most_common()
Out[22]:
[(' ', 61),
('e', 46),
('p', 38),
('a', 29),
('\n', 27),
('c', 27),
('n', 27),
('l', 26),
('d', 25),
('-', 22),
('s', 22),
('y', 22),
('t', 20),
('i', 18),
('o', 18),
('r', 17),
('.', 16),
('u', 13),
('\t', 12),
('m', 12),
('b', 11),
('x', 10),
('h', 9),
('/', 8),
(':', 8),
('_', 7),
("'", 6),
(';', 5),
('\\', 5),
('f', 5),
('*', 3),
('v', 3),
('{', 3),
('}', 3),
('k', 2),
('H', 1),
('O', 1),
('N', 1),
('P', 1),
('Y', 1),
('g', 1)]
I deliberately did not use your exact data, so that you can try adapting my solution to your problem.
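As a small aside, most_common() also accepts an optional argument that limits the result to the n most frequent items. A minimal sketch, assuming a made-up sample string rather than a file:

```python
from collections import Counter

# Count every character in a sample string (illustrative input only).
counts = Counter("mississippi")

# most_common(n) returns only the n most frequent (item, count) pairs.
top_two = counts.most_common(2)
print(top_two)
```

Calling most_common() with no argument, as in the session above, returns every pair.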
Answer 1 (score: 2)
This is exactly what collections.Counter and its most_common() method are for:
import collections
import string

def me():
    info = input("what file would you like to select? ")
    # The with block closes the file once it has been read.
    with open(info, "r") as filehandle:
        data = filehandle.read().upper()
    char_counter = collections.Counter(data)
    # most_common() yields (char, count) pairs, highest count first.
    for char, count in char_counter.most_common():
        if char in string.ascii_uppercase:
            print(char, count)

me()
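Counter is a dict subclass that counts the occurrences of different items (in this case, characters). char_counter.most_common() gives us all (character, count) pairs sorted from most to least frequent.
We are only interested in letters, so we check whether each character is in string.ascii_uppercase, which is simply a string of the letters from A to Z.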
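The same idea can be tried without a file prompt. The following is a minimal sketch, not the answer's own code: the helper name letter_frequencies and the sample text are assumptions for illustration, and it filters to ASCII letters before counting rather than after.

```python
import collections
import string


def letter_frequencies(text):
    """Return (letter, count) pairs for A-Z, most frequent first."""
    # Hypothetical helper: upper-case the text and keep only ASCII letters.
    letters = (ch for ch in text.upper() if ch in string.ascii_uppercase)
    return collections.Counter(letters).most_common()


if __name__ == "__main__":
    for letter, count in letter_frequencies("Hello, World!"):
        print(letter, count)
```

Filtering first keeps spaces, punctuation, and newlines out of the Counter entirely, so the result can be used without a second check.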
Answer 2 (score: 0)
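This looks very, very good. I hope you are using this site properly. Still, I'm glad you came to the right place, and I'll try to help you, at least this once.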
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> input_txt = "Now you are just somebody that I used to know"
>>> for letter in input_txt:
...     d[letter] += 1
...
>>> import operator
>>> sorted_d = sorted(d.items(), key=operator.itemgetter(1), reverse=True)
>>> sorted_d
[(' ', 9), ('o', 6), ('t', 4), ('u', 3), ('e', 3), ('s', 3), ('w', 2), ('y', 2), ('a', 2), ('d', 2), ('N', 1), ('r', 1), ('j', 1), ('m', 1), ('b', 1), ('h', 1), ('I', 1), ('k', 1), ('n', 1)]
Answer 3 (score: 0)
You could do something along these lines:
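d = {}
with open('/usr/share/dict/words') as f:
    for line in f:
        for word in line.split():
            word = word.strip()
            for c in word:
                # setdefault returns 0 the first time a character is seen.
                d[c] = d.setdefault(c, 0) + 1
# Sort the (character, count) pairs by count, highest first.
for k, v in sorted(d.items(), key=lambda t: t[1], reverse=True):
    print(k, v)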
For the standard Unix words file, this prints:
e 234413
i 200536
a 196995
o 170062
r 160269
...
Y 139
X 92
Q 77
- 2
Answer 4 (score: 0)
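Others have already given you better solutions using collections.Counter, but your code is already close; you just cannot print sorted output on the fly. You can save the counts in a list, sort it, and then print: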
def me():
    info = input("what file would you like to select?")
    filehandle = open(info, "r")
    data = filehandle.read()
    case = data.upper()
    s = ('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
    result = []
    for i in range(26):
        result.append((s[i], case.count(s[i])))
    return result

result = me()
for letter, count in sorted(result, key=lambda x: x[1], reverse=True):
    print(letter, count)
Still using your logic, you can make the function more readable:
import string

def me():
    info = input("what file would you like to select?")
    filehandle = open(info, "r")
    data = filehandle.read()
    case = data.upper()
    result = []
    # string.ascii_uppercase is 'ABC...Z' and exists in Python 2 and 3.
    for letter in string.ascii_uppercase:
        result.append((letter, case.count(letter)))
    return result
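The function above only builds the list; sorting and printing work the same way as in the first version. Here is a sketch of the same logic applied to an in-memory string, assuming a hypothetical helper named count_letters and a made-up sample input (neither is part of the original code):

```python
import string


def count_letters(text):
    """Count each letter A-Z in text, ignoring case."""
    upper = text.upper()
    # One (letter, count) pair per letter, in alphabetical order.
    return [(letter, upper.count(letter)) for letter in string.ascii_uppercase]


result = count_letters("banana bread")

# Sort by count, highest first, and skip letters that never occur.
for letter, count in sorted(result, key=lambda pair: pair[1], reverse=True):
    if count:
        print(letter, count)
```

Taking the text as a parameter instead of prompting for a file name makes the counting step easy to try out separately from the file handling.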