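This is my first question on Stack Overflow, so I apologize in advance if it is too vague or does not give enough information.
Basically, my problem is that my code fails to run because of a TypeError.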
import string
f = open('data/hamlet.txt', 'r')
text = f.read()
alphabet_freq = []
for c in string.ascii_lowercase :
    alphabet_freq.append(text.count(c) + text.count(c.upper()))
alphabet_freq_sum = 0
for _ in alphabet_freq :
    alphabet_freq_sum +=_
letter_frequency = []
for _ in alphabet_freq :
    letter_frequency.append(( _ / alphabet_freq_sum) * 100)
alphabets = list(string.ascii_lowercase)
letter_frequency_in_freq_order = []
for _ in letter_frequency :
    letter_frequency_in_freq_order.append(letter_frequency.pop(max(letter_frequency)))
print(letter_frequency_in_freq_order,letter_frequency)
Stack trace:
make: *** [py3_run] Error 1
Traceback (most recent call last):
  File "Main.out", line 26, in <module>
    letter_frequency_in_freq_order.append(letter_frequency.pop(max(letter_frequency)))
TypeError: integer argument expected, got float
I think the max function does not work on floats. Is that right?
Answer 0 (score: 1)
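pop takes the index of the item to remove from the list and returns the value at that index. You are not giving it an index but the maximum value, and a list index must of course be an integer. So max itself works fine with floats; the error comes from pop. You could use .index() to look up the position of the maximum, or rethink your whole approach.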
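A minimal sketch of the index-based fix described above. The sample values are hypothetical stand-ins for the question's letter_frequency list. It also assumes the loop should run until the list is empty, which is why it uses while: a for loop over a list that is being popped from skips elements.

```python
# Hypothetical sample values standing in for the question's letter_frequency list.
letter_frequency = [8.2, 1.5, 12.7, 4.3]

ordered = []
# Keep going until every value has been moved over.
while letter_frequency:
    largest = max(letter_frequency)              # the largest value (a float)
    position = letter_frequency.index(largest)   # its position (an int)
    ordered.append(letter_frequency.pop(position))

print(ordered)
```

Note that this only orders the values; which letter each value belonged to is lost, which is one more reason to consider a different approach.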
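Sometimes things are easier with some knowledge of Python's libraries.
collections.Counter is a dictionary specialized for counting: it passes over your whole text once and adds or increments a key for each character. You are calling .count() for each of the 26 characters in string.ascii_lowercase (twice, in fact, once per case), and every call traverses the whole string to count the occurrences of a single character.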
from collections import Counter
import string
t = """This is not a hamlet text, just a few letters and words.
Including newlines to demonstrate how to optimize your approach.
Of counting character frequencies..."""
# if you want all characters starting with a zero count, you can add and remove them
# if you do not need all characters, skip this step and go to c.update()
c = Counter(string.ascii_lowercase) # all have count of 1
c.subtract(string.ascii_lowercase) # now all are present with count of 0
# count all characters, passes text once
c.update(t.lower()) # you can input your file.read() here, any iterable will do
# sum all character counts
totalSum = sum(c.values())
# get the counts ordered from max to min as tuples (char, count) and convert
# them with a list comprehension to float values; they stay ordered
top_n = [ (a,b/totalSum) for a,b in c.most_common()]
# use this to strip e.g. .,! and \n from the output:
# top_n = [ (a,b/totalSum) for a,b in c.most_common() if a in string.ascii_lowercase]
import pprint
pprint.pprint(c)
pprint.pprint(top_n)
Output:
Counter({' ': 22, 't': 15, 'e': 14, 'o': 11,
'n': 10, 'a': 9, 'i': 9, 'r': 8,
's': 8, 'c': 6, 'h': 5, 'u': 5,
'.': 5, 'd': 4, 'l': 4, 'w': 4,
'f': 3, 'm': 3, 'p': 3, 'g': 2,
'\n': 2, 'j': 1, 'q': 1, 'x': 1,
'y': 1, 'z': 1, ',': 1, 'b': 0,
'k': 0, 'v': 0})
[(' ', 0.13924050632911392), ('t', 0.0949367088607595),
('e', 0.08860759493670886), ('o', 0.06962025316455696),
('n', 0.06329113924050633), ('a', 0.056962025316455694),
('i', 0.056962025316455694), ('r', 0.05063291139240506),
('s', 0.05063291139240506), ('c', 0.0379746835443038),
('h', 0.03164556962025317), ('u', 0.03164556962025317),
('.', 0.03164556962025317), ('d', 0.02531645569620253),
('l', 0.02531645569620253), ('w', 0.02531645569620253),
('f', 0.0189873417721519), ('m', 0.0189873417721519),
('p', 0.0189873417721519), ('g', 0.012658227848101266),
('\n', 0.012658227848101266), ('j', 0.006329113924050633),
('q', 0.006329113924050633), ('x', 0.006329113924050633),
('y', 0.006329113924050633), ('z', 0.006329113924050633),
(',', 0.006329113924050633),
('b', 0.0), ('k', 0.0), ('v', 0.0)]
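Applied to the question's goal (percentages for the letters a to z, most frequent first), the same idea could look roughly like this. The function name letter_percentages and the sample string are made up for illustration, and the sketch assumes that only ASCII letters should count toward the total.

```python
from collections import Counter
import string


def letter_percentages(text):
    """Return (letter, percent) pairs for a-z, most frequent first."""
    # Count only ASCII letters, ignoring case; one pass over the text.
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    total = sum(counts.values())
    if total == 0:
        return []  # no letters at all, avoid dividing by zero
    return [(letter, count / total * 100)
            for letter, count in counts.most_common()]


sample = "To be, or not to be"  # stand-in for the contents of the text file
result = letter_percentages(sample)
print(result)
```

Unlike the version above, letters that never occur are simply absent from the result rather than listed with a count of 0. To run it on the file from the question, pass the string returned by f.read() instead of sample.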