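I am writing a function that takes an in_file, checks the frequency of each letter in that file, and writes the result to out_file in the format letter:frequency. This is what I have so far. Can anyone help?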
def count_letters(in_file,out_file):
    in_file = open(in_file,"r")
    out_file = open(out_file,"w")
    for line in in_file:
        words = line.split()
        for word in words:
            for letter in word:
                print(letter,':',line.count(letter),file=out_file,end="\n")
Answer 0 (score: 4)
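There is no need to split the text into words at all; passing a string directly to the counter updates the count for every character in it. You also need to collect all the counts first and only then write them to the output file: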
from collections import Counter

def count_letters(in_filename, out_filename):
    counts = Counter()
    with open(in_filename, "r") as in_file:
        # Read fixed-size chunks until read() returns '' at end of file.
        for chunk in iter(lambda: in_file.read(8192), ''):
            counts.update(chunk)
    with open(out_filename, "w") as out_file:
        for letter, count in counts.items():
            out_file.write('{}:{}\n'.format(letter, count))
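Note that the input file is processed in 8 KB chunks rather than all at once; you can adjust the chunk size (preferably a power of 2) to maximize throughput.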
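Chunking does not change the totals, because Counter.update() adds to the existing counts on every call. A small check of that, using a made-up sample string and a deliberately tiny chunk size chosen only for illustration:

```python
from collections import Counter

text = "hello world\nhello again\n"  # sample data, invented for this check

# Count the whole string in one go.
whole = Counter(text)

# Count the same text in 4-character chunks (tiny on purpose).
chunked = Counter()
for start in range(0, len(text), 4):
    chunked.update(text[start:start + 4])

print(whole == chunked)  # True
```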
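If you want the output file sorted by frequency (descending), you can loop over counts.most_common() in the final loop instead of iterating over the counter's items directly.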
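A sketch of the sorted variant, assuming Python 3. The name count_letters_sorted is made up here to keep it apart from the function above; Counter.most_common() called without an argument returns all (element, count) pairs ordered from most to least frequent.

```python
from collections import Counter

def count_letters_sorted(in_filename, out_filename):
    # Hypothetical variant of count_letters that writes counts in descending order.
    counts = Counter()
    with open(in_filename, "r") as in_file:
        for chunk in iter(lambda: in_file.read(8192), ''):
            counts.update(chunk)
    with open(out_filename, "w") as out_file:
        # most_common() yields pairs from highest to lowest count.
        for letter, count in counts.most_common():
            out_file.write('{}:{}\n'.format(letter, count))
```

Like the original, this counts every character, including spaces and newlines, so those appear in the output as well.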
Answer 1 (score: 0)
This should do the trick. It counts only letters and skips every other character:
from collections import Counter

def count_letters(in_file, out_file):
    letter_counts = Counter()
    # in_file and out_file are paths; the with blocks open and close them.
    with open(in_file, 'r') as infile:
        for line in infile:
            line = line.strip()
            for letter in line:
                # Count only letters.
                if not letter.isalpha():
                    continue
                letter_counts[letter] += 1
    with open(out_file, 'w') as outfile:
        for letter, count in letter_counts.items():
            outfile.write('{}:{}\n'.format(letter, count))