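I am working on Huffman coding for any .txt file, so first I need to analyze the text file. I need to read it and then analyze it. I need "output" that looks like a table:
letter | frequency (how many times the letter repeats) | Huffman code (this will come later)
I started with: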
f = open('test.txt', 'r')  # open test.txt
for line in f:
    print(line)  # to make sure everything works...
How do I put the characters from the file in alphabetical order:
with open("test.txt") as f_in:
    for line in f_in:
        for char in line:
            frequencies[char] += 1
Thanks a lot.
Well, I tried it like this:
frequencies = collections.defaultdict(int)
with open("test.txt") as f_in:
    for line in f_in:
        for char in line:
            frequencies[char] += 1
frequencies = [(count, char) for char, count in frequencies.iteritems()]
frequencies.sort(key=operator.itemgetter(1))
But the compiler gives me an "error".
I need this alphabetical ordering inside the for loop, not at the end of frequencies...
Answer 0 (score: 2)
To get your frequency table, I would use a defaultdict. This iterates over the data only once.
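import collections
import operator

# `filename` is the path of the text file to analyze, e.g. 'test.txt'
frequencies = collections.defaultdict(int)
with open(filename) as f_in:
    for line in f_in:
        for char in line:
            frequencies[char] += 1

# Turn the dict into a list of (count, char) tuples
# (.items() works on Python 2 and 3; .iteritems() is Python 2 only)
frequencies = [(count, char) for char, count in frequencies.items()]
# Sort on element 1 of each tuple, the character, i.e. alphabetically
frequencies.sort(key=operator.itemgetter(1))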
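As a rough illustration of the single-pass idea, here is a minimal sketch that assumes Python 3. The function names `char_frequencies` and `alphabetical_table` and the in-memory sample text are made up for this example; they stand in for reading the real test.txt.

```python
import collections
import io


def char_frequencies(stream):
    """Count every character in a text stream in a single pass."""
    frequencies = collections.defaultdict(int)
    for line in stream:
        for char in line:
            frequencies[char] += 1
    return frequencies


def alphabetical_table(frequencies):
    """Return (char, count) pairs ordered by character."""
    return sorted(frequencies.items())


# Hypothetical sample text standing in for the contents of test.txt.
sample = io.StringIO("banana\n")
table = alphabetical_table(char_frequencies(sample))
for char, count in table:
    # repr() keeps whitespace such as the newline visible in the output.
    print(repr(char), "|", count)
```

Sorting once after counting is usually simpler than trying to keep the characters ordered inside the loop, since the dict has to be complete before the order means anything.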
Answer 1 (score: 0)
with open('test.txt') as f:
    data = f.read()
table = dict((c, data.count(c)) for c in set(data))
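One thing to keep in mind with this version is that `data.count(c)` rescans the whole text once per distinct character. The sketch below, which assumes a short made-up string in place of the file contents, compares it with `collections.Counter`, which builds the same table in one pass.

```python
import collections

# Hypothetical sample text standing in for f.read() on test.txt.
data = "abracadabra"

# One scan of the text per distinct character, as in the answer above.
table_by_count = dict((c, data.count(c)) for c in set(data))

# A single pass over the text.
table_by_counter = dict(collections.Counter(data))

# Both approaches produce the same character -> count mapping.
print(table_by_count == table_by_counter)
```

For a small file the difference hardly matters; it only becomes noticeable with large inputs that contain many distinct characters.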
Answer 2 (score: 0)
I would use collections.Counter():
import re
import collections

if __name__ == '__main__':
    # Matches a single ASCII letter
    is_letter = re.compile('[A-Za-z]')
    frequencies = collections.Counter()
    with open(r'text.txt') as f_in:
        for line in f_in:
            for char in line:
                if is_letter.match(char):
                    # Count upper and lower case together
                    frequencies[char.lower()] += 1
    # Print the characters in alphabetical order
    for c in sorted(frequencies):
        print('%s | %d' % (c, frequencies[c]))
The regular expression is_letter is used to keep only the characters we are interested in.
The output it gives looks like this.
a | 177
b | 29
c | 7
d | 167
e | 374
f | 58
g | 100
h | 44
i | 135
j | 21
k | 64
l | 125
m | 85
n | 191
o | 105
p | 34
r | 185
s | 130
t | 146
u | 34
v | 68
x | 1
y | 14
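The question's table also has a Huffman code column that is meant to be filled in later. As a hedged sketch of how a frequency table like the one above could feed that step, the following assumes Python 3 and uses the standard `heapq` module. The function name `huffman_codes` and the sample string are invented for this example, and the exact bit patterns depend on how ties between equal counts are broken.

```python
import collections
import heapq


def huffman_codes(frequencies):
    """Map each symbol to a bit string, given a symbol -> count mapping."""
    # Heap entries are [weight, tie_breaker, [(symbol, code), ...]].
    # The unique tie_breaker means two entries never compare their lists.
    heap = [[count, index, [(symbol, "")]]
            for index, (symbol, count) in enumerate(sorted(frequencies.items()))]
    if not heap:
        return {}
    if len(heap) == 1:
        # A lone symbol still needs a one-bit code.
        return {heap[0][2][0][0]: "0"}
    heapq.heapify(heap)
    next_index = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        low = heapq.heappop(heap)
        high = heapq.heappop(heap)
        merged = ([(s, "0" + c) for s, c in low[2]] +
                  [(s, "1" + c) for s, c in high[2]])
        heapq.heappush(heap, [low[0] + high[0], next_index, merged])
        next_index += 1
    return dict(heap[0][2])


# Hypothetical sample text; a Counter built from a file works the same way.
frequencies = collections.Counter("abracadabra")
codes = huffman_codes(frequencies)
for symbol in sorted(codes):
    print(symbol, "|", frequencies[symbol], "|", codes[symbol])
```

More frequent characters end up with shorter codes, which is why the frequency table has to be complete before any codes can be assigned.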