
For files with very long lines, reading with codecs.open(filename, 'r', 'utf8') is extremely slow. Is this behavior expected, or am I doing something wrong?

Below is a short test script. I first read ./data/longfile with the plain built-in open, then with codecs.open(vf, 'r', 'utf8'). The time difference is huge!

import time
vf = './data/longfile'

# Baseline: plain built-in open (Python 2, lines are byte strings)
st = time.time()
with open(vf, 'r') as vfile:
    for line_vf in vfile:
        print len(line_vf),
print 'done'
print 'time:', time.time() - st

# Same loop, but decoding to unicode through codecs.open
import codecs
st = time.time()
with codecs.open(vf, 'r', 'utf8') as vfile:
    for line_vf in vfile:
        print len(line_vf),
print 'done'
print 'time with codecs-utf8:', time.time() - st

Output:

130013 1667401 33266 done #num chars per line
time: 0.00615406036377 #fast
130013 1667401 33266 done #num chars per line
time with codecs-utf8: 0.418385028839 #---> 70X slower!!
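For anyone who wants to reproduce this without my data file, here is a minimal sketch of a self-contained benchmark. It is written for Python 3 (where io.open is the same function as the built-in open), and it also times io.open as a point of comparison; whether io.open closes the gap on a given setup is something to measure, not something this sketch guarantees. The helper names (make_long_line_file, time_reader, compare) and the generated ASCII test data are my own assumptions for illustration, not part of the original test.

```python
import codecs
import io
import os
import tempfile
import time


def make_long_line_file(path, line_lengths):
    """Write one ASCII line per entry; each length includes the trailing newline.

    Hypothetical test data standing in for ./data/longfile.
    """
    with io.open(path, 'w', encoding='utf8', newline='\n') as f:
        for n in line_lengths:
            f.write(u'x' * (n - 1) + u'\n')


def time_reader(opener, path):
    """Iterate the file line by line; return (line lengths, elapsed seconds)."""
    start = time.time()
    with opener(path) as f:
        lengths = [len(line) for line in f]
    return lengths, time.time() - start


def compare(path):
    """Time the same line-by-line loop with codecs.open and io.open."""
    openers = {
        'codecs.open': lambda p: codecs.open(p, 'r', 'utf8'),
        'io.open': lambda p: io.open(p, 'r', encoding='utf8'),
    }
    return {name: time_reader(op, path) for name, op in openers.items()}


if __name__ == '__main__':
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Same line lengths as in the output above.
        make_long_line_file(path, [130013, 1667401, 33266])
        for name, (lengths, elapsed) in sorted(compare(path).items()):
            print('%s: lengths=%s time=%.4f s' % (name, lengths, elapsed))
    finally:
        os.remove(path)
```

Both readers should report the same line lengths, so any difference in the printed times comes from how each one splits and decodes lines.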

Python `codecs.open` with `utf8` encoding is very slow when reading long lines

0 answers: