Python `codecs.open` with `utf8` encoding is very slow when reading long lines

Time: 2018-01-31 00:14:14

Tags: python utf-8 codec

Reading a file with codecs.open(filename, 'r', 'utf8') is very slow when the file contains long lines. Is this behavior expected, or am I doing something wrong?

Below is a short test script. I first read the file ./data/longfile with plain open, then read it again with codecs.open(vf, 'r', 'utf8'). The time difference is huge!

import time
vf = './data/longfile'
st = time.time()
with open(vf, 'r') as vfile:
    for line_vf in vfile:
        print len(line_vf),
print 'done'
print 'time:', time.time() - st

import codecs
st = time.time()
with codecs.open(vf, 'r', 'utf8') as vfile:
    for line_vf in vfile:
        print len(line_vf),
print 'done'
print 'time with codecs-utf8:', time.time() - st

Output:

130013 1667401 33266 done #num chars per line
time: 0.00615406036377 #fast
130013 1667401 33266 done #num chars per line
time with codecs-utf8: 0.418385028839 #---> 70X slower!!
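
One comparison that may help narrow this down is io.open, which also decodes to unicode but uses a different reader implementation from codecs.open. The sketch below is only a rough timing harness, not a diagnosis: the helper names make_long_line_file and time_line_lengths are hypothetical, and the temporary file is a synthetic stand-in for ./data/longfile that reuses the three line lengths from the output above. Actual timings will depend on the Python version and the machine.

```python
import codecs
import io
import os
import tempfile
import time


def make_long_line_file(line_lengths):
    """Write a temporary UTF-8 file with one line per requested length.

    Hypothetical helper: each line is filler text plus a newline, so the
    length counted while reading (newline included) equals the request.
    """
    fd, path = tempfile.mkstemp(suffix='.txt')
    with io.open(fd, 'w', encoding='utf8', newline='\n') as handle:
        for n in line_lengths:
            handle.write(u'x' * (n - 1) + u'\n')
    return path


def time_line_lengths(opener, path):
    """Read `path` line by line with `opener`; return (lengths, seconds)."""
    start = time.time()
    with opener(path, 'r', encoding='utf8') as handle:
        lengths = [len(line) for line in handle]
    return lengths, time.time() - start


if __name__ == '__main__':
    # Synthetic stand-in for ./data/longfile (same line lengths as above).
    path = make_long_line_file([130013, 1667401, 33266])
    try:
        for name, opener in [('io.open', io.open),
                             ('codecs.open', codecs.open)]:
            lengths, elapsed = time_line_lengths(opener, path)
            print('%s: lengths=%s time=%.4f s' % (name, lengths, elapsed))
    finally:
        os.remove(path)
```

If io.open turns out to be close to plain open on the real file, that would suggest the cost lies in how codecs.open reads long lines rather than in UTF-8 decoding itself.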

0 answers:

No answers yet