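Reading a file with codecs.open(filename, 'r', 'utf8') is very slow when the file contains long lines. Is this behavior expected, or am I doing something wrong?
Below is a short test script. It first reads the file ./data/longfile with plain open, and then with codecs.open(vf, 'r', 'utf8'). The difference in read time is huge!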
# Python 2 test script
import codecs
import time

vf = './data/longfile'

# 1) Plain open: lines come back as byte strings
st = time.time()
with open(vf, 'r') as vfile:
    for line_vf in vfile:
        print len(line_vf),
print 'done'
print 'time:', time.time() - st

# 2) codecs.open: lines are decoded from UTF-8 to unicode
st = time.time()
with codecs.open(vf, 'r', 'utf8') as vfile:
    for line_vf in vfile:
        print len(line_vf),
print 'done'
print 'time with codecs-utf8:', time.time() - st
Output:
130013 1667401 33266 done #num chars per line
time: 0.00615406036377 #fast
130013 1667401 33266 done #num chars per line
time with codecs-utf8: 0.418385028839 #---> 70X slower!!
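
To help narrow this down, here is a rough sketch of a third measurement that could be added to the test above: the same line-by-line read, timed once with codecs.open and once with io.open (available since Python 2.6, and the built-in open in Python 3). The helper names time_line_lengths, make_sample_file and compare are made up for this illustration, and the temporary file is only a stand-in for ./data/longfile, with ASCII lines of the same lengths as in the output above. No timings are claimed here; they will depend on the interpreter version and the machine.

```python
from __future__ import print_function

import codecs
import io
import os
import tempfile
import time


def time_line_lengths(opener, path):
    """Read path line by line via opener(path); return (lengths, seconds)."""
    start = time.time()
    with opener(path) as handle:
        lengths = [len(line) for line in handle]
    return lengths, time.time() - start


def make_sample_file(line_lengths):
    """Hypothetical stand-in for ./data/longfile: ASCII lines of the given
    lengths (trailing newline included), written to a temporary file."""
    fd, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(fd, 'wb') as out:
        for n in line_lengths:
            out.write(b'a' * (n - 1) + b'\n')
    return path


def compare(path):
    """Time both UTF-8 decoding readers on the same file."""
    openers = [
        ('codecs.open', lambda p: codecs.open(p, 'r', 'utf8')),
        ('io.open', lambda p: io.open(p, 'r', encoding='utf8')),
    ]
    return dict((name, time_line_lengths(opener, path))
                for name, opener in openers)


if __name__ == '__main__':
    # Line lengths copied from the output shown in the question.
    sample = make_sample_file([130013, 1667401, 33266])
    try:
        for name, (lengths, seconds) in sorted(compare(sample).items()):
            print(name, lengths, 'time:', seconds)
    finally:
        os.remove(sample)
```

If both readers report the same line lengths but very different times, that would suggest the cost lies in how codecs.open assembles long lines rather than in UTF-8 decoding as such.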