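I have a rather contrived (albeit realistic) example that gives results I did not expect.
Basically, I want to read a large file and write every line that contains a given string to another file.
For example: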
String to match:

    12348:

File contents:

    12348:12345
    zxcv
    xcvb
    dfgh
    tyu
    12348:123456

Written to file:

    12348:12345
    12348:123456
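For clarity, here is a minimal in-memory sketch of the filtering I mean. It assumes each token of the sample file sits on its own line, and `matching_lines` is just a hypothetical helper name for illustration, not part of my real code:

```python
def matching_lines(lines, needle):
    """Return only the lines that contain `needle`, preserving order."""
    return [line for line in lines if needle in line]


# Sample data mirroring the example file (assumed: one token per line).
sample = [
    "12348:12345\n",
    "zxcv\n",
    "xcvb\n",
    "dfgh\n",
    "tyu\n",
    "12348:123456\n",
]

kept = matching_lines(sample, "12348:")
```

Here `kept` holds the two lines that would end up in the output file.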
I have implemented this in several ways:

Method 1: line-by-line write()
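
    from datetime import datetime

    def compute_write():
        start = datetime.now()
        # Writing str lines to a "wb" file works on Python 2;
        # on Python 3, open write.txt with "w" instead.
        with open("read.txt") as fh:
            with open("write.txt", "wb") as fh2:
                for line in fh:
                    if "12348:" in line:
                        fh2.write(line)
        end = datetime.now()
        elapsed = end - start
        print("compute_write {0}".format(elapsed))
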
Method 2: writelines() with an array

    def compute_array():
        ret = []
        start = datetime.now()
        # Collect all matching lines in memory first.
        with open("read.txt") as fh:
            for line in fh:
                if "12348:" in line:
                    ret.append(line)
        # Then write them out in a single writelines() call.
        with open("write.txt", "wb") as fh2:
            fh2.writelines(ret)
        elapsed = datetime.now() - start
        print("compute_array {0}".format(elapsed))

Method 3: writelines() with a generator function

    def generator_fn(fh):
        # Lazily yield matching lines, one at a time.
        for line in fh:
            if "12348:" in line:
                yield line

    def compute_gen():
        start = datetime.now()
        with open("read.txt", "r") as fh:
            with open("write.txt", "wb") as fh2:
                fh2.writelines(generator_fn(fh))
        elapsed = datetime.now() - start
        print("compute_gen {0}".format(elapsed))

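To make the measurement more realistic, I repeated each computation 20 times and measured the total time taken. Results (read.txt is about 700 MB; about 130 MB is written to write.txt):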

    compute_write() --> 16.159134s
    compute_array() --> 12.453268s
    compute_gen()   --> 15.374717s

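In case the way I repeat the runs matters, the 20-run totals could be collected with a loop along these lines. This is only a sketch: `total_time` and `report` are hypothetical helper names, and I am assuming `timeit.default_timer` as the clock rather than `datetime.now()`:

```python
import timeit


def total_time(fn, repeats=20):
    """Call fn() `repeats` times and return the total elapsed seconds."""
    start = timeit.default_timer()
    for _ in range(repeats):
        fn()
    return timeit.default_timer() - start


def report(fns, repeats=20):
    """Time each function in `fns`; return a dict of name -> total seconds."""
    return {fn.__name__: total_time(fn, repeats) for fn in fns}
```

It would be called as `report([compute_write, compute_array, compute_gen])`. Since each function rewrites write.txt from scratch, the runs should not depend on each other, although the operating system's file cache may still favor later runs.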
Basically, why is compute_array() about 25% faster than the other implementations? What am I doing wrong, or what am I missing?