Python reading and writing files: write() vs writelines() performance

Time: 2017-04-18 01:30:25

Tags: python

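I have a very contrived (though, for me, realistic) example that gives some unexpected results.

Basically, I want to read a large file and write every line that contains a specified string to another file.

For example:

String to match: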

12348:

File contents:

12348:12345
zxcv
xcvb
dfgh
tyu
12348:123456

Written to the output file:

12348:12345
12348:123456
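
The filtering rule in this example can be sketched on in-memory data, with no files involved. The helper name `filter_lines` is made up for illustration and is not part of the original code.

```python
def filter_lines(lines, needle):
    """Return the lines that contain `needle` (hypothetical helper)."""
    return [line for line in lines if needle in line]


# The sample file contents from the example above, one string per line.
sample = [
    "12348:12345\n",
    "zxcv\n",
    "xcvb\n",
    "dfgh\n",
    "tyu\n",
    "12348:123456\n",
]

matched = filter_lines(sample, "12348:")
print("".join(matched), end="")
```

Only the two lines that start with `12348:` are kept, which matches the expected output file.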

I have implemented this in several ways:

Method 1: write() line by line

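from datetime import datetime

def compute_write():
    start = datetime.now()
    with open("read.txt") as fh:
        # Text mode ("w"): under Python 3, "wb" would reject the str lines read above.
        with open("write.txt", "w") as fh2:
            for line in fh:
                if "12348:" in line:
                    fh2.write(line)  # one write() call per matching line
    end = datetime.now()
    elapsed = end - start
    print("compute_write {0}".format(elapsed))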

Method 2: writelines() with a list

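def compute_array():
    ret = []
    start = datetime.now()
    # Collect every matching line in memory first.
    with open("read.txt") as fh:
        for line in fh:
            if "12348:" in line:
                ret.append(line)

    # Text mode ("w"): under Python 3, "wb" would reject str lines.
    with open("write.txt", "w") as fh2:
        fh2.writelines(ret)  # a single writelines() call for the whole list

    elapsed = datetime.now() - start
    print("compute_array {0}".format(elapsed))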

Method 3: writelines() with a generator function

def generator_fn(fh):
    # Lazily yield only the matching lines.
    for line in fh:
        if "12348:" in line:
            yield line

def compute_gen():
    start = datetime.now()
    with open("read.txt", "r") as fh:
        # Text mode ("w"): under Python 3, "wb" would reject str lines.
        with open("write.txt", "w") as fh2:
            fh2.writelines(generator_fn(fh))  # reading and writing are interleaved
    elapsed = datetime.now() - start
    print("compute_gen {0}".format(elapsed))

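Now, to make it more realistic, I repeated each computation 20 times and measured the total time taken. Results (read.txt is about 700 MB; about 130 MB is written to write.txt):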

`compute_write()` --> 16.159134s
`compute_array()` --> 12.453268s
`compute_gen()`   --> 15.374717s
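
The post does not show how the 20 repetitions were run, so the following is only a sketch of one possible harness. `time_repeated` and `dummy_workload` are hypothetical names, and `time.perf_counter()` (Python 3.3+) is used here in place of the `datetime.now()` calls in the functions above.

```python
import time


def time_repeated(fn, repeats=20):
    """Call fn() `repeats` times and return the total elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return time.perf_counter() - start


def dummy_workload():
    # Placeholder standing in for compute_write(), compute_array() or compute_gen().
    sum(range(1000))


total = time_repeated(dummy_workload, repeats=20)
print("dummy_workload total: {0:.6f}s".format(total))
```

Passing each of the three functions to `time_repeated` in turn would give totals comparable to the ones listed above, although disk caching between runs can still skew the numbers.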

Basically, why is compute_array() roughly 25% faster than the other implementations? What am I doing wrong, or what am I missing?
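
One way to rule out "doing something wrong" before comparing timings is to confirm that the three approaches write identical output. Below is a minimal sketch on a small temporary file. The function names `via_write`, `via_list` and `via_generator` are hypothetical stand-ins for the three methods, and the check says nothing about why the timings differ.

```python
import os
import tempfile

NEEDLE = "12348:"


def matching(fh):
    # Yield only the lines that contain the search string.
    for line in fh:
        if NEEDLE in line:
            yield line


def via_write(src, dst):
    # Method 1: one write() call per matching line.
    with open(src) as fh, open(dst, "w") as out:
        for line in matching(fh):
            out.write(line)


def via_list(src, dst):
    # Method 2: collect matches in a list, then one writelines() call.
    with open(src) as fh:
        kept = list(matching(fh))
    with open(dst, "w") as out:
        out.writelines(kept)


def via_generator(src, dst):
    # Method 3: writelines() consuming a generator.
    with open(src) as fh, open(dst, "w") as out:
        out.writelines(matching(fh))


def read_text(path):
    with open(path) as fh:
        return fh.read()


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "read.txt")
    with open(src, "w") as fh:
        fh.write("12348:12345\nzxcv\nxcvb\ndfgh\ntyu\n12348:123456\n")

    outputs = []
    for fn in (via_write, via_list, via_generator):
        dst = os.path.join(tmp, fn.__name__ + ".txt")
        fn(src, dst)
        outputs.append(read_text(dst))

# True when all three methods produced the same file contents.
print(outputs[0] == outputs[1] == outputs[2])
```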

0 answers:

No answers yet