Is there a relationship between Python's open() buffering parameter and the file.read() size parameter?

Asked: 2016-03-03 16:24:54

Tags: python python-2.7

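I am trying to optimize file handling on a server that serves files: read a chunk, compress it, and send it to the client. I would like to know whether there is a relationship between the buffering parameter of Python's open() and the size parameter of file.read().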
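For context, the read/compress/send loop described above could look roughly like the sketch below. This is only an illustration under assumptions: the function name `compressed_chunks`, the choice of zlib, and the default sizes are hypothetical and do not come from the actual server code. The caller would send each yielded piece to the client.

```python
import zlib


def compressed_chunks(path, chunk_size=16384, buffering=-1):
    """Yield zlib-compressed pieces of the file at `path` (hypothetical helper)."""
    compressor = zlib.compressobj()
    # Binary mode so the bytes reach zlib unchanged.
    with open(path, 'rb', buffering) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty result means end of file
                break
            data = compressor.compress(chunk)
            if data:  # zlib may hold input back until it has enough
                yield data
    # Emit whatever is still buffered inside the compressor.
    yield compressor.flush()
```

Both `buffering` and `chunk_size` are exposed here because they are the two knobs the question is about.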

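I created the following test script and ran it against 1 MB of randomly generated text with various combinations of the buffering and size parameters: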

import sys
import timeit

def main():
    filename = sys.argv[1]
    buffering = int(sys.argv[2])
    chunk_size = int(sys.argv[3])

    print timeit.repeat(stmt=lambda:reading(filename, buffering, chunk_size), repeat=10, number=100)
    exit(0)


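def reading(filename, buffering, chunk_size):
    # Read the whole file in chunk_size pieces; a chunk_size of -1 reads it in one call.
    with open(filename, 'r', buffering) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty string signals end of file
                break
            process(chunk)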

def process(chunk):
    pass


if __name__ == "__main__":
    main()
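
The post does not show how the 1 MB test file was produced. A minimal sketch of one way to generate such a file follows; the helper name `make_test_file`, the alphabet, and the exact size of 1024 * 1024 characters are assumptions, not the method actually used.

```python
import random
import string


def make_test_file(path, size=1024 * 1024, seed=None):
    """Write `size` random ASCII letters and digits to `path` (hypothetical helper)."""
    rng = random.Random(seed)  # pass a seed for a reproducible file
    alphabet = string.ascii_letters + string.digits
    with open(path, 'w') as f:
        f.write(''.join(rng.choice(alphabet) for _ in range(size)))
```

Usage, with the file name taken from the runs in this post: `make_test_file('1mb_test_file')`.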

Results:

Unbuffered run:

[me@home tmp]$ ./reading.2.py 1mb_test_file 0 -1
[0.022092103958129883, 0.020861148834228516, 0.020159006118774414, 0.020852804183959961, 0.020852804183959961, 0.021064996719360352, 0.020868062973022461, 0.021065950393676758, 0.020724058151245117, 0.019698143005371094]
[me@home tmp]$ ./reading.2.py 1mb_test_file 0 8126
[0.041762113571166992, 0.041918039321899414, 0.041845083236694336, 0.041965007781982422, 0.041656017303466797, 0.041667938232421875, 0.041767120361328125, 0.041543006896972656, 0.041823863983154297, 0.041646957397460938]
[me@home tmp]$ ./reading.2.py 1mb_test_file 0 16384
[0.031350851058959961, 0.030806779861450195, 0.030879974365234375, 0.031160116195678711, 0.030720949172973633, 0.030941009521484375, 0.030817985534667969, 0.031002044677734375, 0.030740022659301758, 0.031502008438110352]
[me@home tmp]$ ./reading.2.py 1mb_test_file 0 4096
[0.063197135925292969, 0.063369989395141602, 0.063393115997314453, 0.063482046127319336, 0.06318211555480957, 0.063271045684814453, 0.063127040863037109, 0.06345677375793457, 0.063257932662963867, 0.063141822814941406]

Run with an 8 kB buffering parameter:

[me@home tmp]$ ./reading.2.py 1mb_test_file 8192 -1
[0.022572994232177734, 0.020329952239990234, 0.021086215972900391, 0.021151065826416016, 0.021051883697509766, 0.021071195602416992, 0.021275043487548828, 0.021074056625366211, 0.020795106887817383, 0.020998954772949219]
[me@home tmp]$ ./reading.2.py 1mb_test_file 8192 8192
[0.042397022247314453, 0.042787075042724609, 0.042707920074462891, 0.042771100997924805, 0.042808055877685547, 0.042753934860229492, 0.042588949203491211, 0.042686223983764648, 0.042527914047241211, 0.042797088623046875]
[me@home tmp]$ ./reading.2.py 1mb_test_file 8192 16384
[0.032326936721801758, 0.032197952270507812, 0.031965017318725586, 0.031849861145019531, 0.032578945159912109, 0.032018899917602539, 0.031890869140625, 0.032000064849853516, 0.031239032745361328, 0.032066822052001953]
[me@home tmp]$ ./reading.2.py 1mb_test_file 8192 4096
[0.06818699836730957, 0.062704086303710938, 0.061124086380004883, 0.049216985702514648, 0.062672138214111328, 0.067947864532470703, 0.067732810974121094, 0.068349838256835938, 0.068238019943237305, 0.068017005920410156]

Run with a 16 kB buffer size:

[me@home tmp]$ ./reading.2.py 1mb_test_file 16384 -1
[0.022460222244262695, 0.021178960800170898, 0.021303176879882812, 0.020936012268066406, 0.020729780197143555, 0.017519950866699219, 0.014748811721801758, 0.020572185516357422, 0.020045042037963867, 0.020984888076782227]
[me@home tmp]$ ./reading.2.py 1mb_test_file 16384 16384
[0.031231880187988281, 0.031037092208862305, 0.031080961227416992, 0.030995845794677734, 0.030937910079956055, 0.031276226043701172, 0.031119823455810547, 0.030817985534667969, 0.031432151794433594, 0.030987977981567383]
[me@home tmp]$ ./reading.2.py 1mb_test_file 16384 32768
[0.026186943054199219, 0.025575160980224609, 0.026786088943481445, 0.025743007659912109, 0.025722026824951172, 0.025857925415039062, 0.024456977844238281, 0.024057149887084961, 0.025676965713500977, 0.025729179382324219]
[me@home tmp]$ ./reading.2.py 1mb_test_file 16384 8192
[0.054841041564941406, 0.046918153762817383, 0.046893119812011719, 0.046890020370483398, 0.046682119369506836, 0.046760082244873047, 0.04701995849609375, 0.046629905700683594, 0.047094106674194336, 0.046594142913818359]

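If I am reading the results correctly, it is best to make the read size larger than the buffering value and, if the file fits in memory, to read it in one go. Alternatively, it seems that 16 kB for buffering combined with a 32 kB read size gets me close to the timings for reading the file at once.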
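
In the timings above, the elapsed time tracks the read size much more closely than the buffering value: roughly 0.063 s for 4096, 0.042 s for 8192, 0.031 s for 16384, 0.026 s for 32768 and 0.021 s for -1. One possible reading is that the cost is dominated by the number of read() calls. The sketch below counts those calls per pass over the file, assuming that "1 MB" means exactly 1024 * 1024 bytes (the exact file size is not stated in the post).

```python
def read_calls(file_size, chunk_size):
    """Number of f.read(chunk_size) calls that return data."""
    if chunk_size < 0:
        return 1  # read(-1) returns the whole file in one call
    return -(-file_size // chunk_size)  # ceiling division


FILE_SIZE = 1024 * 1024  # assumption: the test file is exactly 1 MiB

for chunk_size in (-1, 4096, 8192, 16384, 32768):
    print('%6d -> %d calls' % (chunk_size, read_calls(FILE_SIZE, chunk_size)))
# Prints 1, 256, 128, 64 and 32 calls respectively.
```

The loop in the test script makes one additional read() per pass, the one that returns an empty string at end of file.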

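So, my questions:

  • Is my test script correct? Do I need to clear the cache, and if so, how?
  • Is there another way to arrive at the 16 kB / 32 kB values, for example from system settings?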
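
Regarding the second question, two values that can be queried at run time are the filesystem's preferred I/O block size (`st_blksize` from `os.stat`, where the platform provides it) and `io.DEFAULT_BUFFER_SIZE`. The sketch below only reports them; the helper name `suggested_sizes` is hypothetical, and whether either value is a good buffering or read size for this workload is an assumption that would still have to be measured. Note that these values feed the buffering heuristic of `io.open`, whereas the built-in `open()` in Python 2.7 relies on the C library's default.

```python
import io
import os


def suggested_sizes(path):
    """Return (st_blksize or None, io.DEFAULT_BUFFER_SIZE) for `path`."""
    # st_blksize is the filesystem's preferred I/O block size; it is not
    # available on every platform, hence the getattr fallback to None.
    blksize = getattr(os.stat(path), 'st_blksize', None)
    return blksize, io.DEFAULT_BUFFER_SIZE
```

Usage: `print(suggested_sizes('1mb_test_file'))`.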

Thanks, 微米

0 answers:

No answers