What is the best chunk size for urllib2.urlopen read?

Time: 2015-02-24 12:09:39

Tags: python urllib2

I'm using this code to download mp3 podcasts.

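import urllib2

req = urllib2.urlopen(item)  # item: URL of the podcast episode
CHUNK = 16 * 1024  # read 16 KiB at a time
with open(local_file, 'wb') as fp:
    while True:
        chunk = req.read(CHUNK)
        if not chunk:  # an empty string means end of stream
            break
        fp.write(chunk)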

This works well, but I'd like to know which chunk size gives the best download performance.

If it makes any difference, I'm on a 6 Mbit ADSL connection.

2 answers:

Answer 0: (score: 2)

A good buffer size is the same size the OS kernel uses for its socket buffer. That way, you don't perform more reads than you need to.

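On GNU/Linux, the socket buffer size (in bytes) can be seen in the /proc/sys/net/core/rmem_default file. You can increase a socket's buffer size by using setsockopt to set the SO_RCVBUF option. However, this size is capped by your system (/proc/sys/net/core/rmem_max), and you need administrator privileges (CAP_NET_ADMIN) to go beyond that limit.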

At that point, though, you would be doing platform-specific things for a small gain.

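Still, looking at the available socket options (see man 7 socket, or its online version) is a good idea if you want to do micro-optimizations and learn something along the way. :)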
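As a minimal sketch of inspecting these options from Python, the snippet below reads SO_RCVBUF on a fresh TCP socket and then asks for a larger buffer. The 256 KiB request is an arbitrary value chosen for illustration, and this socket is a standalone one, not the socket urllib2 opens internally. The kernel may clamp or otherwise adjust the requested size, so the effective value is read back rather than assumed.

```python
import socket

# Create an unconnected TCP socket just to inspect its options.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # Receive buffer size the OS assigned by default, in bytes.
    default_rcvbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    print("default SO_RCVBUF: %d bytes" % default_rcvbuf)

    # Ask for a larger buffer. 256 KiB is an arbitrary illustrative value;
    # the kernel may cap it (rmem_max on Linux) or adjust it.
    requested = 256 * 1024
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)

    # Read the value back to see what was actually applied.
    effective_rcvbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    print("effective SO_RCVBUF: %d bytes" % effective_rcvbuf)
finally:
    sock.close()
```

The default value printed here is one candidate for CHUNK, though whether it helps in practice still has to be measured.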

Since there is no true sweet spot, you should always benchmark any tuning to check whether your changes are really beneficial. Have fun!

Answer 1: (score: 2)

Expanding further on my comment to @giant_teapot

The code I used for benchmarking was...

#!/usr/bin/env python
# Python 2 script: times downloads of one file at chunk sizes of 1 KiB to 206 KiB.

import time
import os
import urllib2

# 5 MB mp3 file
testdl = "http://traffic.libsyn.com/timferriss/Arnold_5_min_-_final.mp3"

chunkmulti = 1  # chunk size in KiB
numpass = 5     # downloads per chunk size

while chunkmulti < 207:
    passtime = 0
    passattempt = 1
    while passattempt <= numpass:
        start = time.time()
        req = urllib2.urlopen(testdl)
        CHUNK = chunkmulti * 1024
        with open("test.mp3", 'wb') as fp:
            while True:
                chunk = req.read(CHUNK)
                if not chunk:
                    break
                fp.write(chunk)
        end = time.time()
        passtime += end - start
        os.remove("test.mp3")
        passattempt += 1
    # Divide by numpass: passattempt is numpass + 1 at this point.
    print "Chunk size multiplier ", chunkmulti, " took ", passtime / numpass, " seconds"
    chunkmulti += 1

The results were inconclusive. Here is the first batch of results...

Chunk size multiplier  1  took  13.9629709721  seconds
Chunk size multiplier  2  took  8.01173728704  seconds
Chunk size multiplier  3  took  10.3750542402  seconds
Chunk size multiplier  4  took  7.11076325178  seconds
Chunk size multiplier  5  took  11.3685477376  seconds
Chunk size multiplier  6  took  6.86864703894  seconds
Chunk size multiplier  7  took  14.2680369616  seconds
Chunk size multiplier  8  took  7.93746650219  seconds
Chunk size multiplier  9  took  6.81188523769  seconds
Chunk size multiplier  10  took  7.54047352076  seconds
Chunk size multiplier  11  took  6.84347498417  seconds
Chunk size multiplier  12  took  7.88792568445  seconds
Chunk size multiplier  13  took  7.37244099379  seconds
Chunk size multiplier  14  took  8.15134423971  seconds
Chunk size multiplier  15  took  7.1664044857  seconds
Chunk size multiplier  16  took  10.9474172592  seconds
Chunk size multiplier  17  took  7.23868894577  seconds
Chunk size multiplier  18  took  7.66610199213  seconds

The results continue like this up to a chunk size of 207kb.

So I've set the chunk size to 6kb. Might benchmark against wget next...
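One way to see why the numbers look inconclusive is to summarize the first batch listed above. The sketch below only restates those 18 figures; the variable names are mine, and the reading that neighbouring chunk sizes differ too wildly for chunk size to be the main factor is an interpretation, not something the benchmark proves.

```python
# Average seconds per download, copied from the first batch of results above.
timings = [
    13.9629709721, 8.01173728704, 10.3750542402, 7.11076325178,
    11.3685477376, 6.86864703894, 14.2680369616, 7.93746650219,
    6.81188523769, 7.54047352076, 6.84347498417, 7.88792568445,
    7.37244099379, 8.15134423971, 7.1664044857, 10.9474172592,
    7.23868894577, 7.66610199213,
]

# Multipliers start at 1, so index i corresponds to multiplier i + 1.
by_multiplier = dict((i + 1, t) for i, t in enumerate(timings))

fastest = min(by_multiplier, key=by_multiplier.get)
slowest = max(by_multiplier, key=by_multiplier.get)

# Largest difference between two neighbouring chunk sizes.
biggest_jump = max(
    abs(timings[i + 1] - timings[i]) for i in range(len(timings) - 1)
)

print("fastest multiplier: %d (%.2f s)" % (fastest, by_multiplier[fastest]))
print("slowest multiplier: %d (%.2f s)" % (slowest, by_multiplier[slowest]))
print("largest jump between neighbours: %.2f s" % biggest_jump)
```

In this batch the slowest multiplier (7) sits directly next to one of the fastest (6), which suggests that run-to-run network variation outweighs any effect of the chunk size. More passes per size would be needed to say anything firmer.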