Question

How can I use Python 3 to download a file from one byte offset to another (for example, to download only the footer of an HTML file)?

Thanks.

Answer 1

If the server sends back an Accept-Ranges: bytes header, you can request a specific range by setting the Range header, for example Range: bytes=1170-1246, as shown here:

~: curl -v http://example.com/ -r 1170-1246
* Hostname was NOT found in DNS cache
*   Trying 93.184.216.119...
* Connected to example.com (93.184.216.119) port 80 (#0)
> GET / HTTP/1.1
> Range: bytes=1170-1246
> User-Agent: curl/7.37.0
> Host: example.com
> Accept: */*
> 
< HTTP/1.1 206 Partial Content
< Accept-Ranges: bytes
< Cache-Control: max-age=604800
< Content-Range: bytes 1170-1246/1270
< Content-Type: text/html
< Date: Tue, 03 Jun 2014 16:37:10 GMT
< Etag: "359670651"
< Expires: Tue, 10 Jun 2014 16:37:10 GMT
< Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT
< Server: ECS (sea/F622)
< Connection: Keep-Alive
< 
<p><a href="http://www.iana.org/domains/example">More information...</a></p>
* Connection #0 to host example.com left intact
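The same Accept-Ranges check can be sketched in Python 3 with the standard library. The function names supports_byte_ranges and server_supports_ranges are made up for this illustration, and the sketch assumes the server answers HEAD requests. A missing header does not prove that ranges are unsupported.

```python
import urllib.request

def supports_byte_ranges(headers):
    """Return True if the headers advertise Accept-Ranges: bytes."""
    value = headers.get("Accept-Ranges", "")
    units = [unit.strip().lower() for unit in value.split(",")]
    return "bytes" in units

def server_supports_ranges(url):
    """Send a HEAD request (headers only, no body) and inspect the reply."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return supports_byte_ranges(response.headers)
```

server_supports_ranges("http://example.com/") would run the check against the host used in the curl session.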

If the server does not send that header, you simply have to request the whole page and use ordinary Python slicing.

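You can also send the Range header without checking Accept-Ranges first; just make sure to distinguish between a 200 response (the whole body) and a 206 response (only the requested bytes).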
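A minimal Python 3 sketch of that approach follows. pick_range and fetch_range are hypothetical helper names. The sketch assumes that a 206 response carries exactly the range that was asked for, which a careful client would confirm from the Content-Range header.

```python
import urllib.request

def pick_range(status, body, start, end):
    """Return bytes start..end (inclusive) given the response status and body."""
    if status == 206:
        # Partial Content: the server sent only the requested bytes
        return body
    if status == 200:
        # Range header was ignored: body is the whole resource, slice locally
        return body[start:end + 1]
    raise ValueError(f"unexpected status {status}")

def fetch_range(url, start, end):
    """Request bytes start..end and fall back to slicing on a 200 reply."""
    request = urllib.request.Request(
        url, headers={"Range": f"bytes={start}-{end}"}
    )
    with urllib.request.urlopen(request) as response:
        return pick_range(response.status, response.read(), start, end)
```

Calling fetch_range("http://example.com/", 1170, 1246) would mirror the curl request above. Note that HTTP ranges are inclusive at both ends, unlike Python slices, hence the end + 1.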

This only works if you know exactly which bytes you need.
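For the footer case, where only the length of the tail is known, the HTTP range syntax also has a suffix form, bytes=-N, which asks for the last N bytes. The sketch below builds that header value and parses a Content-Range reply such as the one in the curl output. Both helper names are invented for the example, and whether a given server honors suffix ranges is an assumption.

```python
import re

def suffix_range_header(n):
    """Range header value that asks for the final n bytes of a resource."""
    return f"bytes=-{n}"

def parse_content_range(value):
    """Parse 'bytes first-last/total' into (first, last, total).

    total is None when the server reports it as '*' (unknown).
    """
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if match is None:
        raise ValueError(f"unrecognized Content-Range: {value!r}")
    first, last, total = match.groups()
    return int(first), int(last), None if total == "*" else int(total)
```

The total returned by parse_content_range gives the full length of the resource, which makes it possible to compute exact offsets for later requests.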

Answer 2

In Python you could do something like this:

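import urllib.request

def read_range(url, rstart, rstop=None):
    """Return bytes rstart..rstop-1 of url; rstop=None reads to the end."""
    # Plain GET with no Range header: the server streams the body from byte 0
    with urllib.request.urlopen(url) as response:
        # Read and discard everything before the start byte
        if rstart > 0:
            response.read(rstart)

        # No stop byte given: return everything that is left
        if rstop is None:
            return response.read()

        # Read the bytes we want
        return response.read(rstop - rstart)

# First 200 bytes
print(read_range("http://stackoverflow.com", 0, 200))

# Some bytes right in the middle
print(read_range("http://stackoverflow.com", 400, 1000))

# Everything from byte 200 to the end
print(read_range("http://stackoverflow.com", 200))

# Read whole file
print(read_range("http://stackoverflow.com", 0))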

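This of course sends an ordinary request with no special headers and gets a 200 response, so every byte before the stop offset is still downloaded, but it still does what you want.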
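The skip-then-read idea can be tried without touching the network by using io.BytesIO as a stand-in for the HTTP response. This is only a sketch: read_range_from_stream is a hypothetical name, and discarding in fixed-size chunks is an addition here so that the skipped bytes are never held in memory all at once.

```python
import io

def read_range_from_stream(stream, start, stop, chunk_size=8192):
    """Discard `start` bytes from a file-like object, then return bytes start..stop-1."""
    remaining = start
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            # The stream ended before reaching the start byte
            return b""
        remaining -= len(chunk)
    return stream.read(stop - start)

# Stand-in for a response body; any object with a read() method works
demo = io.BytesIO(b"0123456789")
print(read_range_from_stream(demo, 3, 7))
```

The same function should accept the object returned by urllib.request.urlopen, since that also exposes read().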
