Question

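I want to read a file on the Internet starting from a specific offset in Python. A normal file object (as returned by open()) has a seek() API. Is there a way to do the same when reading from the network?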

import urllib.request
g = urllib.request.urlopen('http://tools.ietf.org/rfc/rfc2822.txt')
g.seek(20)
f=g.read(100)
print(f)

I tried the code above, but it obviously fails with an error:

io.UnsupportedOperation: seek

What can I do to solve this?

Answer 1

You can use the Range header (this only works if the server supports it):

import urllib.request
req = urllib.request.Request('http://tools.ietf.org/rfc/rfc2822.txt',
                             headers={'Range': 'bytes=20-'})
g = urllib.request.urlopen(req)
f = g.read(100)
print(f)

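However, not all servers support Range, so you should check the response headers. If the server does not support it, skip the leading bytes by reading and discarding them: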

import urllib.request
req = urllib.request.Request('http://tools.ietf.org/rfc/rfc2822.txt',
                             headers={'Range': 'bytes=20-'})
g = urllib.request.urlopen(req)
if 'Content-Range' not in g.info():  # the server ignored the Range header
    # Alternatively: if g.status != http.client.PARTIAL_CONTENT (requires import http.client)
    g.read(20)                       # skip the first 20 bytes manually
f = g.read(100)
print(f)
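
The same idea can be wrapped in a small reusable helper. This is only a sketch: the names build_range_request, read_at and fetch_range are made up for illustration, and it assumes that any status other than 206 (Partial Content) means the server sent the whole body from the start.

```python
import io
import urllib.request


def build_range_request(url, offset, length=None):
    """Build a Request asking the server for bytes starting at `offset`.

    If `length` is given, the range is closed (offset .. offset+length-1);
    otherwise it is open-ended, like 'bytes=20-'.
    """
    end = "" if length is None else str(offset + length - 1)
    return urllib.request.Request(url, headers={"Range": f"bytes={offset}-{end}"})


def read_at(response, offset, length, status):
    """Return up to `length` bytes located at `offset` in the resource.

    `response` is any object with a read(n) method. If `status` is not 206,
    we assume the Range header was ignored and discard `offset` bytes first.
    """
    if status != 206:
        remaining = offset
        while remaining > 0:
            # Skip in chunks so a large offset does not need one huge buffer.
            chunk = response.read(min(remaining, 64 * 1024))
            if not chunk:  # the body ended before reaching the offset
                break
            remaining -= len(chunk)
    return response.read(length)


def fetch_range(url, offset, length):
    """Open `url` and read `length` bytes at `offset`, with the fallback above."""
    req = build_range_request(url, offset, length)
    with urllib.request.urlopen(req) as resp:
        return read_at(resp, offset, length, resp.status)
```

With this, fetch_range('http://tools.ietf.org/rfc/rfc2822.txt', 20, 100) should give the same bytes as the snippet above. Note that each call opens a new connection, so this is closer to a positioned read than to a true seek() on an open stream.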
