Question

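I'm trying to download an image from a URL using requests. With a browser or a REST client such as the Restlet Chrome extension, I can retrieve the normal content: JSON, and a binary image that I can save to disk.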

With requests I get almost the same response headers; only Content-Length has a different value (15 bytes instead of about 35 KB), and I can't find the binary image.

To mimic the request made by the browser, I configured the same request headers, as follows:

headers = {"Host": "cpom.prefeitura.sp.gov.br",
           "Pragma": "no-cache",
           "Cache-Control": "no-cache",
           "DNT": "1",
           "Accept": "*/*",
           "Accept-Encoding": "gzip, deflate, br",
           "Accept-Language": "en-US,en;q=0.9,pt;q=0.8",
           "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                         "AppleWebKit/537.36 (KHTML, like Gecko) "
                         "Chrome/65.0.3325.181 Safari/537.36"
           }

r = requests.get(url, stream=True, headers=headers)

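There are no redirects. I also debugged and inspected the contents of requests.models.Response, without success.

What am I missing? I think it's some detail of the request, but I can't figure it out.

My test: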

import requests

url = "https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral/ImagemCaptcha?u=8762520"
r = requests.get(url, stream=True)

if r.status_code == 200:
    print(r.raw.headers)
    # Saved as .txt because the body received is not the PNG image
    with open("/home/bruno/captcha/8762520.txt", "wb") as f:
        # Write the body to disk in chunks instead of loading it all at once
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)

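This is the URL for downloading the image: https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral/ImagemCaptcha?u=4067913

This is the site with the captcha image: https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral

A simple GET returns only a JSON response body, but if you inspect the response you will see the binary response, that is, the image, about 36 KB in size.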
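One quick way to tell the two cases apart in a script is to look at the first bytes of the body: every PNG file starts with a fixed 8-byte signature. The sketch below is only illustrative; the helper name `describe_body` and its return labels are made up for this example, and it assumes the JSON case is announced through the Content-Type header.

```python
# Fixed 8-byte signature that starts every PNG file
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def describe_body(body: bytes, content_type: str = "") -> str:
    """Classify a response body as "png", "json", or "unknown" (hypothetical helper)."""
    if body.startswith(PNG_SIGNATURE):
        return "png"
    # Assumption: a JSON reply carries "json" somewhere in its Content-Type
    if "json" in content_type.lower():
        return "json"
    return "unknown"
```

With a requests response `r`, this could be called as `describe_body(r.content, r.headers.get("Content-Type", ""))` before deciding which file extension to use.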

Edit: added screenshots from the Restlet client showing the request and the response.

Answer 1

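The difference is the Cookie header. Restlet uses the existing Chrome cookies by default (see docs), but if you set the Cookie header to an empty string, you'll see that you don't get the image. To retrieve the image from a Python script, you first need to obtain a valid cookie: make a request to another valid URL in the web application (for example, the link to the form you posted) and look at the Set-Cookie header in the response (see the MDN docs for details).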
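The steps above can be sketched with `requests.Session`, which stores cookies received via Set-Cookie and sends them on later requests made through the same session. This is a rough sketch, not a tested solution: it assumes that visiting the form page is enough for the server to issue the cookie it checks, and that the `u` value from the question is still accepted. The names `fetch_captcha` and `main`, the output filename, and the timeout are illustrative.

```python
FORM_URL = "https://cpom.prefeitura.sp.gov.br/prestador/SituacaoCadastral"
IMAGE_URL = FORM_URL + "/ImagemCaptcha?u=8762520"  # "u" value taken from the question


def fetch_captcha(session, form_url, image_url):
    """Visit the form page to collect cookies, then fetch the image bytes."""
    # Step 1: load the form page so the server can set its cookies on the session
    session.get(form_url, timeout=30).raise_for_status()
    # Step 2: request the image; the session sends the stored cookies automatically
    response = session.get(image_url, timeout=30)
    response.raise_for_status()
    return response.content


def main():
    # Imported here so fetch_captcha can be exercised with any session-like object
    import requests

    with requests.Session() as session:
        data = fetch_captcha(session, FORM_URL, IMAGE_URL)
    # Hypothetical output path
    with open("captcha.png", "wb") as f:
        f.write(data)
```

Calling `main()` performs the two requests and writes the result to disk. If the saved file is still only a few bytes long, the server probably expects something beyond the cookie, and comparing `session.cookies` with what the browser sends would be the next thing to check.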
