Question

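I am trying to download a zip file from a website that uses an https:// link. I tried the following, but I can't seem to get any output. Can someone suggest what I might be doing wrong?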

URL = www.somewebsite.com

Zip file download link = www.somewebsite.com/output/revisionId=40687821$$Xiiy75&action_id =

import requests
url = 'http://somewebsite.org'
user, password = 'bob', 'I love cats'
resp = requests.get(url, auth=(user, password))

Answer 1

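To download a file from a URL that is not protected by a login form, do something like the following. The auth argument sends HTTP basic-auth credentials and can be dropped if the site does not need them: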

import requests
url = 'http://somewebsite.org'
user, password = 'bob', 'I love cats'
resp = requests.get(url, auth=(user, password))
with open("result.zip", "wb") as fout:
    fout.write(resp.content)

Of course, you should check that you received a valid response before writing the zip file.
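As a minimal sketch of such a check (the helper names is_valid_zip_response and save_zip are made up for illustration and are not part of requests): a server can answer with status 200 and still send an HTML login or error page, so it can help to verify that the body really parses as a zip archive before saving it.

```python
import io
import zipfile


def is_valid_zip_response(status_code, content):
    """Return True if the status is 200 and the body parses as a zip archive."""
    if status_code != 200:
        return False
    # A login or error page served with status 200 is HTML, not a zip archive.
    return zipfile.is_zipfile(io.BytesIO(content))


def save_zip(status_code, content, path="result.zip"):
    """Write the body to path only if it passed the check; return success."""
    if not is_valid_zip_response(status_code, content):
        return False
    with open(path, "wb") as fout:
        fout.write(content)
    return True
```

With the response from the snippet above, this would be called as save_zip(resp.status_code, resp.content).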

For quite a lot of websites that require a login, the following recipe may work. However, if asite.com relies heavily on JavaScript, it may not.

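Use a requests session to store any session cookies, and perform the following three steps.

1. GET the login URL. This fetches any session cookies or CSRF protection cookies.
2. POST to the login URL with the username and password. The names of the form fields to post depend on the page. Use your web browser in debug mode to find out the right values to post; there may be more parameters than just username and password.
3. GET the document URL and save the result to a file.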

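On Firefox, for example, go to the website you want to log in to, press F12 (for debug mode), click the Network tab, and then reload. Then fill in the login form, submit it, and look in the debug panel for a POST request.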
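If the POST request seen in the debug panel carries extra values such as a CSRF token, these often come from hidden inputs in the login page. The following is only a sketch using the standard library: HiddenInputParser and build_login_payload are hypothetical helpers, and the field names "username" and "password" are assumptions that must be replaced with whatever the site's form actually uses.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_login_payload(login_html, username, password):
    """Merge the hidden form fields with the credentials."""
    parser = HiddenInputParser()
    parser.feed(login_html)
    payload = dict(parser.fields)
    # Assumed field names; check the real ones in the browser's network tab.
    payload["username"] = username
    payload["password"] = password
    return payload
```

The resulting dictionary can be passed as the data argument of the session's post call, with the text of the login page as login_html.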

The generic Python code would look like this:

import requests

def login_and_download(login_post_url, login_data, url_of_your_document):
    # login_post_url and login_data (the form fields to post) depend on the
    # site you want to connect to; you have to analyze its login procedure.
    ses = requests.session()

    # Step 1: get the login page
    rslt = ses.get("https://www.asite.com/login-home")
    # now any potentially required cookie will be set

    if rslt.status_code != 200:
        print("failed getting login page")
        return False

    # for simple pages you can proceed to login
    # for a little more complicated pages you might have to parse the
    # HTML
    # for really annoying pages that use loads of javascript it might be
    # even more complicated

    # Step 2: perform a POST request to log in
    rslt = ses.post(login_post_url, data=login_data)

    if rslt.status_code != 200:
        print("failed logging in")
        return False

    # Step 3: download the URL that you want to get
    rslt = ses.get(url_of_your_document)
    if rslt.status_code != 200:
        print("failed fetching the file")
        return False
    with open("result.zip", "wb") as fout:
        fout.write(rslt.content)
    return True
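
For large archives it may be preferable not to hold the whole body in memory. A possible sketch, assuming the body is available as an iterable of byte chunks (save_chunks is a hypothetical helper, not part of requests):

```python
def save_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the bytes written."""
    written = 0
    with open(path, "wb") as fout:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                fout.write(chunk)
                written += len(chunk)
    return written
```

With requests, such an iterable can be obtained by passing stream=True to the get call and then calling iter_content(chunk_size=8192) on the response.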
