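I am trying to download a zip file from a website that uses https:// links. I tried the following, but I don't seem to get any output. Can anyone suggest what I might be doing wrong?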
URL = www.somewebsite.com
Zip file download = www.somewebsite.com/output/revisionId=40687821$$Xiiy75&action_id =
import requests
url = 'http://somewebsite.org'
user, password = 'bob', 'I love cats'
resp = requests.get(url, auth=(user, password))
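Answer 0 (score: 0)
To download a file from a URL that needs only a user name and password (HTTP basic authentication, which is what the auth argument below sends), do the following: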
import requests
url = 'http://somewebsite.org'
user, password = 'bob', 'I love cats'
resp = requests.get(url, auth=(user, password))
with open("result.zip", "wb") as fout:
    fout.write(resp.content)
Of course, you should check that you received a valid response before writing the zip file.
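One way to do that check is sketched below. It is only an illustration: the helper name save_if_valid_zip is made up for this example, and it assumes that "valid" means an HTTP 200 status plus a body that the standard library recognises as a zip archive. A login page returned in place of the file would fail the second test.

```python
import io
import zipfile


def save_if_valid_zip(status_code, content, path):
    """Write content to path only if the response looks like a real zip file.

    status_code and content correspond to resp.status_code and resp.content
    of a requests response. Returns True if the file was written.
    """
    if status_code != 200:
        # The server reported an error (or a redirect that was not followed).
        return False
    if not zipfile.is_zipfile(io.BytesIO(content)):
        # Typically an HTML error or login page instead of the archive.
        return False
    with open(path, "wb") as fout:
        fout.write(content)
    return True
```

With the response from the snippet above, the call would be save_if_valid_zip(resp.status_code, resp.content, "result.zip").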
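For quite a few websites that require a login, the following recipe can be used. However, it may not work if asite.com relies heavily on JavaScript.
Use a requests session to store any session cookies and perform the following three steps:
1. Get the login page.
2. Send a POST request to log in.
3. Download the URL that you want to get.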
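For example, in Firefox, go to the website you want to log in to, press F12 (developer tools), click the Network tab, and reload the page. Then fill in the login form, submit it, and look for the POST request in the network panel.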
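Once that POST request has been found, its URL and form fields can be replayed from Python. The following is a minimal sketch, not the site's actual login procedure: post_login is a hypothetical helper, the field names in the usage note are placeholders, and a 200 status is only a rough signal because many sites also answer 200 when the password is wrong.

```python
def post_login(session, login_post_url, form_fields):
    """Replay the login POST observed in the browser's network panel.

    session is expected to behave like a requests session (it must offer
    post(url, data=...)), so cookies set by earlier requests are reused.
    form_fields is a dict of the form data shown for the POST request.
    """
    response = session.post(login_post_url, data=form_fields)
    # Assumption: the site signals a successful login with HTTP 200.
    return response.status_code == 200
```

With requests installed, a call might look like post_login(requests.session(), login_post_url, {"username": user, "password": password}), where the two field names have to be replaced by whatever the network panel shows.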
Generic Python code looks like this:
import requests

def login_and_download(login_post_url, url_of_your_document):
    # login_post_url depends on the site you want to connect to;
    # you have to analyze its login procedure to find it.
    ses = requests.session()

    # Step 1: get the login page.
    rslt = ses.get("https://www.asite.com/login-home")
    # Now any potentially required cookie will be set.
    if rslt.status_code != 200:
        print("failed getting login page")
        return False

    # For simple pages you can proceed to log in.
    # For slightly more complicated pages you might have to parse the HTML.
    # For really annoying pages that use loads of JavaScript it might be
    # even more complicated.

    # Step 2: perform a POST request to log in.
    # Most sites also expect the login form fields here, passed via data=...
    rslt = ses.post(login_post_url)
    if rslt.status_code != 200:
        print("failed logging in")
        return False

    # Step 3: download the URL that you want to get.
    rslt = ses.get(url_of_your_document)
    if rslt.status_code != 200:
        print("failed fetching the file")
        return False
    with open("result.zip", "wb") as fout:
        fout.write(rslt.content)
    return True
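For the "slightly more complicated pages" mentioned in the comments, parsing the HTML often means collecting hidden form fields (for instance a CSRF token) from the login page and sending them back in the login POST. Below is a rough sketch using only the standard library. The names HiddenInputParser and extract_hidden_fields are invented for this example, and whether a particular site needs these fields at all is an assumption to verify in the network panel.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            # A missing value attribute is treated as an empty string.
            self.fields[attributes["name"]] = attributes.get("value") or ""


def extract_hidden_fields(html_text):
    """Return a dict of the hidden input fields found in html_text."""
    parser = HiddenInputParser()
    parser.feed(html_text)
    return parser.fields
```

The result of extract_hidden_fields(rslt.text) from step 1 could then be merged with the user name and password fields before the POST in step 2.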