I have a URL, http://www.vbb.de/de/datei/GTFS_VBB_Nov2015_Dez2016.zip, that "redirects" me to http://images.vbb.de/assets/ftp/file/286316.zip. I put "redirects" in quotes because Python says there is no redirect:
In [51]: response = requests.get('http://www.vbb.de/de/datei/GTFS_VBB_Nov2015_Dez2016.zip')
    ...: if response.history:
    ...:     print "Request was redirected"
    ...:     for resp in response.history:
    ...:         print resp.status_code, resp.url
    ...:     print "Final destination:"
    ...:     print response.status_code, response.url
    ...: else:
    ...:     print "Request was not redirected"
    ...:
Request was not redirected
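The status code is also 200, response.history is empty, and response.url gives the first URL rather than the real one. Yet the real URL can be seen in Firefox under Developer Tools -> Network. How can I get it in Python 2.7? Thanks in advance!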
Answer 0: (score: 1)
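You first need to perform the redirect manually by parsing the new window.location.href out of the HTML returned by the first request. Requesting that URL then produces a 301 response whose Location header contains the name of the target file: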
import re
import requests

base_url = 'http://www.vbb.de'

# The first response is an HTML page that redirects via JavaScript, not via HTTP.
response = requests.get(base_url + '/de/datei/GTFS_VBB_Nov2015_Dez2016.zip')
manual_redirect = base_url + re.findall(r'window\.location\.href\s+=\s+"(.*?)"', response.text)[0]

# Requesting the extracted URL triggers a real 301 redirect to the file.
response = requests.get(manual_redirect, stream=True)
target_filename = response.history[0].headers['Location'].split('/')[-1]
print("Downloading: '{}'".format(target_filename))

# Stream the body to disk in 1 KiB chunks.
with open(target_filename, 'wb') as f_zip:
    for chunk in response.iter_content(chunk_size=1024):
        f_zip.write(chunk)
This displays:
Downloading: '286316.zip'
and produces a zip file of 29,464,299 bytes.
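The manual step can also be pulled out into a small helper, which makes it testable without touching the network. This is only a minimal sketch, not the code the answer ran: the name `extract_js_redirect` and the sample HTML are made up for illustration, and it assumes the page assigns a double-quoted string to `window.location.href`. It uses `urljoin` rather than string concatenation so that both relative and absolute targets resolve sensibly.

```python
import re

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2.7

# Assumption: the redirect is written as window.location.href = "...";
_JS_REDIRECT = re.compile(r'window\.location\.href\s*=\s*"(.*?)"')


def extract_js_redirect(html, page_url):
    """Return the absolute URL of a JavaScript redirect, or None if absent."""
    match = _JS_REDIRECT.search(html)
    if match is None:
        return None
    # Resolve the target against the URL of the page that contained it.
    return urljoin(page_url, match.group(1))


# Hypothetical HTML, shaped like what the regex above expects.
sample_html = '<script>window.location.href = "/de/download/file.zip";</script>'
print(extract_js_redirect(sample_html, 'http://www.vbb.de/de/datei/file.zip'))
```

The returned URL can then be passed to `requests.get(..., stream=True)` as in the answer; returning `None` instead of raising lets the caller decide what to do when the page has no such redirect.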
Answer 1: (score: 0)
You can use BeautifulSoup to read the meta tag in the head of the HTML page and get the redirect URL from it.
>>> import requests
>>> from bs4 import BeautifulSoup
>>> a = requests.get("http://www.vbb.de/de/datei/GTFS_VBB_Nov2015_Dez2016.zip")
>>> soup = BeautifulSoup(a.text, 'html.parser')
>>> # `x` is None for meta tags without an http-equiv attribute, so guard against it.
>>> soup.find_all('meta', attrs={'http-equiv': lambda x: x and x.lower() == 'refresh'})[0]['content'].split('URL=')[1]
'/de/download/GTFS_VBB_Nov2015_Dez2016.zip'
This URL is relative to the domain of the original URL, which gives the new URL http://www.vbb.de/de/download/GTFS_VBB_Nov2015_Dez2016.zip. Requesting it appears to download the ZIP file for me:
>>> import shutil
>>> a = requests.get("http://www.vbb.de/de/download/GTFS_VBB_Nov2015_Dez2016.zip", stream=True)
>>> with open('test.zip', 'wb') as f:
...     a.raw.decode_content = True
...     shutil.copyfileobj(a.raw, f)
...
$ unzip -l test.zip
Archive:  test.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
     5554  2015-11-20 15:17   agency.txt
  2151517  2015-11-20 15:17   calendar_dates.txt
    71731  2015-11-20 15:17   calendar.txt
    65424  2015-11-20 15:17   routes.txt
   816498  2015-11-20 15:17   stops.txt
196020096  2015-11-20 15:17   stop_times.txt
   365499  2015-11-20 15:17   transfers.txt
 11765292  2015-11-20 15:17   trips.txt
      113  2015-11-20 15:17   logging
---------                     -------
211261724                     9 files
For this request, a genuine 301 redirect was returned:
>>> a.history
[<Response [301]>]
>>> a
<Response [200]>
>>> a.history[0]
<Response [301]>
>>> a.history[0].url
'http://www.vbb.de/de/download/GTFS_VBB_Nov2015_Dez2016.zip'
>>> a.url
'http://images.vbb.de/assets/ftp/file/286316.zip'