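The expected result is that the provided Excel file gets downloaded and saved.
The file sits behind some kind of Oracle database. It can be downloaded with any browser, and the "Live HTTP Headers" Firefox extension tells me it is a GET request. However, with every usual technique I have tried, I always end up downloading "saw.dll", which is a plain XML file instead of the expected Excel file.
Here is what I tried: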
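A quick way to tell what the server actually sent back is to look at the first few bytes of the saved file. The sketch below is a hypothetical helper (`sniff_format` is not part of any library). It assumes a legacy `.xls` starts with the OLE2 compound-file signature, an `.xlsx` starts with the ZIP signature, and anything beginning with `<` is XML/HTML like the `saw.dll` page. Some report servers emit HTML-based "Excel" exports, so a `markup` result is not necessarily an error.

```python
# Leading-byte signatures of common container formats.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                         # .xlsx (ZIP container)
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_format(data: bytes) -> str:
    """Guess the payload type from its first bytes (hypothetical helper)."""
    if data.startswith(OLE2_MAGIC):
        return "xls"
    if data.startswith(ZIP_MAGIC):
        return "xlsx"
    # Drop an optional UTF-8 BOM, then leading whitespace, before looking for markup.
    head = data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data
    if head.lstrip().startswith(b"<"):
        return "markup"
    return "unknown"
```

For example, `with open('test.xls', 'rb') as f: print(sniff_format(f.read(512)))` would print `markup` for the XML page described above.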
import urllib, urllib2, shutil

url = 'http://obiee.banrep.gov.co/analytics/saw.dll?Download'
values = {
    'Format' : 'excel',
    'Extension' : '.xls',
    'BypassCache' : 'true',
    'lang' : 'es',
    'NQUser' : 'publico',
    'NQPassword' : 'publico',
    'Path' : '/shared/Consulta Series Estadisticas desde Excel/1. IPC base 2008/1.3. Por rango de fechas/1.3.2. Por grupo de gasto',
    'ViewState' : 'h09v965dvurdtkj0iuni7m1kbe',
    # Unescaped value: urlencode() does the escaping, so 'o%3ago%7er%3areport' would be double-encoded
    'ContainerID' : 'o:go~r:report',
    'RootViewID' : 'go',
}
data = urllib.urlencode(values)
# Passing data as the second argument makes urllib2 send a POST, not a GET
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
myfile = open('test.xls', 'wb')
shutil.copyfileobj(response.fp, myfile)
myfile.close()
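Because `urllib2.Request(url, data)` is given a body, the snippet above sends a POST, while the browser sends a GET. A minimal Python 3 sketch of the GET equivalent, assuming the same parameters, appends the encoded query to the URL instead of passing it as `data`; `build_get_request` is a hypothetical name. `quote_via=quote` with `safe='/'` is used so spaces become `%20` and the slashes in `Path` stay literal, matching the browser URL; whether the server also accepts `+` for spaces is not tested here.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://obiee.banrep.gov.co/analytics/saw.dll?Download"


def build_get_request(values: dict) -> Request:
    """Encode `values` into the query string so urllib issues a GET (hypothetical helper)."""
    # quote (instead of the default quote_plus) encodes spaces as %20; safe="/" keeps Path slashes.
    query = urlencode(values, safe="/", quote_via=quote)
    # BASE_URL already contains "?Download", so the remaining parameters are joined with "&".
    return Request(BASE_URL + "&" + query)  # no data argument -> GET


values = {
    "Format": "excel",
    "Extension": ".xls",
    "BypassCache": "true",
    "lang": "es",
    "NQUser": "publico",
    "NQPassword": "publico",
    "Path": "/shared/Consulta Series Estadisticas desde Excel/1. IPC base 2008/1.3. Por rango de fechas/1.3.2. Por grupo de gasto",
    "ViewState": "h09v965dvurdtkj0iuni7m1kbe",
    "ContainerID": "o:go~r:report",  # raw value; urlencode does the escaping
    "RootViewID": "go",
}
req = build_get_request(values)
print(req.get_method())  # GET
print(req.full_url)
```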
Other code I tried:
import requests, shutil
response = requests.get("http://obiee.banrep.gov.co/analytics/saw.dll?Download&Format=excel&Extension=.xls&BypassCache=true&lang=es&NQUser=publico&NQPassword=publico&Path=/shared/Consulta%20Series%20Estadisticas%20desde%20Excel/1.%20IPC%20base%202008/1.3.%20Por%20rango%20de%20fechas/1.3.2.%20Por%20grupo%20de%20gasto&ViewState=h09v965dvurdtkj0iuni7m1kbe&ContainerID=o%3ago%7er%3areport&RootViewID=go", stream=True)
with open('test.xls', 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
del response
I have also tried other things, such as using wget and adding a delay between the request and saving the file.
Any ideas?
Thanks, best regards.
Answer 0 (score: 2)
Have you tried changing the user agent?
...
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
requests.get(url=url, stream=True, headers=headers)
Perhaps the server returns different responses to different user agents.
Answer 1 (score: 0)
This code actually works for me:
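The same idea also works without `requests`. A minimal Python 3 sketch using only the standard library, assuming the Chrome user-agent string from the answer is acceptable to the server: by default `urllib` identifies itself as `Python-urllib/3.x`, which is the kind of client string a server could answer differently. `browser_request` and `download` are hypothetical helper names.

```python
import shutil
from urllib.request import Request, urlopen

BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36"
)


def browser_request(url: str) -> Request:
    """Build a GET request that presents a browser user agent instead of Python-urllib."""
    return Request(url, headers={"User-Agent": BROWSER_UA})


def download(url: str, path: str) -> None:
    """Stream the response body to `path` (performs network I/O)."""
    with urlopen(browser_request(url), timeout=60) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
```

Note that `urllib` stores header names via `str.capitalize()`, so the header is looked up afterwards as `req.get_header("User-agent")`.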
import requests, shutil
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
response = requests.get(url='http://obiee.banrep.gov.co/analytics/saw.dll?Download&Format=excel&Extension=.xls&BypassCache=true&lang=es&NQUser=publico&NQPassword=publico&Path=/shared/Consulta%20Series%20Estadisticas%20desde%20Excel/1.%20IPC%20base%202008/1.3.%20Por%20rango%20de%20fechas/1.3.2.%20Por%20grupo%20de%20gasto&ViewState=h09v965dvurdtkj0iuni7m1kbe&ContainerID=o%3ago%7er%3areport&RootViewID=go', stream=True, headers=headers)
with open('test.xls', 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
del response
This is the answer Jean Cassol suggested above. Thanks a lot!
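One caveat with `response.raw`: in `requests` it is the undecoded urllib3 stream, so if the server compresses the response with gzip or deflate, the bytes written to disk are the compressed ones. Iterating over `response.iter_content()` yields decoded chunks instead. The helper below is a hypothetical sketch (`write_chunks` is not a `requests` function) that writes any iterable of byte chunks to a file and returns the number of bytes written; it does not depend on `requests` itself.

```python
from typing import Iterable


def write_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write byte chunks to `path`, skipping empty chunks; return the bytes written."""
    total = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            if chunk:  # filter out empty keep-alive chunks
                out.write(chunk)
                total += len(chunk)
    return total
```

With `requests`, it could be called as `write_chunks(response.iter_content(chunk_size=8192), 'test.xls')`, ideally after `response.raise_for_status()` so an HTTP error page is not saved as the spreadsheet.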