Python: downloading a file that resists the usual techniques

Date: 2016-11-11 16:55:41

Tags: python download obiee

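I am trying to write Python code that downloads and saves the file at this URL: http://obiee.banrep.gov.co/analytics/saw.dll?Download&Format=excel&Extension=.xls&BypassCache=true&lang=es&NQUser=publico&NQPassword=publico&Path=/shared/Consulta%20Series%20Estadisticas%20desde%20Excel/1.%20IPC%20base%202008/1.3.%20Por%20rango%20de%20fechas/1.3.2.%20Por%20grupo%20de%20gasto&ViewState=h09v965dvurdtkj0iuni7m1kbe&ContainerID=o%3ago%7er%3areport&RootViewID=go

The expected result is that the Excel file served at that URL is downloaded and saved.

The file sits behind some kind of Oracle database. It can be downloaded with any browser. The "Live HTTP Headers" Firefox extension tells me it is a GET request. Still, I have tried the usual techniques and I always end up downloading "saw.dll", which is a simple XML file rather than the expected Excel file.

This is what I tried: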

import urllib, urllib2, shutil  # Python 2 modules

url = 'http://obiee.banrep.gov.co/analytics/saw.dll?Download'
values = {
    'Format' : 'excel',
    'Extension' : '.xls',
    'BypassCache' : 'true',
    'lang' : 'es',
    'NQUser' : 'publico',
    'NQPassword' : 'publico',
    'Path' : '/shared/Consulta Series Estadisticas desde Excel/1. IPC base 2008/1.3. Por rango de fechas/1.3.2. Por grupo de gasto',
    'ViewState' : 'h09v965dvurdtkj0iuni7m1kbe',
    'ContainerID' : 'o%3ago%7er%3areport',
    'RootViewID' : 'go',
}

# Passing data to urllib2.Request sends the parameters as a POST body
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
# Copy the response body to disk
myfile = open('test.xls', 'wb')
shutil.copyfileobj(response.fp, myfile)
myfile.close()
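
A side note on the attempt above: giving `urllib2.Request` a `data` argument turns the request into a POST, whereas the browser sends a GET, and the `ContainerID` value is already percent-encoded, so `urlencode` would encode it a second time. The following is a minimal Python 3 sketch of building the same parameters into a GET URL instead. The names `BASE_URL`, `PARAMS` and `build_download_url` are hypothetical, the `ContainerID` value is simply the decoded form of the one in the question, and nothing here is confirmed to change what the server returns.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://obiee.banrep.gov.co/analytics/saw.dll"

# Same parameters as in the question, written unencoded so that
# urlencode() escapes each value exactly once.
PARAMS = {
    "Format": "excel",
    "Extension": ".xls",
    "BypassCache": "true",
    "lang": "es",
    "NQUser": "publico",
    "NQPassword": "publico",
    "Path": "/shared/Consulta Series Estadisticas desde Excel/"
            "1. IPC base 2008/1.3. Por rango de fechas/"
            "1.3.2. Por grupo de gasto",
    "ViewState": "h09v965dvurdtkj0iuni7m1kbe",
    "ContainerID": "o:go~r:report",  # decoded form of o%3ago%7er%3areport
    "RootViewID": "go",
}


def build_download_url(base_url=BASE_URL, params=PARAMS):
    """Return a GET URL of the form base?Download&key=value&..."""
    # quote_via=quote encodes spaces as %20 (as in the browser URL) rather
    # than "+", and safe="/" leaves the slashes in Path untouched.
    query = urlencode(params, safe="/", quote_via=quote)
    return base_url + "?Download&" + query


if __name__ == "__main__":
    print(build_download_url())
```

The resulting string can be passed to any HTTP client as a plain GET. Whether `ViewState` stays valid across sessions is not something the question establishes.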

Other code I tried:

import requests,shutil

response = requests.get("http://obiee.banrep.gov.co/analytics/saw.dll?Download&Format=excel&Extension=.xls&BypassCache=true&lang=es&NQUser=publico&NQPassword=publico&Path=/shared/Consulta%20Series%20Estadisticas%20desde%20Excel/1.%20IPC%20base%202008/1.3.%20Por%20rango%20de%20fechas/1.3.2.%20Por%20grupo%20de%20gasto&ViewState=h09v965dvurdtkj0iuni7m1kbe&ContainerID=o%3ago%7er%3areport&RootViewID=go",stream=True)

with open('test.xls', 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
del response

I also tried other things, such as using wget and adding a delay between the request and the save.

Any ideas?

Thanks, best regards.

2 answers:

Answer 0: (score: 2)

Have you tried changing the User-Agent?

...
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
requests.get(url=url, stream=True, headers=headers)

Maybe the server returns different responses to different user agents.
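
For anyone who cannot install `requests`, the same idea can be sketched with only the Python 3 standard library. This swaps `requests` for `urllib.request`. The helper names `build_request` and `download` and the 30-second timeout are assumptions made for illustration, not part of the answer.

```python
import shutil
import urllib.request

# Browser-like User-Agent string, copied from the answer above.
USER_AGENT = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/39.0.2171.95 Safari/537.36")


def build_request(url, user_agent=USER_AGENT):
    """Create a GET request that carries a browser-like User-Agent."""
    # No data argument is given, so urllib sends this as a GET.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, dest, timeout=30):
    """Stream the response body to dest and return its Content-Type."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        with open(dest, "wb") as out_file:
            shutil.copyfileobj(response, out_file)
        return response.headers.get("Content-Type")
```

Calling `download(url, 'test.xls')` with the full URL from the question would save the body. The returned Content-Type gives a quick hint as to whether the server sent a spreadsheet or the XML page described in the question.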

Answer 1: (score: 0)

This code actually worked for me:

import requests,shutil

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
response=requests.get(url='http://obiee.banrep.gov.co/analytics/saw.dll?Download&Format=excel&Extension=.xls&BypassCache=true&lang=es&NQUser=publico&NQPassword=publico&Path=/shared/Consulta%20Series%20Estadisticas%20desde%20Excel/1.%20IPC%20base%202008/1.3.%20Por%20rango%20de%20fechas/1.3.2.%20Por%20grupo%20de%20gasto&ViewState=h09v965dvurdtkj0iuni7m1kbe&ContainerID=o%3ago%7er%3areport&RootViewID=go', stream=True, headers=headers)
with open('test.xls', 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
del response

This is the answer suggested by Jean Cassol above. Thanks a lot.