I got this code from "How to download a file using python in a 'smarter' way?",
but it raises an error:
in download
r.close()
UnboundLocalError: local variable 'r' referenced before assignment
I would also like to add a condition that only PDF files should be downloaded.
import urllib2
import shutil
import urlparse
import os

def download(url, fileName=None):
    def getFileName(url,openUrl):
        if 'Content-Disposition' in openUrl.info():
            # If the response has Content-Disposition, try to get filename from it
            cd = dict(map(lambda x: x.strip().split('=') if '=' in x else (x.strip(),''),openUrl.info()['Content-Disposition'].split(';')))
            if 'filename' in cd:
                filename = cd['filename'].strip("\"'")
                if filename: return filename
        # if no filename was found above, parse it out of the final URL.
        return os.path.basename(urlparse.urlsplit(openUrl.url)[2])

    req = urllib2.Request(url)
    try:
        r = urllib2.urlopen(req)
    except urllib2.HTTPError, e:
        print e.fp.read()
    try:
        fileName = fileName or getFileName(url,r)
        with open(fileName, 'wb') as f:
            shutil.copyfileobj(r,f)
    finally:
        r.close()

download('http://www.altria.com/Documents/Altria_10Q_Filed10242013.pdf#?page=24')
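It works perfectly fine with this URL: http://www.gao.gov/new.items/d04641.pdf. So my question is why it fails for some URLs, such as the one in the code above, but works for others like this one.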
Answer 0 (score: 0)
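This is a scoping issue: when urlopen raises, r is never assigned, so the finally block refers to a name that does not exist yet.
At the start of the function, define:
r = None
Then, instead of calling r.close() unconditionally, do the following:
if r:
    r.close()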
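The guard above can be tried without a network connection. This is only a minimal sketch: FakeResponse and fake_urlopen are hypothetical stand-ins for the urllib2 response and urllib2.urlopen, and the test uses "is not None" rather than "if r:", which is a small variation on the suggestion.

```python
class FakeResponse(object):
    """Hypothetical stand-in for the object returned by urllib2.urlopen."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def fake_urlopen(url):
    # Assumption for illustration only: URLs containing "bad" fail
    # the way an HTTP error would.
    if "bad" in url:
        raise IOError("HTTP Error 404: Not Found")
    return FakeResponse()


def download(url):
    r = None  # bind the name up front so the finally block can always read it
    try:
        r = fake_urlopen(url)
        return "downloaded"
    except IOError as e:
        return "failed: %s" % e
    finally:
        # Only close the response if the open call actually produced one.
        if r is not None:
            r.close()
```

With r bound to None first, a failed open reaches the finally block without raising UnboundLocalError, and a successful open is still closed.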
Answer 1 (score: 0)
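What happens is that the first exception is caught by except urllib2.HTTPError, but the code then carries on even though r is not defined (because the exception occurred before the assignment finished).
I think you want to use the else clause of the try/except block, so that the rest of the code runs only if r = urllib2.urlopen(req) succeeded: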
def download(url, fileName=None):
    def getFileName(url,openUrl):
        if 'Content-Disposition' in openUrl.info():
            # If the response has Content-Disposition, try to get filename from it
            cd = dict(map(lambda x: x.strip().split('=') if '=' in x else (x.strip(),''),openUrl.info()['Content-Disposition'].split(';')))
            if 'filename' in cd:
                filename = cd['filename'].strip("\"'")
                if filename: return filename
        # if no filename was found above, parse it out of the final URL.
        return os.path.basename(urlparse.urlsplit(openUrl.url)[2])

    req = urllib2.Request(url)
    try:
        r = urllib2.urlopen(req)
    except urllib2.HTTPError, e:
        print e.fp.read()
    else:
        try:
            fileName = fileName or getFileName(url,r)
            with open(fileName, 'wb') as f:
                shutil.copyfileobj(r,f)
        finally:
            r.close()
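The question also asks to download only PDFs, which the code above does not address. One possible approach, offered as a rough sketch rather than a tested solution: check the response's Content-Type header and, as a fallback, the "%PDF-" signature that PDF files begin with. The helper name looks_like_pdf and its parameters are hypothetical; the caller would pass in the header value taken from r.info() and the first few bytes read from r.

```python
def looks_like_pdf(content_type, first_bytes):
    """Return True if the header or the leading bytes indicate a PDF.

    content_type: the Content-Type header value, or None if absent.
    first_bytes:  the first few bytes of the response body.
    """
    # Content-Type may carry parameters, e.g. "application/pdf; charset=binary".
    mime = (content_type or "").split(";")[0].strip().lower()
    if mime == "application/pdf":
        return True
    # Fall back to the PDF signature in case the server mislabels the file.
    return first_bytes.startswith(b"%PDF-")
```

If the first bytes are read from r for this check, they have to be written to the output file before shutil.copyfileobj copies the rest, because the response stream cannot be rewound.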
Answer 2 (score: -1)
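I assume it prints an error message saying how urllib2.urlopen(req) failed before it gives you the unbound local error. If so, add raise on the line after print e.fp.read(), and you will see a different problem: the original HTTP error instead of the UnboundLocalError.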
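To see the effect of the bare raise without a network connection, here is a minimal sketch. The fetch function and the two opener functions are hypothetical stand-ins for download and urllib2.urlopen, and IOError stands in for urllib2.HTTPError.

```python
import io


def failing_opener(url):
    # Hypothetical opener that always fails, like a 404 response would.
    raise IOError("HTTP Error 404: Not Found")


def working_opener(url):
    # Hypothetical opener that returns a file-like body.
    return io.BytesIO(b"%PDF-1.4 fake body")


def fetch(url, opener):
    try:
        r = opener(url)
    except IOError as e:
        print("open failed: %s" % e)
        raise  # re-raise so nothing below runs with r unbound
    try:
        return r.read()
    finally:
        r.close()
```

Calling fetch with failing_opener now propagates the original IOError, so the traceback points at the failed open rather than at r.close().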