Download a file given its URL and save it under the filename from Content-Disposition

Time: 2014-01-10 20:14:43

Tags: python

I got this code from How to download a file using python in a 'smarter' way?

But it raises an error:

   in download
   r.close()
   UnboundLocalError: local variable 'r' referenced before assignment

I would also like to add a condition that the downloaded file must be a PDF.
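One way to enforce the PDF-only condition is to look at two signals: the Content-Type header and the first bytes of the body (PDF files begin with the signature %PDF-). Below is a minimal Python 3 sketch. The helper name is_pdf is hypothetical, and accepting either signal is an assumed policy, chosen because some servers send PDFs as application/octet-stream.

```python
def is_pdf(content_type, head):
    """Return True if a response looks like a PDF.

    content_type: the Content-Type header value, or None if absent.
    head: the first bytes of the body (may be empty).
    Assumed policy: either signal is enough.
    """
    # Compare only the media type, ignoring parameters such as "; charset=...".
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type == "application/pdf":
        return True
    # PDF files start with the "%PDF-" signature.
    return head.startswith(b"%PDF-")
```

In a Python 3 download function, one way to use it is to read head = r.read(5), call is_pdf(r.headers.get("Content-Type"), head), and only if it returns True write head to the file followed by shutil.copyfileobj(r, f) for the rest; otherwise close r and skip that URL.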

import urllib2
import shutil
import urlparse
import os


def download(url, fileName=None):
    def getFileName(url,openUrl):
        if 'Content-Disposition' in openUrl.info():
            # If the response has Content-Disposition, try to get filename from it
            cd = dict(map(lambda x: x.strip().split('=') if '=' in x else (x.strip(),''),openUrl.info()['Content-Disposition'].split(';')))
            if 'filename' in cd:
                filename = cd['filename'].strip("\"'")
                if filename: return filename
        # if no filename was found above, parse it out of the final URL.
        return os.path.basename(urlparse.urlsplit(openUrl.url)[2])

    req = urllib2.Request(url)
    try:
        r = urllib2.urlopen(req)
    except urllib2.HTTPError, e:
        print e.fp.read()
    try:
        fileName = fileName or getFileName(url,r)
        with open(fileName, 'wb') as f:
            shutil.copyfileobj(r,f)
    finally:
        r.close()

download('http://www.altria.com/Documents/Altria_10Q_Filed10242013.pdf#?page=24')

It works perfectly with the URL http://www.gao.gov/new.items/d04641.pdf. So my question is: why does it fail for some URLs but work fine for URLs like that one?
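For comparison, here is a Python 3 sketch of the filename logic in getFileName. It hands the Content-Disposition value to the standard library's email.message.Message.get_filename() instead of splitting on '=' by hand. The lambda in the question fails when the quoted filename itself contains '=', because split('=') then returns three parts and dict() raises ValueError. The sketch also strips directory components from a server-supplied name. The function name filename_from_response is hypothetical, and it takes plain strings so it can be tried without a network connection.

```python
import posixpath
from email.message import Message
from urllib.parse import unquote, urlsplit


def filename_from_response(content_disposition, final_url):
    """Pick a local filename for a downloaded resource.

    content_disposition: raw Content-Disposition header value, or None.
    final_url: the URL after redirects.
    """
    if content_disposition:
        # Let the email package parse the header parameters (quoting included).
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
        if name:
            # Never trust a server-supplied path: keep only the last component.
            name = posixpath.basename(name.replace("\\", "/"))
            if name:
                return name
    # Fall back to the last segment of the URL path; urlsplit keeps any
    # "#..." fragment (such as "#?page=24") out of the path.
    path = urlsplit(final_url).path
    return posixpath.basename(unquote(path))
```

With the Altria URL from the question and no Content-Disposition header, this returns Altria_10Q_Filed10242013.pdf, since the fragment is not part of the path.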

3 Answers:

Answer 0 (score: 0)

This is a scope issue.

At the start of the function, define:

    r=None

Then, instead of calling r.close() unconditionally, do this:

    if r:
      r.close()
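A Python 3 sketch of this approach (urllib2 was split into urllib.request and urllib.error). Note that r = None on its own only changes the error: the question's second try block would still call getFileName(url, r) with r set to None and fail with AttributeError. The sketch therefore also returns early after a failed request. Catching the broader URLError (HTTPError is a subclass of it) and returning True/False are assumptions of the sketch.

```python
import shutil
import urllib.error
import urllib.request


def download(url, file_name):
    r = None  # bind r up front so the cleanup below can always test it
    try:
        try:
            r = urllib.request.urlopen(url)
        except urllib.error.URLError as e:
            print("request failed:", e)
            return False  # without this, the code below would use r = None
        with open(file_name, "wb") as f:
            shutil.copyfileobj(r, f)
        return True
    finally:
        if r is not None:  # only close a response that was actually opened
            r.close()
```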

Answer 1 (score: 0)

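What happens is that the first exception is caught by except urllib2.HTTPError, but the code keeps going even though r was never assigned (because the exception was raised before the assignment).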
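The mechanism can be reproduced without any network access. In this Python 3 sketch, FakeResponse and fetch are hypothetical stand-ins for the response object and the question's download(), and OSError stands in for urllib2.HTTPError. When the request "fails", the first use of r raises UnboundLocalError, and r.close() in the finally clause then raises it again. In Python 2 the second error replaces the first, which is why the traceback in the question points at r.close(); Python 3 keeps the first one as the second's __context__.

```python
class FakeResponse:
    """Hypothetical stand-in for the object urlopen() returns."""
    url = "http://example.com/file.pdf"

    def close(self):
        pass


def fetch(fail):
    """Same control flow as the question's download()."""
    try:
        if fail:
            raise OSError("simulated HTTP error")  # stands in for urllib2.HTTPError
        r = FakeResponse()
    except OSError as e:
        print("caught:", e)  # the error is handled, but r was never assigned
    try:
        name = r.url  # first use of r: UnboundLocalError when fail is True
    finally:
        r.close()  # raises UnboundLocalError again, replacing the first one
    return name
```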

I think you want to add an else clause to the try/except, so that the rest of the code runs only after r = urllib2.urlopen(req) has succeeded:

def download(url, fileName=None):
    def getFileName(url,openUrl):
        if 'Content-Disposition' in openUrl.info():
            # If the response has Content-Disposition, try to get filename from it
            cd = dict(map(lambda x: x.strip().split('=') if '=' in x else (x.strip(),''),openUrl.info()['Content-Disposition'].split(';')))
            if 'filename' in cd:
                filename = cd['filename'].strip("\"'")
                if filename: return filename
        # if no filename was found above, parse it out of the final URL.
        return os.path.basename(urlparse.urlsplit(openUrl.url)[2])

    req = urllib2.Request(url)
    try:
        r = urllib2.urlopen(req)
    except urllib2.HTTPError, e:
        print e.fp.read()
    else:
        try:
            fileName = fileName or getFileName(url,r)
            with open(fileName, 'wb') as f:
                shutil.copyfileobj(r,f)
        finally:
            r.close()
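A Python 3 sketch of the same try/except/else structure (urllib2 became urllib.request and urllib.error). Two changes are assumptions of the sketch, not part of the answer: the response is closed with a with-statement instead of try/finally, and the filename comes from r.headers.get_filename(), which parses Content-Disposition because the headers object is an email.message.Message, falling back to the last segment of the final URL. The dest_dir parameter is a hypothetical addition.

```python
import os
import posixpath
import shutil
import urllib.error
import urllib.request
from urllib.parse import unquote, urlsplit


def download(url, file_name=None, dest_dir="."):
    """Save url into dest_dir; return the saved path, or None on failure."""
    try:
        r = urllib.request.urlopen(url)
    except urllib.error.HTTPError as e:
        # HTTPError subclasses URLError, so it has to be listed first.
        print("HTTP error:", e.code, e.reason)
        return None
    except urllib.error.URLError as e:
        print("request failed:", e.reason)
        return None
    else:
        # Only reached when urlopen() succeeded, so r is always bound here.
        with r:  # closes the response even if writing the file fails
            if not file_name:
                # Content-Disposition first, then the final URL's last path segment.
                name = r.headers.get_filename() or unquote(urlsplit(r.url).path)
                # Keep only the last component of a server-supplied name.
                file_name = posixpath.basename(name.replace("\\", "/")) or "download"
            target = os.path.join(dest_dir, file_name)
            with open(target, "wb") as f:
                shutil.copyfileobj(r, f)
        return target
```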

Answer 2 (score: -1)

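I assume it prints an error message showing how urllib2.urlopen(req) failed before you get the unbound local error. If so, add raise on the line after print e.fp.read(), and your problem will look different.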
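In Python 3 this suggestion could look like the following sketch: print the body of the error page, then use a bare raise so the original HTTPError propagates instead of execution continuing without r. The function name open_or_raise is hypothetical.

```python
import urllib.error
import urllib.request


def open_or_raise(url):
    """Open url; on an HTTP error, show the server's error page, then re-raise."""
    try:
        return urllib.request.urlopen(url)
    except urllib.error.HTTPError as e:
        # HTTPError doubles as a response object, so its body can be read
        # (the Python 2 code used e.fp.read()).
        print(e.read().decode("utf-8", "replace"))
        raise  # re-raise the same HTTPError with its original traceback
```

A caller can then catch urllib.error.HTTPError in one place and decide, for example, to skip that URL and move on.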