Question

I got this code from "How to download a file using python in a 'smarter' way?".

But it raises an error:

   in download
   r.close()
   UnboundLocalError: local variable 'r' referenced before assignment

I would also like to add a condition so that only PDF files are downloaded.

import urllib2
import shutil
import urlparse
import os


def download(url, fileName=None):
    def getFileName(url,openUrl):
        if 'Content-Disposition' in openUrl.info():
            # If the response has Content-Disposition, try to get filename from it
            cd = dict(map(lambda x: x.strip().split('=') if '=' in x else (x.strip(),''),openUrl.info()['Content-Disposition'].split(';')))
            if 'filename' in cd:
                filename = cd['filename'].strip("\"'")
                if filename: return filename
        # if no filename was found above, parse it out of the final URL.
        return os.path.basename(urlparse.urlsplit(openUrl.url)[2])

    req = urllib2.Request(url)
    try:
        r = urllib2.urlopen(req)
    except urllib2.HTTPError, e:
        print e.fp.read()
    try:
        fileName = fileName or getFileName(url,r)
        with open(fileName, 'wb') as f:
            shutil.copyfileobj(r,f)
    finally:
        r.close()

download('http://www.altria.com/Documents/Altria_10Q_Filed10242013.pdf#?page=24')

It works perfectly fine with this URL: http://www.gao.gov/new.items/d04641.pdf. So my question is why it fails for some URLs but works for URLs like the one just mentioned.

Answer 1

This is a scoping issue.

At the start of the function, define:

    r=None

Then, instead of calling r.close() unconditionally, do the following:

    if r:
      r.close()
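
To make the effect of this guard concrete, here is a minimal Python 3 sketch (the thread itself uses Python 2). FakeResponse, fake_open and fetch are hypothetical names that only simulate a failing urlopen call; they are not part of the original code. Note that once r can be None, the body of the second try also has to skip its work, otherwise calling a method on None raises AttributeError instead.

```python
class FakeResponse:
    """Hypothetical stand-in for the object returned by urlopen."""

    def __init__(self):
        self.closed = False

    def read(self):
        return b"payload"

    def close(self):
        self.closed = True


def fake_open(fail):
    # Simulates urlopen: either raises or returns a response object.
    if fail:
        raise IOError("simulated HTTP failure")
    return FakeResponse()


def fetch(fail):
    r = None  # bind the name up front so the finally block can always test it
    try:
        r = fake_open(fail)
    except IOError as e:
        print("request failed:", e)
    try:
        if r is None:
            return None  # nothing was opened, so there is nothing to read
        return r.read()
    finally:
        if r:
            r.close()
```

Calling fetch(True) prints the simulated failure and returns None rather than raising UnboundLocalError.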

Answer 2

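What happens is that the first exception is caught by except urllib2.HTTPError, but the code then carries on even though r was never defined (because the exception occurred before the assignment).

I think you want to use the else clause of the try/except block, so that the rest of the code runs only if r = urllib2.urlopen(req) succeeded: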

def download(url, fileName=None):
    def getFileName(url,openUrl):
        if 'Content-Disposition' in openUrl.info():
            # If the response has Content-Disposition, try to get filename from it
            cd = dict(map(lambda x: x.strip().split('=') if '=' in x else (x.strip(),''),openUrl.info()['Content-Disposition'].split(';')))
            if 'filename' in cd:
                filename = cd['filename'].strip("\"'")
                if filename: return filename
        # if no filename was found above, parse it out of the final URL.
        return os.path.basename(urlparse.urlsplit(openUrl.url)[2])

    req = urllib2.Request(url)
    try:
        r = urllib2.urlopen(req)
    except urllib2.HTTPError, e:
        print e.fp.read()
    else:
        try:
            fileName = fileName or getFileName(url,r)
            with open(fileName, 'wb') as f:
                shutil.copyfileobj(r,f)
        finally:
            r.close()
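
For readers on Python 3, the same try/except/else structure might look roughly like the sketch below. It also adds an optional PDF-only check, which the question asks for but none of the answers cover. The opener and pdf_only parameters and the helper names filename_from_url and looks_like_pdf are assumptions introduced here for illustration, and the PDF test is only a heuristic based on the declared Content-Type or the file extension.

```python
import os
import shutil
import urllib.error
import urllib.parse
import urllib.request


def filename_from_url(final_url):
    # Same fallback as the original code: basename of the URL path,
    # which ignores any query string or #fragment.
    return os.path.basename(urllib.parse.urlsplit(final_url).path)


def looks_like_pdf(content_type, file_name):
    # Heuristic only: trust the declared media type or the extension.
    return content_type == "application/pdf" or file_name.lower().endswith(".pdf")


def download(url, file_name=None, pdf_only=False, opener=urllib.request.urlopen):
    try:
        r = opener(url)
    except urllib.error.HTTPError as e:
        # r was never bound here, so nothing below may touch it.
        print("HTTP error:", e.code, e.reason)
        return None
    else:
        try:
            # get_filename() reads the filename from Content-Disposition, if any.
            file_name = (file_name
                         or r.headers.get_filename()
                         or filename_from_url(r.url))
            if pdf_only and not looks_like_pdf(r.headers.get_content_type(), file_name):
                return None  # skip anything that does not look like a PDF
            with open(file_name, "wb") as f:
                shutil.copyfileobj(r, f)
            return file_name
        finally:
            r.close()
```

With this layout the download step cannot run with r unbound, and the function returns the saved file name, or None when the request fails or the file is skipped.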

Answer 3

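I assume it prints an error message describing how urllib2.urlopen(req) failed before it gives you the UnboundLocalError. If so, add raise on the line after print e.fp.read(), and your problem will look different.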
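
A small Python 3 sketch of that suggestion follows. The failing call is simulated by a hypothetical fake_open helper, which is an assumption made so the example runs offline. It shows that re-raising inside the except block surfaces the original failure instead of the later UnboundLocalError.

```python
def fake_open(url):
    # Hypothetical stand-in for urlopen that always fails.
    raise IOError("simulated HTTP failure for " + url)


def download(url):
    try:
        r = fake_open(url)
    except IOError as e:
        print("request failed:", e)
        raise  # re-raise the original error; the code below never runs with r unbound
    try:
        return r.read()
    finally:
        r.close()
```

The caller now sees the IOError from the failed request, which points at the real cause.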

Download a file given its URL and store it with the same filename as in Content-Disposition