Python basic question: how to download from multiple URLs with urllib.request.urlretrieve

Date: 2011-08-10 19:01:17

Tags: python

I have the following code, which works fine:

import urllib.request
import zipfile

url = "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sop"
filename = "C:/test/archive.zip"
destinationPath = "C:/test"

urllib.request.urlretrieve(url,filename)
sourceZip = zipfile.ZipFile(filename, 'r')

for name in sourceZip.namelist():
    sourceZip.extract(name, destinationPath)
sourceZip.close()

It works perfectly a few times. However, the server I retrieve the file from has some limits, so once the daily limit is reached I get this error:

Traceback (most recent call last):
  File "script.py", line 11, in <module>
    urllib.request.urlretrieve(url,filename)
  File "C:\Python32\lib\urllib\request.py", line 150, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Python32\lib\urllib\request.py", line 1591, in retrieve
    block = fp.read(bs)
ValueError: read of closed file

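How can I change the script so that it takes a list of several URLs instead of a single one, keeps trying to download from the list until one attempt succeeds, and then goes on to unzip the file? I only need one successful download.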

Sorry, I'm very new to Python and can't figure this out. I assume I have to change the variable to look something like this:

url = {
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soe",
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sod",
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soc",
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sob",
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soa",
}

and then turn this line into some kind of loop:

urllib.request.urlretrieve(url,filename)

4 answers:

Answer 0 (score: 3)

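You want to put your URLs in a list, then loop over that list and try each URL in turn. Catch and ignore the exceptions the failing URLs raise, and break out of the loop as soon as one succeeds. Try this: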

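import urllib.error
import urllib.request
import zipfile

urls = ["http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sop", "other url", "another url"]
filename = "C:/test/test.zip"
destinationPath = "C:/test"

for url in urls:
    try:
        urllib.request.urlretrieve(url, filename)
        sourceZip = zipfile.ZipFile(filename, 'r')
        break  # success: stop trying further URLs
    except (ValueError, urllib.error.URLError, zipfile.BadZipFile):
        pass  # this URL failed; try the next one
else:
    # The loop never reached "break", so every URL failed and
    # sourceZip was never assigned.
    raise RuntimeError("all URLs failed")

for name in sourceZip.namelist():
    sourceZip.extract(name, destinationPath)
sourceZip.close()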
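The same loop can be wrapped in functions so that the "every URL failed" case is reported explicitly, together with the reason each URL failed. This is a minimal sketch; the names download_first and download_and_extract are hypothetical. It assumes that ValueError (the error from the question, and also what urlopen raises for a malformed URL) and urllib.error.URLError (unreachable host, missing file, and, in current Python 3 versions, HTTP error statuses via its HTTPError subclass) are the failures worth skipping.

```python
import urllib.error
import urllib.request
import zipfile


def download_first(urls, filename):
    """Try each URL in order and return the first one that downloaded.

    Hypothetical helper: raises RuntimeError (listing the collected
    errors) if every URL fails.
    """
    errors = []
    for url in urls:
        try:
            urllib.request.urlretrieve(url, filename)
        except (ValueError, urllib.error.URLError) as exc:
            errors.append((url, exc))  # remember why this URL failed
            continue
        return url  # one successful download is enough
    raise RuntimeError(f"all {len(errors)} URLs failed: {errors}")


def download_and_extract(urls, filename, destination_path):
    """Download from the first working URL, then unpack the archive."""
    used_url = download_first(urls, filename)
    # The with-block closes the archive even if extraction fails.
    with zipfile.ZipFile(filename, "r") as source_zip:
        source_zip.extractall(destination_path)  # same as the extract() loop
    return used_url
```

With the question's variables this would be called as download_and_extract(urls, "C:/test/test.zip", "C:/test"). extractall does the same job as looping over namelist() and calling extract() on each name.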

Answer 1 (score: 2)

import urllib.request
import zipfile

urllist = ("http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sop",
            "another",
            "yet another",
            "etc")

filename = "C:/test/test.zip"
destinationPath = "C:/test"

for url in urllist:
    try:
        urllib.request.urlretrieve(url,filename)
    except ValueError:
        continue
    sourceZip = zipfile.ZipFile(filename, 'r')

    for name in sourceZip.namelist():
        sourceZip.extract(name, destinationPath)
    sourceZip.close()
    break

Assuming you only want to try each URL once, stopping as soon as one works, this will do it.
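A rate-limited server might also answer with an HTML error page instead of failing outright. That is an assumption: the question only shows the ValueError. To cover that case, the loop can check the downloaded file with zipfile.is_zipfile before it stops. Here is a sketch of that variant, using the hypothetical helper name fetch_valid_zip:

```python
import urllib.error
import urllib.request
import zipfile


def fetch_valid_zip(urls, filename):
    """Hypothetical helper: return the first URL whose download is a ZIP.

    Returns None if no URL yields a readable ZIP archive.
    """
    for url in urls:
        try:
            urllib.request.urlretrieve(url, filename)
        except (ValueError, urllib.error.URLError):
            continue  # download failed outright; try the next URL
        if zipfile.is_zipfile(filename):
            return url  # got a real archive, stop here
        # Otherwise the server sent something else (e.g. an error page).
    return None
```

Returning None instead of raising leaves the decision to the caller, for example stopping the script when fetch_valid_zip(urllist, filename) is None.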

Answer 2 (score: 0)

For fully distributed tasks, you could check out Celery and its retry mechanism, Celery-retry.

Or you could look at the Retry-decorator, for example:

import math
import time

# Retry decorator with exponential backoff
def retry(tries, delay=3, backoff=2):
  """Retries a function or method until it returns True.

  delay sets the initial delay, and backoff sets how much the delay should
  lengthen after each failure. backoff must be greater than 1, or else it
  isn't really a backoff. tries must be at least 0, and delay greater than
  0."""

  if backoff <= 1:
    raise ValueError("backoff must be greater than 1")

  tries = math.floor(tries)
  if tries < 0:
    raise ValueError("tries must be 0 or greater")

  if delay <= 0:
    raise ValueError("delay must be greater than 0")

  def deco_retry(f):
    def f_retry(*args, **kwargs):
      mtries, mdelay = tries, delay # make mutable

      rv = f(*args, **kwargs) # first attempt
      while mtries > 0:
        if rv == True: # Done on success
          return True

        mtries -= 1      # consume an attempt
        time.sleep(mdelay) # wait...
        mdelay *= backoff  # make future wait longer

        rv = f(*args, **kwargs) # Try again

      return False # Ran out of tries :-(

    return f_retry # true decorator -> decorated function
  return deco_retry  # @retry(arg[, ...]) -> true decorator
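The decorator above retries a function that signals success by returning True. When failure shows up as an exception, as with the ValueError in the question, an exception-based variant may fit better. The sketch below is a swapped-in alternative, not the Retry-decorator itself. The name retry_on and its parameters are hypothetical.

```python
import functools
import time


def retry_on(exceptions, tries=3, delay=1.0, backoff=2.0):
    """Hypothetical decorator: re-call f while it raises one of `exceptions`.

    Waits `delay` seconds after the first failure and multiplies the wait
    by `backoff` after each further failure; after `tries` attempts the
    last exception is re-raised.
    """
    if tries < 1:
        raise ValueError("tries must be at least 1")

    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return f(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise  # out of attempts: surface the last error
                    time.sleep(wait)  # back off before the next attempt
                    wait *= backoff
        return wrapper
    return decorator
```

It could decorate a small function that calls urllib.request.urlretrieve, using exceptions=(ValueError, urllib.error.URLError). Retrying the same URL will not help once its daily limit is used up, though, so this complements looping over the URL list rather than replacing it.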

Answer 3 (score: 0)

import urllib.request

urls = [
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soe",
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sod",
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soc",
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sob",
"http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soa",
]
filename = "C:/test/archive.zip"

for u in urls:
    try:
        urllib.request.urlretrieve(u, filename)
    except ValueError:
        continue  # this URL failed; try the next one
    # ... rest of code ...
    break  # stop after the first successful download
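This answer uses a list (square brackets), whereas the question used braces. Braces create a set, and a set does not guarantee iteration order, so the URLs might not be tried in the order they were written. If the braces were meant to drop duplicate URLs, dict.fromkeys does that while keeping first-seen order, because dicts preserve insertion order as of Python 3.7. Here is a small sketch; the helper name unique_in_order is hypothetical:

```python
def unique_in_order(urls):
    """Drop repeated URLs but keep the order in which they were written."""
    # dict keys are unique and (Python 3.7+) keep insertion order.
    return list(dict.fromkeys(urls))


urls = unique_in_order([
    "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soe",
    "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sod",
    "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soe",  # duplicate
])
print(urls)  # the ...soe URL first, then the ...sod URL; the repeat is gone
```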