Question

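How can I check that urllib.urlretrieve(url, file_name) has completed before allowing my program to advance to the next statement?

Take for example the following code snippet: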

import sys
import time
import traceback

import Image
from urllib import urlretrieve

# imgUrl is defined elsewhere in the program.
try:
    print "Downloading gif....."
    urlretrieve(imgUrl, "tides.gif")
    # Allow time for image to download/save:
    time.sleep(5)
    print "Gif Downloaded."
except:
    print "Failed to Download new GIF"
    raw_input('Press Enter to exit...')
    sys.exit()

try:
    print "Converting GIF to JPG...."
    Image.open("tides.gif").convert('RGB').save("tides.jpg")
    print "Image Converted"
except Exception, e:
    print "Conversion FAIL:", sys.exc_info()[0]
    traceback.print_exc()

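When downloading 'tides.gif' via urlretrieve(imgUrl, "tides.gif") takes longer than time.sleep(seconds), the file ends up empty or incomplete, and Image.open("tides.gif") raises an IOError (because the tides.gif file is 0 kB in size).

How can I check the status of urlretrieve(imgUrl, "tides.gif") so that my program advances only after the call has completed successfully?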

Answer 1

Requests is nicer than urllib, but you should be able to do this to download the file synchronously:

import urllib
f = urllib.urlopen(imgUrl)
with open("tides.gif", "wb") as imgFile:
    imgFile.write(f.read())
# you won't get to this print until you've downloaded
# all of the image at imgUrl or an exception is raised
print "Got it!"

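The downside of this is that it buffers the whole file in memory, so if you are downloading a lot of images at once you may end up using a lot of memory. That is unlikely, but still worth knowing.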
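
If the memory use is a concern, one option is to copy the response to disk in chunks instead of calling read() on the whole body. The following is a minimal sketch, written for Python 3 (where urlopen lives in urllib.request) rather than the Python 2 used above; the function name download_to_file and the 64 KiB chunk size are assumptions chosen for illustration.

```python
import os
import shutil
import urllib.request


def download_to_file(url, filename, chunk_size=64 * 1024):
    """Copy the response body to disk in fixed-size chunks.

    Hypothetical helper: blocks until the whole body has been read
    (or an exception is raised) and returns the number of bytes saved.
    """
    with urllib.request.urlopen(url) as response, open(filename, "wb") as out:
        # copyfileobj reads chunk_size bytes at a time, so only one
        # chunk is held in memory regardless of the file size.
        shutil.copyfileobj(response, out, chunk_size)
    return os.path.getsize(filename)
```

As with the snippet above, the next statement runs only after the copy has finished, so no sleep is needed.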

Answer 2

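I would use python requests from http://docs.python-requests.org/en/latest/index.html instead of plain urllib2. Requests is synchronous by default, so it will not move on to the next line of code until it has fetched the image.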

Answer 3

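I found a similar question here: Why is "raise IOError("cannot identify image file")" showing up only part of the time?

More specifically, look at the answers to that question. The user points to several other threads that explain exactly how to solve the problem in a number of ways. The first one, which may interest you, includes a progress bar display.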

Answer 4

The selected answer does not work for large files. This is the correct solution:

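import urllib


def reporthook(count, block_size, total_size):
    # count * block_size can overshoot total_size on the last block, so
    # compare with >= instead of testing for exactly 100 percent.
    # total_size is -1 when the server sends no Content-Length header.
    if total_size > 0 and count * block_size >= total_size:
        print 'Download completed!'

def save(url, filename):
    urllib.urlretrieve(url, filename, reporthook)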
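
For what it's worth, urlretrieve only returns once the transfer has ended, so the hook is useful for reporting progress rather than for waiting. Here is a rough Python 3 sketch of the same idea (urllib.request.urlretrieve instead of the Python 2 urllib.urlretrieve) that also checks the saved file before moving on. The names make_progress_hook and fetch are made up for this example, and the empty-file check is an assumption about what the asker wants to guard against.

```python
import os
import urllib.error
import urllib.request


def make_progress_hook(log):
    """Return a reporthook that appends the percent done to log."""
    def hook(count, block_size, total_size):
        if total_size <= 0:
            return  # size unknown (no Content-Length), nothing to report
        # Clamp to 100 because the last block usually overshoots.
        log.append(min(100, count * block_size * 100 // total_size))
    return hook


def fetch(url, filename, reporthook=None):
    """Download url to filename and return the saved size in bytes.

    Raises if the transfer was cut short or the saved file is empty.
    """
    try:
        urllib.request.urlretrieve(url, filename, reporthook)
    except urllib.error.ContentTooShortError:
        # Fewer bytes arrived than Content-Length promised; drop the
        # partial file so later steps cannot pick it up by mistake.
        if os.path.exists(filename):
            os.remove(filename)
        raise
    size = os.path.getsize(filename)
    if size == 0:
        raise IOError("downloaded file is empty: %s" % filename)
    return size
```

The image conversion can then go right after a successful fetch(imgUrl, "tides.gif") call, with no time.sleep in between.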

Answer 5

You could try the following:

show tables like '%_history
