Question

I have the following code that grabs images using urlretrieve. It works ..... up to a point.

def Opt3():
    global conn
    curs = conn.cursor()
    results = curs.execute("SELECT stock_code FROM COMPANY")

    for row in results:
        #for image_name in list_of_image_names:
        page = requests.get('url?prodid=' + row[0])
        tree = html.fromstring(page.text)

        pic = tree.xpath('//*[@id="bigImg0"]')

        #print pic[0].attrib['src']
        print 'URL'+pic[0].attrib['src']
        try:
            urllib.urlretrieve('URL'+pic[0].attrib['src'],'images\\'+row[0]+'.jpg')
        except:
            pass

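I am reading in a CSV to feed the image names. It works except when it hits an error or a broken URL (where there is no image, I think). I was wondering whether I can simply skip any broken URLs and have the code carry on grabbing images? Thanks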

Answer 1

urllib has very poor support for error catching. urllib2 is a much better choice. The urlretrieve equivalent in urllib2 is:

resp = urllib2.urlopen(im_url)
with open(sav_name, 'wb') as f:
  f.write(resp.read())

The errors to catch are:

urllib2.URLError, urllib2.HTTPError, httplib.HTTPException

You can also catch socket.error in case the network goes down. Simply using except Exception is a very stupid idea. It will catch every error in the block above, even your typos.
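
As a rough sketch of how the pieces above could fit together, the helper below wraps the urlopen download and catches only the errors listed, so a broken URL is reported and skipped rather than hidden. It assumes Python 3, where urllib2 was split into urllib.request and urllib.error and httplib was renamed http.client. The name fetch_image, its timeout parameter and its True/False return convention are illustrative choices, not part of the original code.

```python
# Python 3 sketch; in Python 2 the same names live in urllib2 and httplib.
import http.client
import socket
import urllib.error
import urllib.request


def fetch_image(im_url, sav_name, timeout=10):
    """Download im_url to sav_name. Return True on success, False on a fetch error."""
    try:
        with urllib.request.urlopen(im_url, timeout=timeout) as resp:
            data = resp.read()
    except (urllib.error.URLError, http.client.HTTPException, socket.error) as e:
        # HTTPError is a subclass of URLError, so it is covered here as well.
        print('skipping %s: %s' % (im_url, e))
        return False
    # Write only after the download succeeded, so failures leave no empty file.
    with open(sav_name, 'wb') as f:
        f.write(data)
    return True
```

Inside the loop from the question, a False return value is the signal to move on to the next row. A typo such as a misspelled variable name still raises normally, because it is not one of the exceptions being caught.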

Answer 2

Just use try/except and continue if it fails

try:
    page = requests.get('url?prodid=' + row[0])
except Exception as e:
    print e
    continue # continue to next row

Answer 3

Instead of pass, why not try continue when an error occurs?

try:
    urllib.urlretrieve('URL'+pic[0].attrib['src'],'images\\'+row[0]+'.jpg')
except Exception as e:
    continue

URLRetrieve Error Handling
