Question

for index in range(1,10):
    send_headers = {
                    'User-Agent':'Mozilla/5.0 (Windows NT 6.2;rv:16.0) Gecko/20100101 Firefox/16.0',
                    'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
                    'Connection':'keep-alive'
    }

    try:
        req=urllib2.Request(url,headers=send_headers)
        response=urllib2.urlopen(req)
        sleeptime=random.randint(1,30*index)
        time.sleep(sleeptime)
    except Exception, e:
        print e
        traceback.print_exc()
        sleeptime=random.randint(13,40*index)
        print url
        time.sleep(sleeptime)
        continue
    if response.getcode() != 200:
        continue
    else:
        break
return response.read()

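I find that my code sometimes hangs at return response.read(). The program does not die, and no error or exception is raised. I don't know why or how this happens. How can I fix it?

This is Python, and I want to download some images from the web.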

Answer 1

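I think it may be hanging because the connection never times out.

urlopen accepts a timeout parameter (urllib2.urlopen in Python 2.6+, urllib.request.urlopen in Python 3).

If it is not set, the socket default timeout is used.

The default socket timeout is unset (socket.getdefaulttimeout() returns None), which means no timeout at all, so a stalled read can block indefinitely.

So try this: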

response=urllib2.urlopen(req, timeout=3)

Or, in Python 2, set a default timeout for all new sockets:

import socket
socket.setdefaulttimeout(3.0)

In any case, consider using requests instead of urllib2.
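
To make the timeout advice concrete, here is a minimal sketch of the asker's retry loop with a timeout on every attempt. It assumes Python 3 (urllib.request rather than urllib2), and the function name fetch and its retries and delay parameters are hypothetical, not taken from the question.

```python
import http.client
import time
import urllib.request


def fetch(url, timeout=3.0, retries=3, delay=1.0):
    """Return the response body as bytes, or None if every attempt fails.

    The timeout applies to each blocking socket operation, including the
    reads done by response.read(), so a stalled server cannot hang the
    caller indefinitely.
    """
    for attempt in range(1, retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                if response.getcode() == 200:
                    return response.read()
        except (OSError, http.client.HTTPException) as exc:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            print("attempt %d failed: %r" % (attempt, exc))
        if attempt < retries:
            time.sleep(delay)  # brief pause before the next attempt
    return None
```

Returning None after the last attempt also avoids the situation in the original loop where response may be undefined if every attempt raised.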

Answer 2

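response.read() reads the HTTP response from the server. This can take a while, because reading involves waiting for bytes to arrive over the network.

Fetching a resource from the web takes time, and there is no way around that.

That said, you can access the network in a non-blocking way and be notified when data is available. This does not change the fact that fetching the resource takes time.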
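
As a rough illustration of being notified when data is available, the sketch below waits on a socket with the standard selectors module instead of blocking in a read. It assumes Python 3 and an already connected socket; the helper name read_when_ready is hypothetical.

```python
import selectors
import socket


def read_when_ready(sock, timeout=3.0, bufsize=4096):
    """Collect bytes from sock until EOF, or until no data arrives for
    `timeout` seconds, and return whatever was received."""
    sock.setblocking(False)
    chunks = []
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_READ)
        while True:
            # select() returns as soon as the socket is readable,
            # or an empty list once the timeout expires.
            if not sel.select(timeout):
                break  # nothing arrived in time; stop instead of hanging
            try:
                data = sock.recv(bufsize)
            except BlockingIOError:
                continue  # spurious wakeup; wait again
            if not data:
                break  # the peer closed the connection
            chunks.append(data)
    return b"".join(chunks)
```

For example, with a, b = socket.socketpair(), calling a.sendall(b"hello") and a.close() makes read_when_ready(b) return b"hello". The transfer still takes as long as the network needs; the difference is that the program decides how long it is willing to wait.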

My code sometimes hangs in the response.read() function
