I have the following code:
import urllib2

def whatever(url, data=None):
    req = urllib2.Request(url)
    # data=None issues a GET; passing data makes it a POST
    res = urllib2.urlopen(req, data)
    html = res.read()
    res.close()
    return html
I am trying to use it for GET requests:
for i in range(1, 20):
    whatever(someurl)
Then, after the first 6 requests behave correctly, it blocks for about 5 seconds, and the rest of the GETs continue normally:
2012-06-29 15:20:22,487: Clear [127.0.0.1:49967]:
2012-06-29 15:20:22,507: Clear [127.0.0.1:49967]:
2012-06-29 15:20:22,528: Clear [127.0.0.1:49967]:
2012-06-29 15:20:22,552: Clear [127.0.0.1:49967]:
2012-06-29 15:20:22,569: Clear [127.0.0.1:49967]:
2012-06-29 15:20:22,592: Clear [127.0.0.1:49967]:
**2012-06-29 15:20:26,486: Clear [127.0.0.1:49967]:**
2012-06-29 15:20:26,515: Clear [127.0.0.1:49967]:
2012-06-29 15:20:26,555: Clear [127.0.0.1:49967]:
2012-06-29 15:20:26,586: Clear [127.0.0.1:49967]:
2012-06-29 15:20:26,608: Clear [127.0.0.1:49967]:
2012-06-29 15:20:26,638: Clear [127.0.0.1:49967]:
2012-06-29 15:20:26,655: Clear [127.0.0.1:49967]:
2012-06-29 15:20:26,680: Clear [127.0.0.1:49967]:
2012-06-29 15:20:26,700: Clear [127.0.0.1:49967]:
2012-06-29 15:20:26,717: Clear [127.0.0.1:49967]:
2012-06-29 15:20:26,753: Clear [127.0.0.1:49967]:
2012-06-29 15:20:26,770: Clear [127.0.0.1:49967]:
2012-06-29 15:20:26,789: Clear [127.0.0.1:49967]:
2012-06-29 15:20:26,809: Clear [127.0.0.1:49967]:
2012-06-29 15:20:26,828: Clear [127.0.0.1:49967]:
If I use POST instead (with data={'a':'b'}), every request gets stuck for 2 seconds. I have tried both urllib2 and pycurl, and both give the same result.
Does anyone have any idea what causes this strange behavior?
Answer 0 (score: 1)
Another way to improve performance is to use threads:
import threading, urllib2
import Queue  # Python 2 stdlib; renamed to queue in Python 3

# Assumed for the example: the list of URLs to fetch in parallel
urls_to_load = ["http://www.stackoverflow.com"] * 19

def read_url(url, queue):
    data = urllib2.urlopen(url).read()
    print('Fetched %s bytes from %s' % (len(data), url))
    queue.put(data)

def fetch_parallel():
    result = Queue.Queue()
    threads = [threading.Thread(target=read_url, args=(url, result))
               for url in urls_to_load]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every download has finished
    return result

def fetch_sequential():
    result = Queue.Queue()
    for i in xrange(1, 20):
        read_url("http://www.stackoverflow.com", result)
    return result
This gives me [Finished in 0.2s].
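For completeness, a minimal sketch of a driver that compares the two functions (this harness is my own addition, not part of the original answer):

    import time

    if __name__ == '__main__':
        start = time.time()
        fetch_parallel()
        print('parallel:   %.2f s' % (time.time() - start))

        start = time.time()
        fetch_sequential()
        print('sequential: %.2f s' % (time.time() - start))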
P.S. If you don't need an actual list, use xrange instead of range: xrange yields the numbers lazily rather than building the whole list in memory.
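A quick illustration of the difference under Python 2:

    # range materializes the whole list up front; xrange is lazy.
    print(type(range(5)))   # <type 'list'>
    print(type(xrange(5)))  # <type 'xrange'>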
Answer 1 (score: 0)