I'm building a proxy-checker program using multithreading, specifically the thread pool from:

from multiprocessing.dummy import Pool as ThreadPool

The HTTP requests are made with urllib2.
What I want to do is run 20 requests per proxy. With a single thread that takes far too long; that's where the power of multithreading comes in. So once I set a proxy, I want to run those 20 requests and manage two things: one is to count the exceptions and dump the proxy if too many occur; the second is to save the average response time and display it later.
I just haven't managed to implement the above, but I have implemented it with a single thread. Things that must be done:

Per proxy: wait until all 20 requests are finished before changing to the next proxy, and synchronize the threads somehow when they add up the average response time (which should not take the exceptions into account).
Here is my single-threaded code:

import socket
import ssl
import time
import urllib
import urllib2
import httplib
proxyList = []

def loadProxysFromFile(fileName):
    global proxyList
    with open(fileName) as f:
        proxyList = [line.rstrip('\n') for line in f]

def setUrllib2Proxy(proxyAddress):
    proxy = urllib2.ProxyHandler({
        'http': "http://" + proxyAddress,
        'https': "https://" + proxyAddress
    })
    opener = urllib2.build_opener(proxy)
    urllib2.install_opener(opener)

def timingRequest(proxy, url):
    error = False
    setUrllib2Proxy(proxy)
    start = time.time()
    try:
        req = urllib2.Request(url)
        urllib2.urlopen(req, timeout=5)  # opening the request (getting a response)
    except (urllib2.URLError, httplib.BadStatusLine, ssl.SSLError, socket.error):
        error = True
    end = time.time()
    timing = end - start
    if error:
        print "Error with proxy " + proxy
        return 0
    else:
        print proxy + " Request to " + url + " took: %s" % timing + " seconds."
        return timing

# Main
loadProxysFromFile("proxyList.txt")

for proxy in proxyList:
    print "Testing: " + proxy
print "\n"

REQUEST_NUM = 20
ERROR_TOLERANCE_NUM = 3
resultList = []

for proxy in proxyList:
    avgTime = 0
    errorCount = 0
    for x in range(0, REQUEST_NUM):
        result = timingRequest(proxy, 'https://www.google.com')
        if result == 0:
            errorCount += 1
            if errorCount >= ERROR_TOLERANCE_NUM:
                break
        else:
            avgTime += result
    if errorCount < ERROR_TOLERANCE_NUM:
        avgTime = avgTime / (REQUEST_NUM - errorCount)
        resultList.append(proxy + " has an average response time of: %s" % avgTime)

print '\n'
print "Results Summary:"
print "-----------------"
for res in resultList:
    print res
The best solution I've read so far is to use from multiprocessing.dummy import Pool as ThreadPool, but I can't figure out how to implement it in my code.
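For what it's worth, here is a minimal sketch of the per-proxy pool pattern, under stated assumptions: `doRequest`, `checkProxy`, and the 4-worker pool size are illustrative names and choices, not from the original code, and `doRequest` only simulates a request so the sketch runs without a network. The key point is that `pool.map` blocks until all 20 requests have finished (the "wait before changing proxy" requirement) and returns one result per request, so the average can be computed from the returned list with no manual thread synchronization:

```python
from multiprocessing.dummy import Pool as ThreadPool  # threads, not processes
import time

REQUEST_NUM = 20
ERROR_TOLERANCE_NUM = 3

def doRequest(proxy):
    # Hypothetical stand-in for timingRequest(proxy, url): returns 0 on
    # error, otherwise the elapsed time of the request. Here it just
    # simulates a 10 ms request so the sketch is runnable offline.
    start = time.time()
    time.sleep(0.01)
    return time.time() - start

def checkProxy(proxy):
    # One pool.map call per proxy: map blocks until every one of the
    # REQUEST_NUM requests has completed, so the next proxy is only
    # started after this one is fully measured.
    pool = ThreadPool(4)  # 4 worker threads -- an arbitrary choice
    results = pool.map(doRequest, [proxy] * REQUEST_NUM)
    pool.close()
    pool.join()

    # Aggregate from the returned list instead of sharing counters
    # between threads: drop the 0 error markers, then average.
    timings = [t for t in results if t != 0]
    errorCount = REQUEST_NUM - len(timings)
    if errorCount >= ERROR_TOLERANCE_NUM:
        return None  # dump the proxy
    return sum(timings) / len(timings)
```

One caveat if this is wired back into the original code: setUrllib2Proxy calls urllib2.install_opener, which changes a process-global opener shared by all threads. Since each pool only ever tests one proxy, the proxy should be installed once before the pool.map call rather than inside each worker.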