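I want to use multithreading in Python to reduce the time it takes to load different subpages of a website with urllib2.urlopen(url). The page I want to load usually takes 4.5 seconds, so my plan is to load different subpages of the same site in parallel using multiple threads and reduce the overall loading time.
I wrote this example and, surprisingly, the multithreaded version takes longer than the "normal" version that loads the pages one after another.
What am I doing wrong?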
import urllib.request as urllib2
from multiprocessing.dummy import Pool as ThreadPool
import time
urls = [
'http://www.spiegel.de', 'http://www.spiegel.de', 'http://www.spiegel.de', 'http://www.spiegel.de', 'http://www.spiegel.de', 'http://www.spiegel.de', 'http://www.spiegel.de', 'http://www.spiegel.de', 'http://www.spiegel.de', 'http://www.spiegel.de', 'http://www.spiegel.de', 'http://www.spiegel.de', 'http://www.spiegel.de', 'http://www.spiegel.de'
]
start = time.time()
for url in urls:
    results = urllib2.urlopen(url)
    print(results)
end = time.time()
print("elapsed time no threading: ", end - start)
start = time.time()
# Make the Pool of workers
pool = ThreadPool(4)
# Open the urls in their own threads
# and return the results
results = pool.map(urllib2.urlopen, urls)
print(results)
# Close the pool and wait for the work to finish
pool.close()
pool.join()
end = time.time()
print("elapsed time threading: ", end - start)
Result:
<http.client.HTTPResponse object at 0x02AF4BF0>
...
<http.client.HTTPResponse object at 0x02AF4C10>
elapsed time no threading: 0.9750549793243408
[<http.client.HTTPResponse object at 0x02B11330>, ... <http.client.HTTPResponse object at 0x02B3F8D0>]
elapsed time threading: 3.4091949462890625
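
One thing worth noting about the measurement above: urlopen returns as soon as the response headers arrive, so neither version times the download of the page body, and 14 requests to the same live host are hard to compare reliably. The following is a minimal sketch of a more controlled comparison, not a diagnosis of the numbers above. It assumes a local stand-in server with an artificial delay; `SlowHandler`, `DELAY`, `OPENER`, `fetch`, `time_sequential` and `time_threaded` are hypothetical names introduced only for this illustration.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from multiprocessing.dummy import Pool as ThreadPool

DELAY = 0.2  # hypothetical per-request server latency, in seconds

# Opener that ignores any system proxy, so requests to the local
# test server go directly to 127.0.0.1.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class SlowHandler(BaseHTTPRequestHandler):
    """Local stand-in for a slow page (an assumption, not the real site)."""

    def do_GET(self):
        time.sleep(DELAY)  # simulate server/network latency
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the benchmark output quiet


def fetch(url):
    # Read the whole body and close the connection, so the timing covers
    # the complete download rather than only the response headers.
    with OPENER.open(url) as response:
        return response.read()


def time_sequential(urls):
    start = time.perf_counter()
    bodies = [fetch(url) for url in urls]
    return time.perf_counter() - start, bodies


def time_threaded(urls, workers=4):
    start = time.perf_counter()
    with ThreadPool(workers) as pool:
        bodies = pool.map(fetch, urls)  # blocks until every URL is fetched
    return time.perf_counter() - start, bodies


# Start the local server on a free port and request it 8 times.
server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
urls = ["http://127.0.0.1:%d/" % server.server_port] * 8

seq_time, seq_bodies = time_sequential(urls)
thr_time, thr_bodies = time_threaded(urls)
server.shutdown()

print("elapsed time no threading:", seq_time)
print("elapsed time threading:   ", thr_time)
```

With a fixed per-request delay and the body read in both versions, the threaded run has several requests waiting at once, which is the effect the original example was trying to measure. Against a real site the ratio will depend on the server, the network and any rate limiting.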