I wrote some code to time how long it takes to open a set of URLs with urllib.request, with and without multiprocessing:
import urllib.request
from multiprocessing import Pool
from bs4 import BeautifulSoup
import time
FANCY = 1
urls = [
    'http://www.python.org',
    'http://www.python.org/about/',
    'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
    'http://www.python.org/doc/',
    'http://www.python.org/download/',
    'http://www.python.org/getit/',
    'http://www.python.org/community/',
    'https://wiki.python.org/moin/',
    'http://planet.python.org/',
    'https://wiki.python.org/moin/LocalUserGroups',
    'http://www.python.org/psf/',
    'http://docs.python.org/devguide/',
    'http://www.python.org/community/awards/'
]
if __name__ == '__main__':
    start_time = time.time()
    if FANCY:
        pool = Pool()
        # Open the URLs in their own processes and return the results
        results = pool.map(urllib.request.urlopen, urls)
        pool.close()
        pool.join()
    else:
        results = list(map(urllib.request.urlopen, urls))
    # soup = BeautifulSoup(results[0].read(), 'html.parser')
    print(results)
    print(f"Execution time: {time.time() - start_time}")
If I use ordinary threads via multiprocessing.dummy, everything works without errors, but with separate processes I get the following error:
multiprocessing.pool.MaybeEncodingError: Error sending result: '[<http.client.HTTPResponse object at 0x0400AB50>]'. Reason: 'TypeError("cannot serialize '_io.BufferedReader' object")'
So what is it about HTTPResponse objects that multiprocessing can't handle?
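For context, here is a minimal sketch of what I understand the error to mean: Pool.map has to pickle each worker's return value to send it back to the parent process, and an HTTPResponse wraps an _io.BufferedReader, which pickle rejects. The fetch helper below is a hypothetical workaround I am considering (read the body inside the worker and return plain bytes), not part of my original code:

```python
import io
import pickle
import urllib.request

# Pool.map serializes each worker's return value with pickle.
# A BufferedReader (what HTTPResponse wraps) cannot be pickled:
reader = io.BufferedReader(io.BytesIO(b"payload"))
try:
    pickle.dumps(reader)
except TypeError as exc:
    print("pickle failed:", exc)

# Hypothetical workaround: consume the response inside the worker and
# return plain bytes, which pickle handles without trouble.
def fetch(url):
    with urllib.request.urlopen(url) as resp:
        return resp.read()

# pool.map(fetch, urls) would then return a list of bytes objects;
# bytes round-trip through pickle cleanly:
print(pickle.loads(pickle.dumps(b"payload")))
```

If this is the right diagnosis, swapping `pool.map(urllib.request.urlopen, urls)` for `pool.map(fetch, urls)` should avoid sending the unpicklable response object across the process boundary.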