MaybeEncodingError when using multiprocessing with urllib.request

Date: 2018-10-28 20:43:12

Tags: python parallel-processing multiprocessing

I wrote some code to time how long it takes to open a set of URLs with urllib.request, both with and without multiprocessing:

import urllib.request
from multiprocessing import Pool
from bs4 import BeautifulSoup
import time

FANCY = 1

urls = [
    'http://www.python.org', 
    'http://www.python.org/about/',
    'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
    'http://www.python.org/doc/',
    'http://www.python.org/download/',
    'http://www.python.org/getit/',
    'http://www.python.org/community/',
    'https://wiki.python.org/moin/',
    'http://planet.python.org/',
    'https://wiki.python.org/moin/LocalUserGroups',
    'http://www.python.org/psf/',
    'http://docs.python.org/devguide/',
    'http://www.python.org/community/awards/'   
]

if __name__ == '__main__':

    start_time = time.time()
    if FANCY:
        pool = Pool() 
        # Open the URLs in their own processes and collect the results
        results = pool.map(urllib.request.urlopen, urls)
        pool.close() 
        pool.join()
    else:
        results = list(map(urllib.request.urlopen, urls))
    # soup = BeautifulSoup(results[0].read(), 'html.parser')

    print(results)
    print(f"Execution time: {time.time() - start_time}")

If I use threads via multiprocessing.dummy instead, everything works fine, but when I run separate processes I get the following error:

multiprocessing.pool.MaybeEncodingError: Error sending result: '[<http.client.HTTPResponse object at 0x0400AB50>]'. Reason: 'TypeError("cannot serialize '_io.BufferedReader' object")'

So what is the problem with multiprocessing when it handles HTTPResponse objects?
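(For context: `pool.map` must pickle each worker's return value to send it back to the parent process, and `http.client.HTTPResponse` wraps an `_io.BufferedReader`, which cannot be pickled, hence the `TypeError` in the traceback. A minimal sketch of the cause, plus a hypothetical `fetch` helper that sidesteps it by reading the body inside the worker so only picklable `bytes` cross the process boundary:)

```python
import io
import pickle
import urllib.request

def fetch(url):
    # Read the body inside the worker process: bytes pickle fine,
    # while the HTTPResponse object wrapping a BufferedReader does not.
    with urllib.request.urlopen(url) as resp:
        return resp.read()

if __name__ == '__main__':
    # The underlying cause, reproduced without any network access:
    # a buffered reader cannot be pickled.
    reader = io.BufferedReader(io.BytesIO(b'payload'))
    try:
        pickle.dumps(reader)
    except TypeError as exc:
        print(exc)  # the same serialization failure the pool reports
```

With a helper like this, `pool.map(fetch, urls)` would return a list of `bytes` instead of response objects (and any BeautifulSoup parsing could likewise happen inside the worker). Threads via `multiprocessing.dummy` avoid the issue entirely because they share memory and never pickle the results.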

0 Answers:

There are no answers yet.