Python iter_content and multiprocessing error

Time: 2018-08-30 16:05:03

Tags: python-requests python-multiprocessing

Function call:

 downloaded_files = pool.starmap(utils.retrieve_demofiles, list(zip(urls, self.demo_match_urls, fullfilenames)))

Source of the error:

def retrieve_demofiles(url, filename, fullfilename, headers = {}):
 print('\n Now downloading: ', filename)
 user_agent_addition = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1941.0 Safari/537.36'}
 headers = dict(headers.items() | user_agent_addition.items())
 succesfull_file = False

 for i in range(constants.download_rety_count):  # check file availability
    response = requests.head(url, headers=headers, allow_redirects=True)
    print('Type ', response.headers['content-type'])
    contentType = response.headers['content-type']
    if contentType in constants.valid_compressed_mime_types:
        succesfull_file = True
        break
    else:
        time.sleep(1)

 if not succesfull_file:
    #print("ERROR: Not expected file: {}".format(filename))
    logger.exception("ERROR: Not expected file: {}".format(filename))
    return (None, None)

 try:
    r = requests.get(url, stream=True, headers=headers, timeout = 10)
    r.raise_for_status()
 except requests.exceptions.HTTPError as errh:
    logger.exception("Http Error: %s", errh)
 except requests.exceptions.ConnectionError as errc:
    logger.exception("Error Connecting: %s", errc)
 except requests.exceptions.Timeout as errt:
    logger.exception("Timeout Error: %s", errt)
 except requests.exceptions.RequestException as err:
    logger.exception("Error: %s", err)
 finally:
    #print('Type ', r.headers['content-type'], 'Name ', filename)
    # Total size in bytes.
    total_size = int(r.headers.get('content-length', 0))
    block_size = 1024
    wrote = 0
    logger.error('Total size: {}'.format(math.ceil(total_size / block_size)))
    with open(fullfilename, 'wb') as f:
        # for data in tqdm.tqdm(r.iter_content(block_size), total=int(total_size//block_size) , unit='KB', unit_scale=True, leave=True):
        for block in r.iter_content(block_size): #source of error -> begin
            try:
                wrote = wrote  + len(block)
                f.write(block)
                sys.stdout.flush()
            except Exception as e:
                logger.error(e) #source of error-> end
    # # if total_size != 0 and wrote != total_size:
    #     print("ERROR: Failed to download: {}".format(filename))
    #     logger.exception("ERROR: Failed to download: {}".format(filename))
    #     return (None, None)
    # else:
    #     print("Succesfully downloaded: {}".format(filename))
    #     logger.info("Succesfully downloaded: {}".format(filename))
    #     return (fullfilename, filename)
    print('returned: ', os.getpid())
    r.close()

The error I get is: RuntimeError: cannot join current thread
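
For context, this is the message that `threading.Thread.join` raises when a thread tries to join itself. The snippet below is only a minimal, requests-independent illustration of where the message comes from; the helper name `join_self` is hypothetical, and nothing here shows which thread in my program is doing the joining.

```python
import threading


def join_self():
    """Try to join the calling thread and return the resulting error message."""
    try:
        # A thread cannot wait for itself to finish, so join() refuses.
        threading.current_thread().join()
    except RuntimeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(join_self())
```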

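Apart from this error, the function writes the downloaded files to disk without any problem. The error appears when the program finishes. The source of the error is the for loop that reads the stream with iter_content. When I remove this for loop and put a pass statement in its place, it works fine (although of course it then does nothing). I don't understand why iter_content causes this kind of error.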
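
To make the suspect loop easier to test in isolation, here is a minimal sketch of the streaming step pulled out of the `finally` block into its own helper, so that it only runs when a response object actually exists. The helper name `save_response` and its return convention are hypothetical, and this is not a confirmed fix for the RuntimeError; it only separates the `iter_content` loop from the request and error handling.

```python
import logging

logger = logging.getLogger(__name__)


def save_response(response, fullfilename, block_size=1024):
    """Stream an already-opened response to disk.

    `response` is assumed to behave like a requests.Response opened with
    stream=True (it needs `.headers` and `.iter_content`).
    Returns the number of bytes written, or None if the size check fails.
    """
    total_size = int(response.headers.get("content-length", 0))
    wrote = 0
    with open(fullfilename, "wb") as f:
        for block in response.iter_content(block_size):
            f.write(block)
            wrote += len(block)
    # Note: if the server applies a compressed Content-Encoding, the decoded
    # byte count can differ from content-length, so this check is approximate.
    if total_size and wrote != total_size:
        logger.error("Failed to download: %s", fullfilename)
        return None
    return wrote


# Possible usage with requests (assumed, not run here):
#
#     with requests.get(url, stream=True, headers=headers, timeout=10) as r:
#         r.raise_for_status()
#         save_response(r, fullfilename)
```

Using the response as a context manager releases the connection inside the worker process even when the loop raises, which replaces the explicit `r.close()` call.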

Thanks for your contributions.

0 answers:

No answers yet.