I keep getting this error when trying to download the Wikipedia data dumps. Is it because I am making too many download requests? I am using 100 threads.
Code 1:
import os
import urllib.request

def multithread_download_files_func(self, download_file):
    # Derive the local filename from the last path segment of the URL.
    filename = download_file[download_file.rfind("/") + 1:]
    save_file_w_submission_path = self.ptsf + filename
    # Skip files that have already been downloaded.
    if not os.path.exists(save_file_w_submission_path):
        # Send a browser-like User-Agent so the server does not reject the request.
        opener = urllib.request.build_opener()
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        urllib.request.install_opener(opener)
        response = urllib.request.urlopen(download_file)
        data_content = response.read()
        with open(save_file_w_submission_path, 'wb') as wf:
            wf.write(data_content)
    return filename
Even with code 2:
request = urllib.request.Request(download_file)
response = urllib.request.urlopen(request)
data_content = response.read()
Threading:
from multiprocessing.pool import ThreadPool

p = ThreadPool(100)
results = p.map(self.multithread_download_files_func, matching_fnmatch_list)
for r in results:
    print(r)
Consistent error:
File "D:\Users\Jonathan\Anaconda3\lib\urllib\request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
HTTPError: Service Temporarily Unavailable
URL:
https://dumps.wikimedia.org/other/pagecounts-raw/
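
For what it's worth: dumps.wikimedia.org throttles aggressive clients, so 100 concurrent connections can easily produce 503 responses. Below is a minimal sketch of a gentler setup; the pool size of 3 and the 1-second pause are illustrative guesses rather than documented server limits, and `downloader` stands in for the instance that owns multithread_download_files_func:

import time
from multiprocessing.pool import ThreadPool

def polite_download(download_file):
    # Reuse the download function from code 1, then pause briefly
    # so the server sees a much slower request rate.
    result = downloader.multithread_download_files_func(download_file)
    time.sleep(1)
    return result

p = ThreadPool(3)  # a handful of workers instead of 100
results = p.map(polite_download, matching_fnmatch_list)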
Answer 0 (score: 0):
I don't know if anyone else has a better solution, but I found some code and adapted it to my needs. It keeps retrying the link until it gets a result.
if not os.path.exists(save_file_w_submission_path):
    data_content = None
    try:
        request = urllib.request.Request(download_file)
        response = urllib.request.urlopen(request)
        data_content = response.read()
    except urllib.error.HTTPError:
        retries = 1
        success = False
        while not success:
            try:
                response = urllib.request.urlopen(download_file)
                data_content = response.read()  # read the body once a retry succeeds
                success = True
            except Exception:
                wait = retries * 30  # wait longer after each failed attempt
                print('Error! Waiting %s secs and re-trying...' % wait)
                sys.stdout.flush()
                time.sleep(wait)
                retries += 1
    if data_content:
        with open(save_file_w_submission_path, 'wb') as wf:
            wf.write(data_content)
        print(filename)
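
One caveat with the loop above: it retries forever if the server never recovers. A variant that caps the attempts and backs off exponentially might be safer; the cap of 5 attempts and the 30-second base delay here are arbitrary choices for illustration:

import time
import urllib.request
import urllib.error

def download_with_retries(url, max_retries=5, base_delay=30):
    # Fetch url, waiting longer after each failure; give up after
    # max_retries attempts and return None so the caller can skip the file.
    for attempt in range(max_retries):
        try:
            with urllib.request.urlopen(url) as response:
                return response.read()
        except (urllib.error.HTTPError, urllib.error.URLError):
            wait = base_delay * 2 ** attempt  # 30s, 60s, 120s, ...
            print('Error! Waiting %s secs and re-trying...' % wait)
            time.sleep(wait)
    return None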