I keep getting this error when trying to download the Wikipedia data dumps. Is it because I am making too many download requests? I am using 100 threads.
Code 1:
import os
import urllib.request

def multithread_download_files_func(self, download_file):
    # Derive the local filename from the last path segment of the URL.
    filename = download_file[download_file.rfind("/") + 1:]
    save_file_w_submission_path = self.ptsf + filename
    # Skip files that have already been downloaded.
    if not os.path.exists(save_file_w_submission_path):
        # Send a browser-like User-Agent so the server does not reject the request.
        opener = urllib.request.build_opener()
        opener.addheaders = [('User-agent', 'Mozilla/5.0')]
        urllib.request.install_opener(opener)
        response = urllib.request.urlopen(download_file)
        data_content = response.read()
        with open(save_file_w_submission_path, 'wb') as wf:
            wf.write(data_content)
    return filename
Even with code 2:
request = urllib.request.Request(download_file)
response = urllib.request.urlopen(request)
data_content = response.read()
Threading:
from multiprocessing.pool import ThreadPool

p = ThreadPool(100)
results = p.map(self.multithread_download_files_func, matching_fnmatch_list)
for r in results:
    print(r)
Consistent error:
File "D:\Users\Jonathan\Anaconda3\lib\urllib\request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
HTTPError: Service Temporarily Unavailable
URL:
https://dumps.wikimedia.org/other/pagecounts-raw/
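
For what it's worth: dumps.wikimedia.org throttles aggressive clients, so 100 concurrent connections can easily produce 503 responses. Below is a minimal sketch of a gentler setup; the pool size of 3 and the 1-second pause are illustrative guesses rather than documented server limits, and `downloader` stands in for the instance that owns multithread_download_files_func:

import time
from multiprocessing.pool import ThreadPool

def polite_download(download_file):
    # Reuse the download function from code 1, then pause briefly
    # so the server sees a much slower request rate.
    result = downloader.multithread_download_files_func(download_file)
    time.sleep(1)
    return result

p = ThreadPool(3)  # a handful of workers instead of 100
results = p.map(polite_download, matching_fnmatch_list)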
Answer 0 (score: 0):
I don't know if anyone else has a better solution, but I found some code and adapted it to my needs. It keeps retrying the link until it gets a result.
if not os.path.exists(save_file_w_submission_path):
    data_content = None
    try:
        request = urllib.request.Request(download_file)
        response = urllib.request.urlopen(request)
        data_content = response.read()
    except urllib.error.HTTPError:
        retries = 1
        success = False
        while not success:
            try:
                response = urllib.request.urlopen(download_file)
                data_content = response.read()  # read the body once a retry succeeds
                success = True
            except Exception:
                wait = retries * 30  # wait longer after each failed attempt
                print('Error! Waiting %s secs and re-trying...' % wait)
                sys.stdout.flush()
                time.sleep(wait)
                retries += 1
    if data_content:
        with open(save_file_w_submission_path, 'wb') as wf:
            wf.write(data_content)
        print(filename)
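
One caveat with the loop above: it retries forever if the server never recovers. A variant that caps the attempts and backs off exponentially might be safer; the cap of 5 attempts and the 30-second base delay here are arbitrary choices for illustration:

import time
import urllib.request
import urllib.error

def download_with_retries(url, max_retries=5, base_delay=30):
    # Fetch url, waiting longer after each failure; give up after
    # max_retries attempts and return None so the caller can skip the file.
    for attempt in range(max_retries):
        try:
            with urllib.request.urlopen(url) as response:
                return response.read()
        except (urllib.error.HTTPError, urllib.error.URLError):
            wait = base_delay * 2 ** attempt  # 30s, 60s, 120s, ...
            print('Error! Waiting %s secs and re-trying...' % wait)
            time.sleep(wait)
    return None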