I am scraping a specific site with BeautifulSoup and saving all of the file links to a queue. In another thread, I download all of these links with urlretrieve:
try:
    urlretrieve(link, file_path)
except (ContentTooShortError, URLError, ConnectionError, TimeoutError) as e:
    self.logger.info(
        "Download failed for the following link {}".format(
            str(link)
        )
    )
    self.logger.exception(e)
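The queue-and-worker arrangement described above can be sketched roughly as follows. This is an illustrative sketch, not the actual service code: `download_worker`, `run_downloads`, the injectable `fetch` parameter, and the `None` sentinel are hypothetical names chosen for the example.

```python
import queue
import threading
from urllib.request import urlretrieve


def download_worker(link_queue, fetch=urlretrieve, failures=None):
    """Consume (link, file_path) pairs until a None sentinel arrives."""
    while True:
        item = link_queue.get()
        if item is None:  # sentinel: the producer has no more links
            link_queue.task_done()
            break
        link, file_path = item
        try:
            fetch(link, file_path)
        except OSError as e:
            # URLError, ContentTooShortError, ConnectionError and
            # TimeoutError are all subclasses of OSError.
            if failures is not None:
                failures.append((link, e))
        finally:
            link_queue.task_done()


def run_downloads(items, fetch=urlretrieve):
    """Feed (link, file_path) pairs to one worker thread; return the failures."""
    link_queue = queue.Queue()
    failures = []
    worker = threading.Thread(
        target=download_worker, args=(link_queue, fetch, failures)
    )
    worker.start()
    for item in items:
        link_queue.put(item)
    link_queue.put(None)  # tell the worker to stop
    worker.join()
    return failures
```

Passing `fetch` as a parameter is only a convenience for the sketch: it lets the worker be exercised with a fake download function instead of real network access.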
I get about 1700 links from the site and download them concurrently. For some of the links, the download attempt fails with the following exception:
[2019-12-19 09:34:14,648] [ERROR] --- [download_web_records] [P2484][file_downloader_service-1 - 13832]- <urlopen error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>
Traceback (most recent call last):
  File "C:\Users\JeyJey\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 1317, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "C:\Users\JeyJey\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 1244, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\JeyJey\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 1290, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Users\JeyJey\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 1239, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Users\JeyJey\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 1026, in _send_output
    self.send(msg)
  File "C:\Users\JeyJey\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 966, in send
    self.connect()
  File "C:\Users\JeyJey\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 938, in connect
    (self.host,self.port), self.timeout, self.source_address)
  File "C:\Users\JeyJey\AppData\Local\Programs\Python\Python37-32\lib\socket.py", line 727, in create_connection
    raise err
  File "C:\Users\JeyJey\AppData\Local\Programs\Python\Python37-32\lib\socket.py", line 716, in create_connection
    sock.connect(sa)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\JeyJey\Desktop\myproject\myproject\Code\WebScraping\file_downloader_service.py", line 98, in download_web_records
    self._downloadFile(web_record.download_link, file_full_path_name)
  File "C:\Users\JeyJey\Desktop\myproject\myproject\Code\WebScraping\file_downloader_service.py", line 203, in _downloadFile
    urlretrieve(link, file_path)
  File "C:\Users\JeyJey\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "C:\Users\JeyJey\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\JeyJey\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)
  File "C:\Users\JeyJey\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 543, in _open
    '_open', req)
  File "C:\Users\JeyJey\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 503, in _call_chain
    result = func(*args)
  File "C:\Users\JeyJey\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 1345, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Users\JeyJey\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 1319, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>
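I am running this code on Windows 10, so I thought the Windows firewall might be causing it. Another possible cause is that the site's firewall is blocking me, but that seems odd because it does not happen all the time. It only happens for some of the links I try to download, and which links fail appears to be random.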
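One way to check whether these failures are transient rather than a permanent block would be to retry each failed download a few times with an increasing delay. The following is a minimal sketch under that assumption, not a confirmed fix: `retrieve_with_retries`, `attempts`, `base_delay`, and the injectable `fetch` and `sleep` parameters are hypothetical names introduced for illustration.

```python
import time
from urllib.request import urlretrieve


def retrieve_with_retries(link, file_path, attempts=3, base_delay=1.0,
                          fetch=urlretrieve, sleep=time.sleep):
    """Call fetch(link, file_path) up to `attempts` times.

    The wait between attempts doubles each time (base_delay, 2 * base_delay, ...).
    The last failure is re-raised so the caller can still log it.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(link, file_path)
        except OSError:
            # URLError and TimeoutError are both subclasses of OSError.
            if attempt == attempts:
                raise  # out of attempts: let the caller handle it
            sleep(base_delay * 2 ** (attempt - 1))
```

If most links succeed on a second or third attempt, that would point toward intermittent connection problems (for example, too many simultaneous connections) rather than a firewall rule that blocks specific links.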
Has anyone run into this before?