I have an array that contains the URLs of remote files.
By default, I tried to download all of the files with this crude approach:

for a in ARRAY:
    wget.download(url=a, out=path_folder)

It fails for various reasons: the host server returns timeouts, some of the URLs are broken, and so on.
How can I handle this process more robustly? I could not apply what I have found so far to my case.
Answer 0 (score: 3)
If you still want to use wget, you can wrap the download in a try..except block that simply prints any exception and moves on to the next file:
for f in files:
    try:
        wget.download(url=f, out=path_folder)
    except Exception as e:
        print("Could not download file {}".format(f))
        print(e)
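
If the failures are only intermittent (for example, occasional timeouts), a retry loop can be layered on top of the same idea. This is a sketch, not part of the original answer; the helper name download_with_retries and the retry count of 3 are my own choices, and files / path_folder are assumed to be the URL list and output directory from the question:

import wget

def download_with_retries(url, out, retries=3):
    # Try the same URL a few times before giving up on it
    for attempt in range(1, retries + 1):
        try:
            return wget.download(url=url, out=out)
        except Exception as e:
            print("Attempt {} failed for {}: {}".format(attempt, url, e))
    print("Giving up on {}".format(url))

for f in files:
    download_with_retries(f, path_folder)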
Answer 1 (score: 0)
You can use urllib:
import urllib.request
urllib.request.urlretrieve('http://www.example.com/files/file.ext', 'folder/file.ext')
You can wrap the retrieve call in a try: except: block to catch any errors:
import urllib.error

try:
    urllib.request.urlretrieve('http://www.example.com/files/file.ext', 'folder/file.ext')
except urllib.error.HTTPError as e:
    print("The server couldn't fulfill the request.")
    print('Error code:', e.code)
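
To apply this to the whole array of URLs from the question, the same call can go inside a loop. A minimal sketch, assuming urls is the list of remote URLs and path_folder the output directory (both names taken from the question):

import os
import urllib.request
import urllib.error
from urllib.parse import urlparse

for url in urls:
    # Derive a local file name from the URL path
    name = os.path.basename(urlparse(url).path)
    try:
        urllib.request.urlretrieve(url, os.path.join(path_folder, name))
    except urllib.error.URLError as e:
        # HTTPError is a subclass of URLError, so broken URLs and unreachable hosts both land here
        print('Could not download {}: {}'.format(url, e))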
Answer 2 (score: 0)
Adding this as another answer: if you want to deal with the timeout problem, you can use the requests library:
import requests

try:
    response = requests.get('http://url/to/file')
except requests.exceptions.RequestException as e:
    print('Request failed:', e)
It will not time out if you do not specify a timeout.
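
The snippet above only performs the request; to actually save the file, the response body also has to be written to disk. A minimal sketch, not part of the original answer, with 'file.ext' as a placeholder output name:

import requests

try:
    response = requests.get('http://url/to/file')
    # response.content holds the full body of the downloaded file
    with open('file.ext', 'wb') as fh:
        fh.write(response.content)
except requests.exceptions.RequestException as e:
    print('Download failed:', e)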
Answer 3 (score: 0)
Here is a variant that defines a timeout, reads the file name from the URL, and retrieves large files as a stream so that they do not flood your memory:
import os
import requests
from urllib.parse import urlparse

timeout = 30  # seconds

for url in urls:
    try:
        # Make the actual request; give up after `timeout` seconds without data
        # and enable streaming so large files are not held in memory all at once
        response = requests.get(url, timeout=timeout, stream=True)
        # Get the file name from the URL path
        name = os.path.basename(urlparse(url).path)
        # Open the output file in binary mode
        with open(name, 'wb') as fh:
            # Walk through the response in chunks of 1024 * 1024 bytes, i.e. 1 MiB
            for chunk in response.iter_content(1024 * 1024):
                # Write the chunk to the file
                fh.write(chunk)
    except Exception as e:
        print("Something went wrong:", e)