I have an array that contains the URLs of remote files.
By default, I tried to download all of the files with this crude approach:

for a in ARRAY:
    wget.download(url=a, out=path_folder)

It fails for various reasons: the host server returns timeouts, some of the URLs are broken, and so on.
How can I handle this process more robustly? I could not apply what I have found so far to my case.
Answer 0 (score: 3)
If you still want to use wget, you can wrap the download in a try..except block that simply prints any exception and moves on to the next file:
for f in files:
    try:
        wget.download(url=f, out=path_folder)
    except Exception as e:
        print("Could not download file {}".format(f))
        print(e)
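
If the failures are only intermittent (for example, occasional timeouts), a retry loop can be layered on top of the same idea. This is a sketch, not part of the original answer; the helper name download_with_retries and the retry count of 3 are my own choices, and files / path_folder are assumed to be the URL list and output directory from the question:

import wget

def download_with_retries(url, out, retries=3):
    # Try the same URL a few times before giving up on it
    for attempt in range(1, retries + 1):
        try:
            return wget.download(url=url, out=out)
        except Exception as e:
            print("Attempt {} failed for {}: {}".format(attempt, url, e))
    print("Giving up on {}".format(url))

for f in files:
    download_with_retries(f, path_folder)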
Answer 1 (score: 0)
You can use urllib:
import urllib.request
urllib.request.urlretrieve('http://www.example.com/files/file.ext', 'folder/file.ext')
You can wrap the retrieve call in a try: except: block to catch any errors:
import urllib.error

try:
    urllib.request.urlretrieve('http://www.example.com/files/file.ext', 'folder/file.ext')
except urllib.error.HTTPError as e:
    print("The server couldn't fulfill the request.")
    print('Error code:', e.code)
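
To apply this to the whole array of URLs from the question, the same call can go inside a loop. A minimal sketch, assuming urls is the list of remote URLs and path_folder the output directory (both names taken from the question):

import os
import urllib.request
import urllib.error
from urllib.parse import urlparse

for url in urls:
    # Derive a local file name from the URL path
    name = os.path.basename(urlparse(url).path)
    try:
        urllib.request.urlretrieve(url, os.path.join(path_folder, name))
    except urllib.error.URLError as e:
        # HTTPError is a subclass of URLError, so broken URLs and unreachable hosts both land here
        print('Could not download {}: {}'.format(url, e))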
Answer 2 (score: 0)
Adding this as another answer: if you want to deal with the timeout problem, you can use the requests library:
import requests

try:
    response = requests.get('http://url/to/file')
except requests.exceptions.RequestException as e:
    print('Request failed:', e)
It will not time out if you do not specify a timeout.
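
The snippet above only performs the request; to actually save the file, the response body also has to be written to disk. A minimal sketch, not part of the original answer, with 'file.ext' as a placeholder output name:

import requests

try:
    response = requests.get('http://url/to/file')
    # response.content holds the full body of the downloaded file
    with open('file.ext', 'wb') as fh:
        fh.write(response.content)
except requests.exceptions.RequestException as e:
    print('Download failed:', e)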
Answer 3 (score: 0)
Here is a variant that defines a timeout, reads the file name from the URL, and retrieves large files as a stream so that they do not flood your memory:
import os
import requests
from urllib.parse import urlparse

timeout = 30  # seconds

for url in urls:
    try:
        # Make the actual request; give up after `timeout` seconds without data
        # and enable streaming so large files are not held in memory all at once
        response = requests.get(url, timeout=timeout, stream=True)
        # Get the file name from the URL path
        name = os.path.basename(urlparse(url).path)
        # Open the output file in binary mode
        with open(name, 'wb') as fh:
            # Walk through the response in chunks of 1024 * 1024 bytes, i.e. 1 MiB
            for chunk in response.iter_content(1024 * 1024):
                # Write the chunk to the file
                fh.write(chunk)
    except Exception as e:
        print("Something went wrong:", e)