How to handle file downloads in Python?

Date: 2018-02-19 15:35:28

Tags: python python-3.x

I have an array containing the URLs of remote files.

At the moment I try to download all the files with this crude approach:

for a in ARRAY:
    wget.download(url=a, out=path_folder)

However, it fails for various reasons: the host server returns timeouts, some URLs are broken, and so on.

How can I handle this process more professionally? I haven't been able to apply what I've found to my case.

4 answers:

Answer 0 (score: 3)

If you still want to use wget, you can wrap the download in a try..except block that simply prints any exception and moves on to the next file:

for f in files:
    try:
        wget.download(url=f, out=path_folder)
    except Exception as e:
        print("Could not download file {}".format(f))
        print(e)
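
Going one step further (a minimal sketch, not part of the original answer; it reuses files and path_folder from the snippet above, and the failed list is a name introduced here), you could collect the URLs that failed so they can be reported or retried afterwards instead of only printing the error:

failed = []  # URLs that could not be downloaded
for f in files:
    try:
        wget.download(url=f, out=path_folder)
    except Exception as e:
        print("Could not download file {}".format(f))
        print(e)
        failed.append(f)  # remember the failure for a later report or retry

if failed:
    print("{} downloads failed: {}".format(len(failed), failed))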

Answer 1 (score: 0)

You can use urllib:

import urllib.request
urllib.request.urlretrieve('http://www.example.com/files/file.ext', 'folder/file.ext')

You can wrap the urlretrieve call in try:/except: to catch any errors:

import urllib.error

try:
    urllib.request.urlretrieve('http://www.example.com/files/file.ext', 'folder/file.ext')
except urllib.error.HTTPError as e:
    print('The server couldn\'t fulfill the request.')
    print('Error code:', e.code)
except Exception as e:
    print(e)
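
Applied to the array from the question (a minimal sketch; ARRAY and path_folder are the names used in the question, and deriving the local file name from the last URL path segment is an assumption):

import os
import urllib.request
from urllib.parse import urlparse

for url in ARRAY:
    # Derive a local file name from the URL path (assumption: the last path segment is usable)
    name = os.path.basename(urlparse(url).path)
    try:
        urllib.request.urlretrieve(url, os.path.join(path_folder, name))
    except Exception as e:
        print('Could not download {}: {}'.format(url, e))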

Answer 2 (score: 0)

Adding this as another answer:

If you want to deal with the timeout problem, you can use the requests library:

import requests

try:
    requests.get('http://url/to/file')
except requests.exceptions.RequestException as e:
    print('Error:', e)

If you do not specify a timeout, the request will never time out (it can wait indefinitely).
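
Put together with the loop from the question (a minimal sketch; ARRAY, path_folder and the 30-second timeout are assumptions, and the local file name is derived from the URL):

import os
import requests
from urllib.parse import urlparse

for url in ARRAY:
    name = os.path.basename(urlparse(url).path)  # local file name derived from the URL
    try:
        response = requests.get(url, timeout=30)  # give up if the server stalls for 30 seconds
        response.raise_for_status()               # treat HTTP error codes (404, 500, ...) as failures
        with open(os.path.join(path_folder, name), 'wb') as fh:
            fh.write(response.content)
    except requests.exceptions.RequestException as e:
        print('Could not download {}: {}'.format(url, e))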

Answer 3 (score: 0)

Here is a way that defines a timeout, reads the file name from the URL, and retrieves large files as a stream so that your memory is not flooded:

import os
import requests
from urllib.parse import urlparse

timeout = 30  # Seconds
for url in urls:
    try:
        # Make the actual request; set the timeout for no data to X seconds
        # and enable streaming so we don't have to keep large files in memory
        request = requests.get(url, timeout=timeout, stream=True)
        # Get the file name from the URL
        name = os.path.basename(urlparse(url).path)
        # Open the output file and make sure we write in binary mode
        with open(name, 'wb') as fh:
            # Walk through the response in chunks of 1024 * 1024 bytes, i.e. 1 MiB
            for chunk in request.iter_content(1024 * 1024):
                # Write the chunk to the file
                fh.write(chunk)
    except Exception as e:
        print("Something went wrong:", e)