How to use urllib to download all files with the .gz extension from an HTTP directory

Posted: 2019-06-29 23:05:44

Tags: python-3.x http directory download urllib

I am trying to get urllib to download every file in a directory that ends with .gz. My code runs without errors, but nothing gets downloaded. I'm not sure what I'm doing wrong. Please help.

from urllib import *
directory = 'https://eogdata.mines.edu/wwwdata/viirs_products/vnf/v30'
with request.urlopen(directory) as doc:
        for line in doc:  
            if line.endswith(b'gz'):
                urllib.request.retrieve(line)

1 Answer:

Answer 0 (score: 0)

There are a couple of mistakes in your script. First you need to parse the file names out of the directory listing at that URL, and then check whether each one is a .gz file.

I tried to put together an example using urllib2:
import urllib2
import re

directory = 'https://eogdata.mines.edu/wwwdata/viirs_products/vnf/v30/'
sock = urllib2.urlopen(directory)
found_files = re.findall(r'href="(.*?)"', sock.read())  # parse all the files available for download
sock.close()
for file in found_files:
    if file.endswith('gz'):
        file_location = directory + file  # the gz file's URL
        print "downloading %s from %s" % (file, file_location)
        file_download = urllib2.urlopen(file_location)  # get the file from the url
        with open(file, "wb") as local_file:  # open a local file with the same name as the gz file
            local_file.write(file_download.read())  # write the data to our file
        file_download.close()
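
Since the question is tagged python-3.x, here is a minimal sketch of the same idea using the Python 3 standard library (urllib.request instead of urllib2). The directory URL is taken from the question; the href-based regex parsing is the same assumption as above, i.e. it relies on the server returning a plain HTML directory listing.

import re
import urllib.request

directory = 'https://eogdata.mines.edu/wwwdata/viirs_products/vnf/v30/'

# fetch the directory listing and decode the HTML to a string
with urllib.request.urlopen(directory) as sock:
    listing = sock.read().decode('utf-8', errors='replace')

# parse the file names out of the href attributes (assumes an HTML directory listing)
found_files = re.findall(r'href="(.*?)"', listing)

for name in found_files:
    if name.endswith('.gz'):
        file_location = directory + name  # the gz file's URL
        print("downloading %s from %s" % (name, file_location))
        # urlretrieve fetches the URL and saves it to a local file with the same name
        urllib.request.urlretrieve(file_location, name)

The only real change from the urllib2 version is that reading and decoding happen before the socket is closed (the with block handles that), and urlretrieve replaces the manual read/write of each file.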