Question

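I have a program that downloads directories, unzips them, and finally decompresses all the JSON files in each directory. I have to download 1260 directories; each one contains 1000 files and is 300 MB. So the layout looks like this: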

dir1.zip
  |_file1.json.gz
  ...
  |_file1000.json.gz
dir2.zip
  |_file1.json.gz
  ...
  |_file1000.json.gz
....
dir1260.zip
  |_file1.json.gz
  ...
  |_file1000.json.gz

This is my code:

import glob
import subprocess
import zipfile

def ProcesssDir(dirs_links_file):
    with open(dirs_links_file, 'r') as inputFile:
        lines = inputFile.readlines()
        for line in lines:

            # Download
            directory = subprocess.Popen("wget -c " + line, shell=True).wait()

            # Unzip:
            for nameDirZip in glob.glob('*.zip'):
                UnzipDir = zipfile.ZipFile(nameDirZip)
                UnzipDir.extractall()
                nameDir = nameDirZip[:-4] + "/"  # This is just to get the name of the new dir.

                # Decompress every .gz file in the extracted directory
                subprocess.Popen("gunzip -d " + nameDir + "*.gz", shell=True).wait()

This works, but it is very, very slow: each directory takes 20 minutes. How can I make this faster?
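
One thing that stands out in the code above is that `glob.glob('*.zip')` runs on every pass of the download loop, so archives fetched earlier appear to be extracted again each time. Below is a minimal sketch of one possible alternative, not a measured fix. It handles a single archive at a time and decompresses the `.gz` members in-process with the standard `gzip` module on a small thread pool, instead of spawning `gunzip`. The helper names `gunzip_file` and `extract_archive`, the `dest_dir` and `workers` parameters, and the default of 4 workers are all assumptions made for illustration.

```python
import gzip
import shutil
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def gunzip_file(gz_path):
    """Decompress one .gz file next to itself and delete the compressed copy."""
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # file1.json.gz -> file1.json
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream in chunks, no full read into memory
    gz_path.unlink()
    return out_path


def extract_archive(zip_path, dest_dir=".", workers=4):
    """Extract one zip archive, then gunzip its .gz members on a thread pool.

    Hypothetical helper: `workers=4` is an arbitrary starting point.
    """
    dest_dir = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        # Only touch the members of this archive, not every file on disk.
        gz_members = [
            dest_dir / name
            for name in archive.namelist()
            if name.endswith(".gz")
        ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gunzip_file, gz_members))
```

Calling `extract_archive("dir1.zip")` right after each download would stand in for the inner glob loop. Whether the thread pool actually helps depends on the disk and CPU, so it is worth timing on a single directory first.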

Faster way to decompress JSON files using Python

0 answers: