Python 2.7: Untar files in parallel (with threading)

Date: 2016-10-05 16:06:06

Tags: python multithreading tarfile

I'm learning Python threading and, at the same time, trying to improve my old untar script.

The main part of it looks like this:

import tarfile, os, threading

def untar(fname, path):
    print "Untarring " + fname
    try:
        ut = tarfile.open(os.path.join(path,fname), "r:gz")
        ut.extractall(path)
        ut.close()
    except tarfile.ReadError as e:          # in case it's not gzipped
        print e
        ut = tarfile.open(os.path.join(path,fname), "r:*")
        ut.extractall(path)
        ut.close()

def untarFolder(path):
    if path == ".":
        path = os.getcwd()
    print "path", path
    ListTarFiles = serveMenu(path)         # function that scans the folder
                                           # for .tar and .tar.gz files and
                                           # returns a list of them
    print "ListTarFiles ", ListTarFiles 

    for filename in ListTarFiles:
        print "filename: ", filename
        t = threading.Thread(target=untar, args = (filename,path))
        t.daemon = True
        t.start()
        print "Thread:", t

So the goal is to untar all the files in the given folder not one by one but simultaneously, in parallel. Is that possible?

Output:

bogard@testlab:~/Toolz/untar$ python untar01.py -f .
path /home/bogard/Toolz/untar
ListTarFiles ['tar1.tgz', 'tar2.tgz', 'tar3.tgz']
filename:  tar1.tgz
Untarring tar1.tgz
 Thread: <Thread(Thread-1, started daemon 140042104731392)>
filename:  tar2.tgz
Untarring tar2.tgz
 Thread: <Thread(Thread-2, started daemon 140042096338688)>
filename:  tar3.tgz
Untarring tar3.tgz
 Thread: <Thread(Thread-3, started daemon 140042087945984)>

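As you can see in the output, the script creates the threads but doesn't untar any files. What's the catch?

1 answer:

Answer 0 (score: 0)

What is probably happening is that your script returns before the threads have actually finished. Because the threads are marked as daemon threads, the interpreter does not wait for them on exit, and they are stopped abruptly when the main thread ends. You can wait for the threads to finish with Thread.join(). Maybe try something like this: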

threads = []

for filename in ListTarFiles:
    t = threading.Thread(target=untar, args = (filename,path))
    t.daemon = True
    threads.append(t)
    t.start()

# Wait for each thread to complete
for thread in threads:
    thread.join()
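
For anyone who wants to try the join() approach end to end, here is a minimal, self-contained sketch. It is not the asker's script: make_sample_archives is a hypothetical helper that only builds a few small .tgz files in a temporary folder so there is something to extract, and untar_all is an illustrative name. The threads are left as non-daemon here, which is an assumption on my part; with join() in place the daemon flag is not needed. Opening with "r:*" lets tarfile detect the compression itself, so the try/except fallback is not required in this sketch.

```python
from __future__ import print_function  # keeps the sketch usable on Python 2.7

import os
import tarfile
import tempfile
import threading


def untar(fname, path):
    """Extract one archive into path; "r:*" auto-detects the compression."""
    with tarfile.open(os.path.join(path, fname), "r:*") as archive:
        archive.extractall(path)


def untar_all(filenames, path):
    """Start one thread per archive, then block until all of them finish."""
    threads = [threading.Thread(target=untar, args=(name, path))
               for name in filenames]
    for t in threads:
        t.start()
    # Without these join() calls the main thread could exit first.
    for t in threads:
        t.join()


def make_sample_archives(path, count=3):
    """Hypothetical helper: build small .tgz files so the demo can run."""
    names = []
    for i in range(count):
        member = os.path.join(path, "payload%d.txt" % i)
        with open(member, "w") as handle:
            handle.write("contents %d\n" % i)
        name = "tar%d.tgz" % i
        with tarfile.open(os.path.join(path, name), "w:gz") as archive:
            archive.add(member, arcname="out%d.txt" % i)
        os.remove(member)
        names.append(name)
    return names


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    archives = make_sample_archives(workdir)
    untar_all(archives, workdir)
    print(sorted(os.listdir(workdir)))
```

Only extract archives you trust this way, since extractall() writes whatever paths the archive contains.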

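Also, depending on the number of files you're untarring, you may want to limit the number of jobs you launch, so that you're not trying to untar 1000 files at once. You could do that with something like multiprocessing.Pool.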
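
A rough sketch of that idea follows. Note that it swaps in multiprocessing.pool.ThreadPool, the thread-based class that offers the same map() interface as multiprocessing.Pool, so it stays close to the threading code in the question; whether threads or processes are faster for your archives is something you would have to measure. The names untar_limited and max_workers, and the default of 4 workers, are illustrative assumptions, and this version takes full archive paths rather than a (filename, path) pair.

```python
from __future__ import print_function  # keeps the sketch usable on Python 2.7

import os
import tarfile
import tempfile
from multiprocessing.pool import ThreadPool


def untar(archive_path):
    """Extract one archive into its own folder and return its path."""
    target = os.path.dirname(os.path.abspath(archive_path))
    with tarfile.open(archive_path, "r:*") as archive:
        archive.extractall(target)
    return archive_path


def untar_limited(archive_paths, max_workers=4):
    """Extract archives with at most max_workers extractions at a time."""
    pool = ThreadPool(processes=max_workers)
    try:
        # map() blocks until every archive has been processed.
        return pool.map(untar, archive_paths)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    # Build two tiny archives in a temporary folder so the demo can run.
    workdir = tempfile.mkdtemp()
    paths = []
    for i in range(2):
        src = os.path.join(workdir, "src%d.txt" % i)
        with open(src, "w") as handle:
            handle.write("contents %d\n" % i)
        tgz = os.path.join(workdir, "tar%d.tgz" % i)
        with tarfile.open(tgz, "w:gz") as archive:
            archive.add(src, arcname="out%d.txt" % i)
        paths.append(tgz)
    print(untar_limited(paths, max_workers=2))
```

Because map() only returns once all the work is done, no separate join() loop over threads is needed with this approach.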