Extracting .ppm.bz2 files from a custom path to a custom path

Time: 2018-07-24 17:42:38

Tags: python extract bzip2 tarfile bz2

As the title says, I have several folders containing several .ppm.bz2 files, and I want to use Python to extract each file right where it is.

Directory structure image

I'm iterating over the folders like this:

 import tarfile
 import os
 path = '/Users/ankitkumar/Downloads/colorferet/dvd1/data/images/'
 folders = os.listdir(path)
 for folder in folders:  # the folders are named like 00001
     if not folder.startswith("0"):
         continue  # skip entries that are not image folders
     path2 = path + folder
     zips = os.listdir(path2)
     for zip in zips:
         if not zip.startswith("0"):
             continue
         path3 = path2+"/"+zip

         fh = tarfile.open(path3, 'r:bz2')
         outpath = path2+"/"
         fh.extractall(outpath)
         fh.close()


Then I get this error:

Traceback (most recent call last):
  File "ZIP.py", line 16, in <module>
    fh = tarfile.open(path3, 'r:bz2')
  File "/anaconda2/lib/python2.7/tarfile.py", line 1693, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/anaconda2/lib/python2.7/tarfile.py", line 1778, in bz2open
    t = cls.taropen(name, mode, fileobj, **kwargs)
  File "/anaconda2/lib/python2.7/tarfile.py", line 1723, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/anaconda2/lib/python2.7/tarfile.py", line 1587, in __init__
    self.firstmember = self.next()
  File "/anaconda2/lib/python2.7/tarfile.py", line 2370, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header


1 Answer:

Answer 0 (score: 0)

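The tarfile module is for tar archives, including compressed ones such as .tar.bz2. A .ppm.bz2 file is normally just one .ppm image compressed with bzip2, not a tar archive, so after decompressing it tarfile finds no tar header and raises `ReadError: invalid header`. If your files are not tar archives, use the bz2 module directly.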
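A quick way to check which kind of file you have is `tarfile.is_tarfile`, which returns False for a bare bzip2 file and True for a .tar.bz2. The sketch below assumes Python 3; the sample files and their names are made up for the demo. It builds one file of each kind in a temporary folder and also reproduces the `ReadError` from the question (the exact message text may differ between Python versions).

```python
import bz2
import os
import tarfile
import tempfile

tmp = tempfile.mkdtemp()
# Hypothetical tiny 2x2 RGB PPM image: header plus 12 bytes of pixel data.
ppm_bytes = b"P6\n2 2\n255\n" + bytes(range(12))

# 1) Plain bzip2 file: a single compressed .ppm, no tar container.
plain = os.path.join(tmp, "sample.ppm.bz2")
with open(plain, "wb") as f:
    f.write(bz2.compress(ppm_bytes))

# 2) Real tar archive, bzip2-compressed, containing the same image.
raw = os.path.join(tmp, "sample.ppm")
with open(raw, "wb") as f:
    f.write(ppm_bytes)
archive = os.path.join(tmp, "sample.tar.bz2")
with tarfile.open(archive, "w:bz2") as tf:
    tf.add(raw, arcname="sample.ppm")

print(tarfile.is_tarfile(plain))    # False -> use the bz2 module
print(tarfile.is_tarfile(archive))  # True  -> tarfile can extract it

# Opening the plain file with tarfile fails with the same exception type as in the question.
try:
    tarfile.open(plain, "r:bz2")
except tarfile.ReadError as exc:
    print("tarfile.ReadError:", exc)
```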

Also, try os.walk instead of nested os.listdir calls: it walks the whole directory tree for you.

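import os
import bz2
import shutil

# Top-level folder from the question
root = '/Users/ankitkumar/Downloads/colorferet/dvd1/data/images/'

# os.walk visits root and every subfolder below it
for dirpath, dirs, files in os.walk(root):
    for filename in files:
        basename, ext = os.path.splitext(filename)  # e.g. 'x.ppm.bz2' -> ('x.ppm', '.bz2')
        if ext.lower() != '.bz2':
            continue  # leave non-bzip2 files alone
        fullname = os.path.join(dirpath, filename)
        newname = os.path.join(dirpath, basename)  # same folder, .bz2 stripped
        # BZ2File works on Python 2.7 and 3; bz2.open() needs Python 3.3+
        with bz2.BZ2File(fullname, 'rb') as fh, open(newname, 'wb') as fw:
            shutil.copyfileobj(fh, fw)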

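This decompresses every .bz2 file in every subfolder in place, writing the result next to the original with the .bz2 extension removed; the .bz2 files themselves are kept. All other files are left untouched. If a decompressed file already exists, it is overwritten.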
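The title also asks about a custom output path, while the code above always writes next to the source and overwrites existing files. One possible variant is sketched below for Python 3 (bz2.open needs 3.3+). It takes an optional destination folder that mirrors the source subfolders, plus a flag to skip files that were already decompressed. The function name `decompress_bz2_tree` and its parameters are hypothetical, not part of any library.

```python
import bz2
import os
import shutil


def decompress_bz2_tree(src_root, dst_root=None, overwrite=True):
    """Decompress every *.bz2 file under src_root (hypothetical helper).

    With dst_root=None each file is written next to its source; otherwise it
    goes into the same relative subfolder under dst_root. Returns the list of
    paths that were written.
    """
    written = []
    for dirpath, dirnames, filenames in os.walk(src_root):
        for filename in filenames:
            basename, ext = os.path.splitext(filename)
            if ext.lower() != ".bz2":
                continue  # not bzip2-compressed, leave it alone
            if dst_root is None:
                out_dir = dirpath  # decompress in place
            else:
                # Mirror the source subfolder layout under dst_root.
                rel_dir = os.path.relpath(dirpath, src_root)
                out_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
                os.makedirs(out_dir, exist_ok=True)
            out_path = os.path.join(out_dir, basename)
            if not overwrite and os.path.exists(out_path):
                continue  # keep the file decompressed earlier
            with bz2.open(os.path.join(dirpath, filename), "rb") as src, \
                    open(out_path, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(out_path)
    return written
```

Calling it with only `src_root` behaves like the loop above; passing `overwrite=False` lets a rerun skip files that were already decompressed.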

Back up your data before running destructive code.
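The question's code only processes folders whose names start with "0". With os.walk the same filter can be applied while walking: in the default top-down mode, modifying the `dirnames` list in place controls which subfolders os.walk descends into. A small sketch, assuming Python 3; `walk_matching_dirs` and the sample folder names are hypothetical.

```python
import os
import tempfile


def walk_matching_dirs(root, prefix="0"):
    """Yield (dirpath, filenames) for root and subfolders whose names start with prefix."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to dirnames[:] (in place, with the default topdown=True)
        # tells os.walk which subfolders to visit next, and in what order.
        dirnames[:] = sorted(d for d in dirnames if d.startswith(prefix))
        yield dirpath, filenames


# Hypothetical tree: two image folders plus one folder that should be skipped.
root = tempfile.mkdtemp()
for name in ["00001", "00002", "thumbnails"]:
    os.makedirs(os.path.join(root, name))

visited = [os.path.relpath(p, root) for p, _ in walk_matching_dirs(root)]
print(visited)  # ['.', '00001', '00002']
```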