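As the title says, I have several folders, each containing several .ppm.bz2 files, and I want to extract them in place using Python.
I am traversing the folders like this: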
import tarfile
import os
path = '/Users/ankitkumar/Downloads/colorferet/dvd1/data/images/'
folders = os.listdir(path)
for folder in folders:  # the folders starting like 00001
    if not folder.startswith("0"):
        pass
    path2 = path + folder
    zips = os.listdir(path2)
    for zip in zips:
        if not zip.startswith("0"):
            pass
        path3 = path2+"/"+zip
        fh = tarfile.open(path3, 'r:bz2')
        outpath = path2+"/"
        fh.extractall(outpath)
        fh.close
Then I get this error:
Traceback (most recent call last):
  File "ZIP.py", line 16, in <module>
    fh = tarfile.open(path3, 'r:bz2')
  File "/anaconda2/lib/python2.7/tarfile.py", line 1693, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/anaconda2/lib/python2.7/tarfile.py", line 1778, in bz2open
    t = cls.taropen(name, mode, fileobj, **kwargs)
  File "/anaconda2/lib/python2.7/tarfile.py", line 1723, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/anaconda2/lib/python2.7/tarfile.py", line 1587, in __init__
    self.firstmember = self.next()
  File "/anaconda2/lib/python2.7/tarfile.py", line 2370, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header
Answer 0 (score: 0)
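The tarfile module is for tar files, including tar.bz2. A .ppm.bz2 file is a single bz2-compressed image, not a tar archive, which is why tarfile.open fails with "invalid header". If your files are not tar archives, you should use the bz2 module directly.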
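If a directory might mix both kinds of file, one way to tell them apart before opening is tarfile.is_tarfile. The following is only a minimal sketch: the helper name decompress_any and its return values are made up for illustration and are not part of the original answer.

```python
import bz2
import os
import shutil
import tarfile


def decompress_any(archive_path):
    """Extract a tar archive, or decompress a plain .bz2 file, next to the source.

    Hypothetical helper for illustration. Returns "tar" or "bz2" depending on
    which branch handled the file.
    """
    out_dir = os.path.dirname(archive_path) or "."
    if tarfile.is_tarfile(archive_path):
        # A real tar archive (possibly compressed): let tarfile unpack it.
        with tarfile.open(archive_path) as tar:
            tar.extractall(out_dir)
        return "tar"
    # Not a tar archive: assume a single bz2-compressed file and strip ".bz2".
    target, _ = os.path.splitext(archive_path)
    with bz2.BZ2File(archive_path) as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return "bz2"
```

Calling decompress_any on a file such as 00001.ppm.bz2 would take the bz2 branch and write 00001.ppm beside it. The sketch assumes every non-tar input really is bz2-compressed; anything else will raise an error when it is read.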
Also, try using os.walk instead of multiple listdir calls, since it can traverse the whole tree:
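import os
import bz2
import shutil

# Root of the image tree, taken from the question.
root = '/Users/ankitkumar/Downloads/colorferet/dvd1/data/images/'
for dirpath, dirs, files in os.walk(root):
    for filename in files:
        basename, ext = os.path.splitext(filename)
        if ext.lower() != '.bz2':
            continue  # skip anything that is not bz2-compressed
        fullname = os.path.join(dirpath, filename)
        newname = os.path.join(dirpath, basename)  # x.ppm.bz2 -> x.ppm
        # bz2.open needs Python 3.3+; BZ2File also works on Python 2.7.
        with bz2.BZ2File(fullname) as fh, open(newname, 'wb') as fw:
            shutil.copyfileobj(fh, fw)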
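This decompresses every .bz2 file in place, in all subfolders. All other files are left unchanged. If the uncompressed file already exists, it will be overwritten.
Please back up your data before running destructive code.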
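If overwriting is a concern, a non-destructive variant is possible. This is a rough sketch under assumptions: the function name decompress_tree and its overwrite flag are hypothetical, and it skips any file whose decompressed copy already exists unless told otherwise.

```python
import bz2
import os
import shutil


def decompress_tree(root, overwrite=False):
    """Decompress every .bz2 file under root and return the paths written.

    Hypothetical helper: the name and the overwrite flag are illustrative.
    """
    written = []
    for dirpath, _dirs, files in os.walk(root):
        for filename in files:
            basename, ext = os.path.splitext(filename)
            if ext.lower() != ".bz2":
                continue  # not a bz2 file, leave it alone
            target = os.path.join(dirpath, basename)
            if os.path.exists(target) and not overwrite:
                continue  # keep the existing decompressed file untouched
            source = os.path.join(dirpath, filename)
            with bz2.BZ2File(source) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

Calling decompress_tree(root) with the image directory from the question would then only create files that are missing, and the returned list shows exactly what was written.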