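I have several tar archives, and some of them contain another tar archive. I wrote a script to extract specific files from these archives. So far it works, but when the script extracts files from a nested archive, the extracted file turns out to be an archive itself, and when I try to open it manually, the archive manager says the archive is corrupted. When I extract the same file manually, it is fine.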
```python
# Files in one folder, without checking for existing files (works reliably!)
import tarfile
import os, os.path
from pathlib import Path
# Time to filter by
time = "2350"
# Working dirs
windows = "C:/Users/Elisabeth/Desktop"
ubuntu = "/home/elisabeth/Dokumente/master/radolan_data/raw"
download_directory = "/radolan_downloads"  # Directory where files will be saved
os.chdir(ubuntu + download_directory)
# Current working dir
print("Actual Working dir:", os.getcwd())
# All files in the working dir
files = os.listdir()
print("Files inside this folder: ", len(files))
# Iterate over the tar archives, list their member names, and extract only files with the specified time
tar_files = [x for x in files if ".tar.gz" in x]
print("Tar files inside this folder: ", len(tar_files))
for file in tar_files:
    print("Open tar: ", file)
    tar = tarfile.open(file)
    names = tar.getnames()
    print(len(names), "files are inside the tar")
    names_f = [x for x in names if time in x]
    if len(names) == 1:
        tar_final = tarfile.open(fileobj=tar.extractfile(names[0]))
        names_final = tar_final.getnames()
        print(len(names_final), "files inside second tar")
        names_f_final = [x for x in names_final if time in x]
        tar.extractall(members=[x for x in tar_final.getmembers() if x.name in names_f_final])
        print("Finish with extraction of files: ", names_f_final)
        continue
    else:
        tar.extractall(members=[x for x in tar.getmembers() if x.name in names_f])
        print("Finish with extraction of files: ", names_f)
        continue
```
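The `else` branch works fine: it extracts the correct files, and they are readable binary files. The `if` branch also extracts files, and their file names are what they should be, but the files are identified as some kind of archive, and when I open one with an archive manager it says the archive is corrupted. I can't upload the tar archives because they are several GB in size. Could it be because in the `if` branch I open the inner tar object in memory?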
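A likely cause is visible in the `if` branch: `tar.extractall(...)` is called on the *outer* archive, but it is passed `TarInfo` members taken from `tar_final.getmembers()`. As far as I can tell from how CPython's `tarfile` works, each `TarInfo` records the offset of its data inside the archive it came from, and extraction seeks to that offset in the file object of the archive doing the extracting. The outer archive therefore copies bytes from the wrong position of its own stream: fragments of the outer tar (headers and the start of the embedded inner archive). That would explain why the result looks like a truncated, "corrupted" archive. Opening the inner archive in memory via `extractfile()` should not be the problem in itself. Below is a minimal sketch, assuming (as the `len(names) == 1` check implies) that the inner archive is the only member of the outer one. The helper names `extract_nested`, `_add_bytes` and `demo` and the demo file names are hypothetical.

```python
import io
import os
import tarfile
import tempfile


def extract_nested(outer_path, pattern, dest="."):
    """Hypothetical helper: extract the members whose names contain `pattern`
    from the single tar archive nested inside `outer_path`."""
    with tarfile.open(outer_path) as outer:
        inner_name = outer.getnames()[0]
        # extractfile() returns a seekable file-like object over the inner archive's bytes
        inner_fobj = outer.extractfile(inner_name)
        # Mode "r" (the default) auto-detects gzip/bz2/xz compression of the inner archive
        with tarfile.open(fileobj=inner_fobj) as inner:
            wanted = [m for m in inner.getmembers() if pattern in m.name]
            # Use the "data" extraction filter where this Python version provides it
            extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            # The fix: extract with the archive the members belong to (inner, not outer)
            inner.extractall(path=dest, members=wanted, **extra)
            return [m.name for m in wanted]


def _add_bytes(tar, name, data):
    # Add an in-memory byte string to an open TarFile as a regular file
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        # Build a hypothetical inner .tar.gz with two files, only one matching "2350"
        inner_buf = io.BytesIO()
        with tarfile.open(fileobj=inner_buf, mode="w:gz") as inner:
            _add_bytes(inner, "data_2350.bin", b"\x00\x01radar")
            _add_bytes(inner, "data_2250.bin", b"other")
        # Wrap it as the only member of an outer .tar.gz
        outer_path = os.path.join(tmp, "outer.tar.gz")
        with tarfile.open(outer_path, mode="w:gz") as outer:
            _add_bytes(outer, "inner.tar.gz", inner_buf.getvalue())
        # Extract only the matching file from the nested archive
        out_dir = os.path.join(tmp, "out")
        names = extract_nested(outer_path, "2350", dest=out_dir)
        with open(os.path.join(out_dir, "data_2350.bin"), "rb") as f:
            data = f.read()
        return names, data


if __name__ == "__main__":
    names, data = demo()
    print(names)  # ['data_2350.bin']
    print(data)   # b'\x00\x01radar'
```

Applied to the script in the question, this would amount to changing `tar.extractall(...)` in the `if` branch to `tar_final.extractall(...)`, keeping the member list as it is.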