我遇到了使用commons压缩库来创建目录的tar.gz的问题。我有一个目录结构如下。
parent/
child/
file1.raw
fileN.raw
我正在使用以下代码进行压缩。它运行良好,没有例外。但是,当我尝试解压缩tar.gz时,我得到一个名为“childDirToCompress”的文件。它的大小正确,因此文件在tarring过程中显然已经相互附加。所需的输出将是一个目录。我无法弄清楚我做错了什么。任何明智的公共审判者能否指引我走上正确的道路?
CreateTarGZ() throws CompressorException, FileNotFoundException, ArchiveException, IOException {
File f = new File("parent");
File f2 = new File("parent/childDirToCompress");
File outFile = new File(f2.getAbsolutePath() + ".tar.gz");
if(!outFile.exists()){
outFile.createNewFile();
}
FileOutputStream fos = new FileOutputStream(outFile);
TarArchiveOutputStream taos = new TarArchiveOutputStream(new GZIPOutputStream(new BufferedOutputStream(fos)));
taos.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_STAR);
taos.setLongFileMode(TarArchiveOutputStream.LONGFILE_GNU);
addFilesToCompression(taos, f2, ".");
taos.close();
}
private static void addFilesToCompression(TarArchiveOutputStream taos, File file, String dir) throws IOException{
taos.putArchiveEntry(new TarArchiveEntry(file, dir));
if (file.isFile()) {
BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file));
IOUtils.copy(bis, taos);
taos.closeArchiveEntry();
bis.close();
}
else if(file.isDirectory()) {
taos.closeArchiveEntry();
for (File childFile : file.listFiles()) {
addFilesToCompression(taos, childFile, file.getName());
}
}
}
答案 0 :(得分:13)
我遵循了这个解决方案,它一直有效,直到我处理了一大组文件,并在处理完15000 - 16000个文件后随机崩溃。以下行是泄漏文件处理程序:
IOUtils.copy(new FileInputStream(f), tOut);
并且代码在操作系统级别崩溃并出现“Too many open files”错误 以下小改动解决了问题:
FileInputStream in = new FileInputStream(f);
IOUtils.copy(in, tOut);
in.close();
答案 1 :(得分:12)
我还没弄清楚到底出了什么问题,但是我发现了一个有效的例子。对不起风滚草!
public void CreateTarGZ()
throws FileNotFoundException, IOException
{
try {
System.out.println(new File(".").getAbsolutePath());
dirPath = "parent/childDirToCompress/";
tarGzPath = "archive.tar.gz";
fOut = new FileOutputStream(new File(tarGzPath));
bOut = new BufferedOutputStream(fOut);
gzOut = new GzipCompressorOutputStream(bOut);
tOut = new TarArchiveOutputStream(gzOut);
addFileToTarGz(tOut, dirPath, "");
} finally {
tOut.finish();
tOut.close();
gzOut.close();
bOut.close();
fOut.close();
}
}
private void addFileToTarGz(TarArchiveOutputStream tOut, String path, String base)
throws IOException
{
File f = new File(path);
System.out.println(f.exists());
String entryName = base + f.getName();
TarArchiveEntry tarEntry = new TarArchiveEntry(f, entryName);
tOut.putArchiveEntry(tarEntry);
if (f.isFile()) {
IOUtils.copy(new FileInputStream(f), tOut);
tOut.closeArchiveEntry();
} else {
tOut.closeArchiveEntry();
File[] children = f.listFiles();
if (children != null) {
for (File child : children) {
System.out.println(child.getName());
addFileToTarGz(tOut, child.getAbsolutePath(), entryName + "/");
}
}
}
}
答案 2 :(得分:7)
我最终做了以下事情:
public URL createTarGzip() throws IOException {
Path inputDirectoryPath = ...
File outputFile = new File("/path/to/filename.tar.gz");
try (FileOutputStream fileOutputStream = new FileOutputStream(outputFile);
BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
GzipCompressorOutputStream gzipOutputStream = new GzipCompressorOutputStream(bufferedOutputStream);
TarArchiveOutputStream tarArchiveOutputStream = new TarArchiveOutputStream(gzipOutputStream)) {
tarArchiveOutputStream.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_POSIX);
tarArchiveOutputStream.setLongFileMode(TarArchiveOutputStream.LONGFILE_GNU);
List<File> files = new ArrayList<>(FileUtils.listFiles(
inputDirectoryPath,
new RegexFileFilter("^(.*?)"),
DirectoryFileFilter.DIRECTORY
));
for (int i = 0; i < files.size(); i++) {
File currentFile = files.get(i);
String relativeFilePath = new File(inputDirectoryPath.toUri()).toURI().relativize(
new File(currentFile.getAbsolutePath()).toURI()).getPath();
TarArchiveEntry tarEntry = new TarArchiveEntry(currentFile, relativeFilePath);
tarEntry.setSize(currentFile.length());
tarArchiveOutputStream.putArchiveEntry(tarEntry);
tarArchiveOutputStream.write(IOUtils.toByteArray(new FileInputStream(currentFile)));
tarArchiveOutputStream.closeArchiveEntry();
}
tarArchiveOutputStream.close();
return outputFile.toURI().toURL();
}
}
这可以解决其他解决方案中出现的一些边缘情况。
答案 3 :(得分:0)
我必须对@merrick解决方案进行一些调整才能使其与路径相关。也许与最新的Maven依赖关系。当前接受的解决方案对我不起作用。
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;
import org.apache.commons.io.filefilter.DirectoryFileFilter;
import org.apache.commons.io.filefilter.RegexFileFilter;
public class TAR {
public static void CreateTarGZ(String inputDirectoryPath, String outputPath) throws IOException {
File inputFile = new File(inputDirectoryPath);
File outputFile = new File(outputPath);
try (FileOutputStream fileOutputStream = new FileOutputStream(outputFile);
BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(fileOutputStream);
GzipCompressorOutputStream gzipOutputStream = new GzipCompressorOutputStream(bufferedOutputStream);
TarArchiveOutputStream tarArchiveOutputStream = new TarArchiveOutputStream(gzipOutputStream)) {
tarArchiveOutputStream.setBigNumberMode(TarArchiveOutputStream.BIGNUMBER_POSIX);
tarArchiveOutputStream.setLongFileMode(TarArchiveOutputStream.LONGFILE_GNU);
List<File> files = new ArrayList<>(FileUtils.listFiles(
inputFile,
new RegexFileFilter("^(.*?)"),
DirectoryFileFilter.DIRECTORY
));
for (int i = 0; i < files.size(); i++) {
File currentFile = files.get(i);
String relativeFilePath = inputFile.toURI().relativize(
new File(currentFile.getAbsolutePath()).toURI()).getPath();
TarArchiveEntry tarEntry = new TarArchiveEntry(currentFile, relativeFilePath);
tarEntry.setSize(currentFile.length());
tarArchiveOutputStream.putArchiveEntry(tarEntry);
tarArchiveOutputStream.write(IOUtils.toByteArray(new FileInputStream(currentFile)));
tarArchiveOutputStream.closeArchiveEntry();
}
tarArchiveOutputStream.close();
}
}
}
行家
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.6</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-compress</artifactId>
<version>1.18</version>
</dependency>
答案 4 :(得分:0)
我使用的某些东西(通过from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta
import pendulum
local_tz = pendulum.timezone("America/New_York")
default_args=dict(
owner = 'airflow',
start_date=datetime(2018, 11, 7, 13, 5, tzinfo=local_tz), # 1:05 PM on Nov 7
schedule_interval=timedelta(hours=1),
)
dag = DAG('my_test_dag', catchup=False, default_args=default_args)
op = BashOperator(
task_id='my_test_dag',
bash_command="bash -i /home/user/shell_script.sh",
dag=dag
)
API)可以链接Files.walk
gzip(tar(youFile));
答案 5 :(得分:0)
在下面查看Apache commons-compress
和File Walker示例。
此示例tar.gz
是目录。
public static void createTarGzipFolder(Path source) throws IOException {
if (!Files.isDirectory(source)) {
throw new IOException("Please provide a directory.");
}
// get folder name as zip file name
String tarFileName = source.getFileName().toString() + ".tar.gz";
try (OutputStream fOut = Files.newOutputStream(Paths.get(tarFileName));
BufferedOutputStream buffOut = new BufferedOutputStream(fOut);
GzipCompressorOutputStream gzOut = new GzipCompressorOutputStream(buffOut);
TarArchiveOutputStream tOut = new TarArchiveOutputStream(gzOut)) {
Files.walkFileTree(source, new SimpleFileVisitor<>() {
@Override
public FileVisitResult visitFile(Path file,
BasicFileAttributes attributes) {
// only copy files, no symbolic links
if (attributes.isSymbolicLink()) {
return FileVisitResult.CONTINUE;
}
// get filename
Path targetFile = source.relativize(file);
try {
TarArchiveEntry tarEntry = new TarArchiveEntry(
file.toFile(), targetFile.toString());
tOut.putArchiveEntry(tarEntry);
Files.copy(file, tOut);
tOut.closeArchiveEntry();
System.out.printf("file : %s%n", file);
} catch (IOException e) {
System.err.printf("Unable to tar.gz : %s%n%s%n", file, e);
}
return FileVisitResult.CONTINUE;
}
@Override
public FileVisitResult visitFileFailed(Path file, IOException exc) {
System.err.printf("Unable to tar.gz : %s%n%s%n", file, exc);
return FileVisitResult.CONTINUE;
}
});
tOut.finish();
}
}
此示例提取了tar.gz
,并检查了滑锁攻击。
public static void decompressTarGzipFile(Path source, Path target)
throws IOException {
if (Files.notExists(source)) {
throw new IOException("File doesn't exists!");
}
try (InputStream fi = Files.newInputStream(source);
BufferedInputStream bi = new BufferedInputStream(fi);
GzipCompressorInputStream gzi = new GzipCompressorInputStream(bi);
TarArchiveInputStream ti = new TarArchiveInputStream(gzi)) {
ArchiveEntry entry;
while ((entry = ti.getNextEntry()) != null) {
Path newPath = zipSlipProtect(entry, target);
if (entry.isDirectory()) {
Files.createDirectories(newPath);
} else {
// check parent folder again
Path parent = newPath.getParent();
if (parent != null) {
if (Files.notExists(parent)) {
Files.createDirectories(parent);
}
}
// copy TarArchiveInputStream to Path newPath
Files.copy(ti, newPath, StandardCopyOption.REPLACE_EXISTING);
}
}
}
}
private static Path zipSlipProtect(ArchiveEntry entry, Path targetDir)
throws IOException {
Path targetDirResolved = targetDir.resolve(entry.getName());
Path normalizePath = targetDirResolved.normalize();
if (!normalizePath.startsWith(targetDir)) {
throw new IOException("Bad entry: " + entry.getName());
}
return normalizePath;
}
参考