我正在尝试从嵌套的zip存档中提取文件并在内存中处理它们。
这个问题不关于:
如何用Java读取zip文件:不,问题是如何在zip文件中的zip文件中读取zip文件等等(如嵌套的zip文件)。
在磁盘上写下临时结果:不,我要求在内存中完成所有操作。我发现许多答案使用了将结果临时写入磁盘的效率不高的技术,但这不是我想要做的。
示例:
Zipfile - > Zipfile1 - > Zipfile2 - > Zipfile3
目标:提取每个嵌套zip文件中的数据,全部在内存中并使用Java。
你说,ZipFile就是答案吗?不,它不是,它适用于第一次迭代,即:
Zipfile - > Zipfile1
但是一旦你到达Zipfile2,并执行:
ZipInputStream z = new ZipInputStream(zipFile.getInputStream( zipEntry) ) ;
你会得到一个NullPointerException。
我的代码:
public class ZipHandler {
String findings = new String();
ZipFile zipFile = null;
public void init(String fileName) throws AppException{
try {
//read file into stream
zipFile = new ZipFile(fileName);
Enumeration<?> enu = zipFile.entries();
exctractInfoFromZip(enu);
zipFile.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
//The idea was recursively extract entries using ZipFile
public void exctractInfoFromZip(Enumeration<?> enu) throws IOException, AppException{
try {
while (enu.hasMoreElements()) {
ZipEntry zipEntry = (ZipEntry) enu.nextElement();
String name = zipEntry.getName();
long size = zipEntry.getSize();
long compressedSize = zipEntry.getCompressedSize();
System.out.printf("name: %-20s | size: %6d | compressed size: %6d\n",
name, size, compressedSize);
// directory ?
if (zipEntry.isDirectory()) {
System.out.println("dir found:" + name);
findings+=", " + name;
continue;
}
if (name.toUpperCase().endsWith(".ZIP") || name.toUpperCase().endsWith(".GZ")) {
String fileType = name.substring(
name.lastIndexOf(".")+1, name.length());
System.out.println("File type:" + fileType);
System.out.println("zipEntry: " + zipEntry);
if (fileType.equalsIgnoreCase("ZIP")) {
//ZipFile here returns a NULL pointer when you try to get the first nested zip
ZipInputStream z = new ZipInputStream(zipFile.getInputStream(zipEntry) ) ;
System.out.println("Opening ZIP as stream: " + name);
findings+=", " + name;
exctractInfoFromZip(zipInputStreamToEnum(z));
} else if (fileType.equalsIgnoreCase("GZ")) {
//ZipFile here returns a NULL pointer when you try to get the first nested zip
GZIPInputStream z = new GZIPInputStream(zipFile.getInputStream(zipEntry) ) ;
System.out.println("Opening ZIP as stream: " + name);
findings+=", " + name;
exctractInfoFromZip(gZipInputStreamToEnum(z));
} else
throw new AppException("extension not recognized!");
} else {
System.out.println(name);
findings+=", " + name;
}
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
System.out.println("Findings " + findings);
}
public Enumeration<?> zipInputStreamToEnum(ZipInputStream zStream) throws IOException{
List<ZipEntry> list = new ArrayList<ZipEntry>();
while (zStream.available() != 0) {
list.add(zStream.getNextEntry());
}
return Collections.enumeration(list);
}
答案 0 :(得分:2)
这是我在内存中解压缩文件的方式:
代码并不干净所有,但我明白规则是发布工作的东西,所以我希望这有帮助所以
我所做的是使用递归方法导航复杂的ZIP文件并提取 夹 其他内拉链 档 并将结果保存在内存中以便以后使用它们。
我发现的主要内容我想与您分享:
如果你有嵌套的zip文件,那么ZipFile是没用的 2您必须使用基本的Zip InputStream和Outputstream 3我只使用递归编程来解压缩嵌套的拉链
package course.hernan;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import org.apache.commons.io.IOUtils;
public class FileReader {
private static final int BUFFER_SIZE = 2048;
public static void main(String[] args) {
try {
File f = new File("DIR/inputs.zip");
FileInputStream fis = new FileInputStream(f);
BufferedInputStream bis = new BufferedInputStream(fis);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
BufferedOutputStream bos = new BufferedOutputStream(baos);
byte[] buffer = new byte[BUFFER_SIZE];
while (bis.read(buffer, 0, BUFFER_SIZE) != -1) {
bos.write(buffer);
}
bos.flush();
bos.close();
bis.close();
//This STACK has the output byte array information
Deque<Map<Integer, Object[]>> outputDataStack = ZipHandler1.unzip(baos);
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
package course.hernan;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import org.apache.commons.lang3.StringUtils;
public class ZipHandler1 {
private static final int BUFFER_SIZE = 2048;
private static final String ZIP_EXTENSION = ".zip";
public static final Integer FOLDER = 1;
public static final Integer ZIP = 2;
public static final Integer FILE = 3;
public static Deque<Map<Integer, Object[]>> unzip(ByteArrayOutputStream zippedOutputFile) {
try {
ZipInputStream inputStream = new ZipInputStream(
new BufferedInputStream(new ByteArrayInputStream(
zippedOutputFile.toByteArray())));
ZipEntry entry;
Deque<Map<Integer, Object[]>> result = new ArrayDeque<Map<Integer, Object[]>>();
while ((entry = inputStream.getNextEntry()) != null) {
LinkedHashMap<Integer, Object[]> map = new LinkedHashMap<Integer, Object[]>();
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
System.out.println("\tExtracting entry: " + entry);
int count;
byte[] data = new byte[BUFFER_SIZE];
if (!entry.isDirectory()) {
BufferedOutputStream out = new BufferedOutputStream(
outputStream, BUFFER_SIZE);
while ((count = inputStream.read(data, 0, BUFFER_SIZE)) != -1) {
out.write(data, 0, count);
}
out.flush();
out.close();
// recursively unzip files
if (entry.getName().toUpperCase().endsWith(ZIP_EXTENSION.toUpperCase())) {
map.put(ZIP, new Object[] {entry.getName(), unzip(outputStream)});
result.add(map);
//result.addAll();
} else {
map.put(FILE, new Object[] {entry.getName(), outputStream});
result.add(map);
}
} else {
map.put(FOLDER, new Object[] {entry.getName(), unzip(outputStream)});
result.add(map);
}
}
inputStream.close();
return result;
} catch (Exception e) {
throw new RuntimeException(e);
}
}
答案 1 :(得分:1)
我没有尝试过但是使用ZipInputStream
你可以阅读任何InputStream that contains a ZIP file as data. Iterate through the entries and when you found the correct entry use the
ZipInputStream to create another nested
ZipInputStream`。
以下代码演示了这一点。想象一下,我们在readme.txt
内0.zip
再次压缩1.zip
,其中已压缩2.zip
。现在我们阅读readme.txt
中的一些文字:
try (FileInputStream fin = new FileInputStream("D:/2.zip")) {
ZipInputStream firstZip = new ZipInputStream(fin);
ZipInputStream zippedZip = new ZipInputStream(findEntry(firstZip, "1.zip"));
ZipInputStream zippedZippedZip = new ZipInputStream(findEntry(zippedZip, "0.zip"));
ZipInputStream zippedZippedZippedReadme = findEntry(zippedZippedZip, "readme.txt");
InputStreamReader reader = new InputStreamReader(zippedZippedZippedReadme);
char[] cbuf = new char[1024];
reader.read(cbuf);
System.out.println(new String(cbuf));
.....
public static ZipInputStream findEntry(ZipInputStream in, String name) throws IOException {
ZipEntry entry = null;
while ((entry = in.getNextEntry()) != null) {
if (entry.getName().equals(name)) {
return in;
}
}
return null;
}
请注意,代码非常难看,不会关闭任何内容,也不会检查错误。它只是一个最小化的版本,演示了它的工作原理。
理论上,没有限制你级联到另一个ZipInputStream的数量。数据永远不会写入临时文件。解密仅在您阅读每个InputStream
时按需执行。