How to read data from nested zip files in Java without using temporary files?

Time: 2017-11-09 18:17:36

Tags: java zip zipfile zipinputstream

I am trying to extract files from a nested zip archive and process them in memory.

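This question is NOT about:

  1. How to read a zip file in Java: no, the question is how to read a zip file inside a zip file inside a zip file, and so on (nested zip files).

  2. Writing temporary results to disk: no, I want to do everything in memory. Many of the answers I found use the inefficient technique of temporarily writing results to disk, which is not what I want.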

  Example:

    Zipfile -> Zipfile1 -> Zipfile2 -> Zipfile3

    Goal: extract the data in each of the nested zip files, entirely in memory, using Java.

    ZipFile is the answer, you say? No, it is not. It works for the first level, that is:

    Zipfile -> Zipfile1

    But once you reach Zipfile2 and execute:

    ZipInputStream z = new ZipInputStream(zipFile.getInputStream(zipEntry));

    you get a NullPointerException.

    My code:

    public class ZipHandler {
    
        String findings = new String();
        ZipFile zipFile = null;
    
        public void init(String fileName) throws AppException{
    
            try {
            //read file into stream
            zipFile = new ZipFile(fileName);  
            Enumeration<?> enu = zipFile.entries();  
            exctractInfoFromZip(enu);
    
            zipFile.close();
            } catch (FileNotFoundException e) {
            e.printStackTrace();
    
            } catch (IOException e) {
                e.printStackTrace();
        }
    }
    
    //The idea was recursively extract entries using ZipFile
    public void exctractInfoFromZip(Enumeration<?> enu) throws IOException, AppException{   
    
        try {
            while (enu.hasMoreElements()) { 
                ZipEntry zipEntry = (ZipEntry) enu.nextElement();
    
                String name = zipEntry.getName();
                long size = zipEntry.getSize();
                long compressedSize = zipEntry.getCompressedSize();
    
                System.out.printf("name: %-20s | size: %6d | compressed size: %6d\n", 
                        name, size, compressedSize);
    
                // directory ?
                if (zipEntry.isDirectory()) {
                    System.out.println("dir found:" + name);
                    findings+=", " + name; 
                    continue;
                } 
    
                if (name.toUpperCase().endsWith(".ZIP") ||  name.toUpperCase().endsWith(".GZ")) {
                    String fileType = name.substring(
                            name.lastIndexOf(".")+1, name.length());
    
                    System.out.println("File type:" + fileType);
                    System.out.println("zipEntry: " + zipEntry);
    
                    if (fileType.equalsIgnoreCase("ZIP")) {
    //ZipFile here returns a NULL pointer when you try to get the first nested zip
                        ZipInputStream z = new ZipInputStream(zipFile.getInputStream(zipEntry) ) ;
                        System.out.println("Opening ZIP as stream: " + name);
    
                        findings+=", " + name;
    
                        exctractInfoFromZip(zipInputStreamToEnum(z));
                    } else if (fileType.equalsIgnoreCase("GZ")) {
    //ZipFile here returns a NULL pointer when you try to get the first nested zip      
                        GZIPInputStream z = new GZIPInputStream(zipFile.getInputStream(zipEntry) ) ;
                        System.out.println("Opening ZIP as stream: " + name);
    
                        findings+=", " + name;
    
                        exctractInfoFromZip(gZipInputStreamToEnum(z));
                    } else
                        throw new AppException("extension not recognized!");
                } else {
                    System.out.println(name);
                    findings+=", " + name;
                }
            }
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    
        System.out.println("Findings " + findings);
    } 
    
    public Enumeration<?> zipInputStreamToEnum(ZipInputStream zStream) throws IOException{
    
        List<ZipEntry> list = new ArrayList<ZipEntry>();    
    
        while (zStream.available() != 0) {
            list.add(zStream.getNextEntry());
        }
    
        return Collections.enumeration(list);
    } 
    

2 Answers:

Answer 0 (score: 2)

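This is how I unzip the files in memory:

The code is not clean at all, but I understand the rule is to post something that works, so I hope this helps.

What I do is use a recursive method to navigate a complex ZIP file, extract the folders, files, and other inner zips, and keep the results in memory to work with them later.

The main things I found and want to share:

1. ZipFile is useless if you have nested zip files.
2. You have to use the basic ZipInputStream together with plain output streams.
3. I only use recursion to unzip the nested zips.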

package course.hernan;

import java.io.BufferedInputStream;

import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

import org.apache.commons.io.IOUtils;

public class FileReader {

private static final int  BUFFER_SIZE = 2048;

    public static void main(String[] args) {
        try {
            File f = new File("DIR/inputs.zip");
            FileInputStream fis = new FileInputStream(f);
            BufferedInputStream bis = new BufferedInputStream(fis);
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            BufferedOutputStream bos = new BufferedOutputStream(baos);
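            byte[] buffer = new byte[BUFFER_SIZE];
            int count;
            // Write only the bytes actually read; the final chunk is usually shorter than the buffer.
            while ((count = bis.read(buffer, 0, BUFFER_SIZE)) != -1) {
               bos.write(buffer, 0, count);
            }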

            bos.flush();
            bos.close();
            bis.close();

            //This STACK has the output byte array information 
            Deque<Map<Integer, Object[]>> outputDataStack = ZipHandler1.unzip(baos);


        } catch (FileNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}

package course.hernan;

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

import org.apache.commons.lang3.StringUtils;

public class ZipHandler1 {

  private static final int BUFFER_SIZE = 2048;

  private static final String ZIP_EXTENSION = ".zip";
  public static final Integer FOLDER = 1;
  public static final Integer ZIP = 2;
  public static final Integer FILE = 3;


  public static Deque<Map<Integer, Object[]>> unzip(ByteArrayOutputStream zippedOutputFile) {

    try {

      ZipInputStream inputStream = new ZipInputStream(
          new BufferedInputStream(new ByteArrayInputStream(
              zippedOutputFile.toByteArray())));

      ZipEntry entry;

      Deque<Map<Integer, Object[]>> result = new ArrayDeque<Map<Integer, Object[]>>();

      while ((entry = inputStream.getNextEntry()) != null) {

        LinkedHashMap<Integer, Object[]> map = new LinkedHashMap<Integer, Object[]>();
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        System.out.println("\tExtracting entry: " + entry);
        int count;
        byte[] data = new byte[BUFFER_SIZE];

        if (!entry.isDirectory()) {
          BufferedOutputStream out = new BufferedOutputStream(
              outputStream, BUFFER_SIZE);

          while ((count = inputStream.read(data, 0, BUFFER_SIZE)) != -1) {
            out.write(data, 0, count);
          }

          out.flush();
          out.close();

          //  recursively unzip files
          if (entry.getName().toUpperCase().endsWith(ZIP_EXTENSION.toUpperCase())) {
            map.put(ZIP, new Object[] {entry.getName(), unzip(outputStream)});
            result.add(map);
            //result.addAll();
          } else { 
            map.put(FILE, new Object[] {entry.getName(), outputStream});
            result.add(map);
          }
        } else {
          // A directory entry carries no data, so unzip() on the empty stream just returns an empty Deque.
          map.put(FOLDER, new Object[] {entry.getName(), unzip(outputStream)});
          result.add(map);
        }
      }

      inputStream.close();

      return result;

    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}
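
For comparison, the same buffer-and-recurse idea can be sketched more compactly with a typed result. This is only an illustration, not the answer's code: the class name NestedZipReader, the method readAll, and the "outer.zip!/inner.txt" key format are assumptions made here. Every entry is held fully in memory, so it only suits archives that fit in the heap.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of the answer above.
final class NestedZipReader {

    /**
     * Reads every file entry of the zip in {@code in}, descending into entries
     * whose name ends with ".zip". Keys are paths such as "inner.zip!/readme.txt".
     */
    static Map<String, byte[]> readAll(InputStream in, String prefix) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        ZipInputStream zin = new ZipInputStream(in);
        byte[] buffer = new byte[8192];
        ZipEntry entry;
        while ((entry = zin.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue; // directories carry no data
            }
            // Copy the current entry into memory.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int count;
            while ((count = zin.read(buffer)) != -1) {
                out.write(buffer, 0, count);
            }
            byte[] data = out.toByteArray();
            String path = prefix + entry.getName();
            if (path.toLowerCase(Locale.ROOT).endsWith(".zip")) {
                // A nested archive: recurse over the buffered bytes.
                files.putAll(readAll(new ByteArrayInputStream(data), path + "!/"));
            } else {
                files.put(path, data);
            }
        }
        return files;
    }
}
```

It could be called as readAll(new FileInputStream("DIR/inputs.zip"), ""), after which each value in the map is the uncompressed content of one file, however deeply it was nested.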

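Answer 1 (score: 1)

I have not tried it, but with ZipInputStream you can read any InputStream that contains a ZIP file as data. Iterate through the entries, and when you find the right entry, use that ZipInputStream to create another, nested ZipInputStream.

The following code demonstrates this. Imagine we have a readme.txt inside 0.zip, which is zipped again into 1.zip, which in turn is zipped into 2.zip. Now we read some text from readme.txt: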

try (FileInputStream fin = new FileInputStream("D:/2.zip")) {
    ZipInputStream firstZip = new ZipInputStream(fin);
    ZipInputStream zippedZip = new ZipInputStream(findEntry(firstZip, "1.zip"));
    ZipInputStream zippedZippedZip = new ZipInputStream(findEntry(zippedZip, "0.zip"));

    ZipInputStream zippedZippedZippedReadme = findEntry(zippedZippedZip, "readme.txt");
    InputStreamReader reader = new InputStreamReader(zippedZippedZippedReadme);
    char[] cbuf = new char[1024];
    int n = reader.read(cbuf);  // number of chars actually read, or -1 at end of stream
    System.out.println(n > 0 ? new String(cbuf, 0, n) : "");
    .....

public static ZipInputStream findEntry(ZipInputStream in, String name) throws IOException {
    ZipEntry entry = null;
    while ((entry = in.getNextEntry()) != null) {
        if (entry.getName().equals(name)) {
            return in;
        }
    }
    return null;
}

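Note that the code is really ugly: it does not close anything and does not check for errors. It is only a minimized version that demonstrates how it works.

In theory there is no limit to how many ZipInputStreams you can cascade into one another. The data is never written to a temporary file. Decompression is only performed on demand, as you read from each InputStream.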
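
The cascading idea can be generalized to any nesting depth with recursion. The following is a minimal sketch under a few assumptions: NestedZipWalker, EntryHandler, and the "outer.zip!/inner.txt" path format are names invented here for illustration, and nested archives are recognized purely by a ".zip" suffix. Each nested ZipInputStream reads directly from its parent, so nothing is buffered in full or written to disk.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper illustrating cascaded ZipInputStreams.
final class NestedZipWalker {

    /** Hypothetical callback: gets each non-zip entry's path and a stream positioned at its data. */
    interface EntryHandler {
        void accept(String path, InputStream data) throws IOException;
    }

    /** Walks the zip in {@code in}, wrapping nested ".zip" entries in further ZipInputStreams. */
    static void walk(InputStream in, String prefix, EntryHandler handler) throws IOException {
        // Not closed here on purpose: closing a nested ZipInputStream would also
        // close the parent stream it reads from. The caller closes the outermost stream.
        ZipInputStream zin = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zin.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue; // directories carry no data
            }
            String path = prefix + entry.getName();
            if (path.toLowerCase(Locale.ROOT).endsWith(".zip")) {
                // The parent stream is positioned at the nested archive's bytes.
                walk(zin, path + "!/", handler);
            } else {
                // The handler may read the entry but must not close the stream.
                handler.accept(path, zin);
            }
        }
    }
}
```

A caller would open the outermost file in a try-with-resources block and pass it to walk with an empty prefix; the handler then sees readme.txt under a path like "1.zip!/0.zip!/readme.txt". Because the streams are chained, entries can only be visited once and in archive order.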