Question

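I am currently implementing an e-reader library (skyepub) that requires me to implement a method that checks whether a ZipEntry exists. In their demo, the solution is simple: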

public boolean isExists(String baseDirectory,String contentPath) {
    setupZipFile(baseDirectory,contentPath);
    if (this.isCustomFont(contentPath)) {
        String path = baseDirectory +"/"+ contentPath;
        File file = new File(path);
        return file.exists();
    }

    ZipEntry entry = this.getZipEntry(contentPath);
    return entry != null;
}

// Entry name should start without / like META-INF/container.xml 

private ZipEntry getZipEntry(String contentPath) {

    if (zipFile==null) return null;

    String[] subDirs = contentPath.split(Pattern.quote(File.separator));

    String corePath = contentPath.replace(subDirs[1], "");

    corePath=corePath.replace("//", "");

    ZipEntry entry = zipFile.getEntry(corePath.replace(File.separatorChar, '/'));

    return entry;

}

As you can see, getZipEntry(contentPath) gives you access to the relevant ZipEntry in O(1) time.

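In my case, however, I cannot read the zip file directly from the file system (for security reasons it has to be read from memory). So my isExists implementation actually walks through the zip file one entry at a time until it finds the ZipEntry in question. Here is the relevant part: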

try {
    final InputStream stream = dbUtil.getBookStream(bookEditionID);
    if (stream == null) return null;

    final ZipInputStream zip = new ZipInputStream(stream);

    ZipEntry entry;
    do {
        entry = zip.getNextEntry();
        if (entry == null) {
            zip.close();
            return null;
        }
    } while (!entry.getName().equals(zipEntryName));

} catch (IOException e) {
    Log.e("demo", "Can't get content data for " + contentPath);
    return null;
}

return data;

So if the data exists, isExists returns true; otherwise it returns false.
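For reference, the linear scan described above can be written as a small self-contained method. This is only a sketch: the class and method names (StreamingZipLookup, entryExists) are hypothetical, and it assumes the archive is already available as a stream or a byte[] instead of coming from dbUtil.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper illustrating the O(n) streaming scan from the question.
final class StreamingZipLookup {

    /** Returns true if the zip stream contains an entry with exactly this name. */
    static boolean entryExists(InputStream source, String entryName) throws IOException {
        // try-with-resources closes the stream on every path, including the early return.
        try (ZipInputStream zip = new ZipInputStream(source)) {
            ZipEntry entry;
            // getNextEntry() skips over the previous entry's data, so the whole
            // stream may be read before the answer is known.
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Convenience overload for an archive that is already held in memory. */
    static boolean entryExists(byte[] archive, String entryName) throws IOException {
        return entryExists(new ByteArrayInputStream(archive), entryName);
    }
}
```

Entry names inside the archive use forward slashes and no leading slash, for example StreamingZipLookup.entryExists(bytes, "META-INF/container.xml").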

Question

Is there a way to find the zip entry in question within the whole ZipInputStream in O(1) time rather than O(n) time?

Related

See this question and this answer.

Answer 1

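If the contents of the archive are in memory, then it is seekable, and you can search for the central directory and use it yourself. Neither java.util.zip.ZipFile nor Apache Commons Compress' ZipFile works on anything other than a File, but other open source libraries might (not sure about zip4j).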

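The code inside Apache Commons Compress' ZipFile that searches for the central directory and parses it should be easy to adapt to the case where the archive is available as a byte[]. In fact, there is a patch that has not been applied yet, part of COMPRESS-327, that may help.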
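To give an idea of what "searching for the central directory" means for a byte[], here is a minimal sketch. It is not the Commons Compress code. It scans backwards for the end of central directory record (signature 0x06054b50) and reads the entry count and the central directory offset from it. The names EocdLocator, findEocd, entryCount and centralDirectoryOffset are hypothetical, and Zip64 and multi-disk archives are assumed not to occur.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of any library.
final class EocdLocator {
    private static final int EOCD_SIGNATURE = 0x06054b50;
    private static final int EOCD_MIN_SIZE = 22;   // fixed part, without the archive comment
    private static final int MAX_COMMENT = 0xFFFF; // the comment length is a 16-bit field

    /** Returns the offset of the end of central directory record, or -1 if none is found. */
    static int findEocd(byte[] archive) {
        ByteBuffer buf = ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN);
        int lowest = Math.max(0, archive.length - EOCD_MIN_SIZE - MAX_COMMENT);
        // The record sits at the very end, followed only by an optional comment,
        // so search from the last possible position towards the front.
        for (int pos = archive.length - EOCD_MIN_SIZE; pos >= lowest; pos--) {
            if (buf.getInt(pos) == EOCD_SIGNATURE) {
                int commentLength = buf.getShort(pos + 20) & 0xFFFF;
                // Accept the match only if the comment runs exactly to the end of the
                // archive; this rejects signature bytes that occur inside a comment.
                if (pos + EOCD_MIN_SIZE + commentLength == archive.length) {
                    return pos;
                }
            }
        }
        return -1;
    }

    /** Total number of central directory entries (16-bit field at offset 10). */
    static int entryCount(byte[] archive, int eocd) {
        return ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN).getShort(eocd + 10) & 0xFFFF;
    }

    /** Offset of the first central directory header (32-bit field at offset 16). */
    static long centralDirectoryOffset(byte[] archive, int eocd) {
        return ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN).getInt(eocd + 16) & 0xFFFFFFFFL;
    }
}
```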

Answer 2

Entries in a zip archive cannot really be located in O(1) time. If we look at the structure of a zip archive, it looks like this:

  [local file header 1]
  [encryption header 1]
  [file data 1]
  [data descriptor 1]
  ... 
  [local file header n]
  [encryption header n]
  [file data n]
  [data descriptor n]
  [archive decryption header] 
  [archive extra data record] 
  [central directory header 1]
  ...
  [central directory header n]
  [zip64 end of central directory record]
  [zip64 end of central directory locator] 
  [end of central directory record]

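Basically, there are the compressed files with some headers, plus a "central directory" that holds all the metadata about the files (the central directory headers). The only valid way to look up an entry is to scan the central directory (more info):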

...must not scan for entries from the top of the ZIP file, because only the central directory specifies where a file chunk starts

Since the central directory headers have no index, you can only find an entry in O(n), where n is the number of files in the archive.
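To make the O(n) walk concrete, here is a rough sketch that checks for an entry by reading only the central directory of an in-memory archive, never the file data. It is an illustration, not code from any library: CentralDirectoryScan and its methods are hypothetical names, entry names are assumed to be UTF-8, and Zip64 archives are not handled.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: linear scan over the central directory headers only.
final class CentralDirectoryScan {
    private static final int EOCD_SIGNATURE = 0x06054b50;
    private static final int CEN_SIGNATURE = 0x02014b50;
    private static final int EOCD_SIZE = 22; // fixed part of the end record
    private static final int CEN_SIZE = 46;  // fixed part of a central directory header

    static boolean contains(byte[] archive, String entryName) {
        ByteBuffer buf = ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN);
        int eocd = findEocd(buf);
        if (eocd < 0) {
            return false; // not a zip archive as far as this sketch can tell
        }
        int count = buf.getShort(eocd + 10) & 0xFFFF;   // total number of entries
        long pos = buf.getInt(eocd + 16) & 0xFFFFFFFFL; // offset of the first header
        byte[] wanted = entryName.getBytes(StandardCharsets.UTF_8); // assumption: UTF-8 names

        for (int i = 0; i < count; i++) {
            if (pos + CEN_SIZE > archive.length || buf.getInt((int) pos) != CEN_SIGNATURE) {
                return false; // truncated or malformed directory
            }
            int p = (int) pos;
            int nameLen = buf.getShort(p + 28) & 0xFFFF;
            int extraLen = buf.getShort(p + 30) & 0xFFFF;
            int commentLen = buf.getShort(p + 32) & 0xFFFF;
            if (nameLen == wanted.length
                    && pos + CEN_SIZE + nameLen <= archive.length
                    && matches(archive, p + CEN_SIZE, wanted)) {
                return true;
            }
            // Headers are variable-length, so the next one can only be reached
            // by stepping over this one: hence O(n) in the number of entries.
            pos += CEN_SIZE + nameLen + extraLen + commentLen;
        }
        return false;
    }

    private static int findEocd(ByteBuffer buf) {
        int length = buf.capacity();
        int lowest = Math.max(0, length - EOCD_SIZE - 0xFFFF);
        for (int pos = length - EOCD_SIZE; pos >= lowest; pos--) {
            if (buf.getInt(pos) == EOCD_SIGNATURE) {
                return pos;
            }
        }
        return -1;
    }

    private static boolean matches(byte[] archive, int offset, byte[] wanted) {
        for (int i = 0; i < wanted.length; i++) {
            if (archive[offset + i] != wanted[i]) {
                return false;
            }
        }
        return true;
    }
}
```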

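Update: Unfortunately, all the zip libraries I know of that work with streams rather than files use the local file headers and scan the entire stream, including the content. They are also not easy to bend. The only way I have found to avoid scanning the whole archive is to adapt a library yourself.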

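Update 2: I have taken the liberty of modifying the zip4j library for your purposes. Assuming your zip file has been read into a byte array and you have added a dependency on zip4j version 1.3.2, you can use MemoryHeaderReader and RandomByteStream like this: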

String myZipFile = "...";
byte[] bytes = readFile();
MemoryHeaderReader headerReader = new MemoryHeaderReader(RandomAccessStream.fromBytes(bytes));
ZipModel zipModel = headerReader.readAllHeaders();
FileHeader myFile = Zip4jUtil.getFileHeader(zipModel, myZipFile);
boolean fileIsPresent = myFile != null;

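It works in O(entryCount) without reading the whole archive, which should be reasonably fast. I have not tested it thoroughly, but it should give you an idea of how to adapt zip4j for your purposes.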

Answer 3

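Technically, the search is always O(n), where n is the number of entries in the zip file, because you have to do a linear search either through the central directory or through the local headers.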

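You seem to be implying that the zip file is fully loaded into memory. In that case, the fastest approach is to search for the entry in the central directory. If you find it, that directory entry points to the local header.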

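If you do a lot of searches on the same zip file, you can build a hash table of the names in the central directory in O(n) time, and then use that table to look up a given name in roughly O(1) time.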
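A minimal sketch of that hash table idea, assuming the archive is held in a byte[]: the constructor walks the central directory once, and every later lookup is a HashMap access. ZipNameIndex and its methods are hypothetical names; entry names are assumed to be UTF-8, and Zip64 archives are not handled.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical index: entry name -> offset of its local file header.
final class ZipNameIndex {
    private final Map<String, Long> localHeaderOffsets = new HashMap<>();

    /** Builds the index in O(n) by reading the central directory once. */
    ZipNameIndex(byte[] archive) {
        ByteBuffer buf = ByteBuffer.wrap(archive).order(ByteOrder.LITTLE_ENDIAN);

        // Locate the end of central directory record (signature 0x06054b50).
        int eocd = -1;
        int lowest = Math.max(0, archive.length - 22 - 0xFFFF);
        for (int pos = archive.length - 22; pos >= lowest && eocd < 0; pos--) {
            if (buf.getInt(pos) == 0x06054b50) {
                eocd = pos;
            }
        }
        if (eocd < 0) {
            throw new IllegalArgumentException("no end of central directory record");
        }

        int count = buf.getShort(eocd + 10) & 0xFFFF;
        long pos = buf.getInt(eocd + 16) & 0xFFFFFFFFL;
        for (int i = 0; i < count; i++) {
            if (pos + 46 > archive.length || buf.getInt((int) pos) != 0x02014b50) {
                throw new IllegalArgumentException("malformed central directory");
            }
            int p = (int) pos;
            int nameLen = buf.getShort(p + 28) & 0xFFFF;
            int extraLen = buf.getShort(p + 30) & 0xFFFF;
            int commentLen = buf.getShort(p + 32) & 0xFFFF;
            if (pos + 46 + nameLen > archive.length) {
                throw new IllegalArgumentException("truncated entry name");
            }
            long localOffset = buf.getInt(p + 42) & 0xFFFFFFFFL;
            String name = new String(archive, p + 46, nameLen, StandardCharsets.UTF_8);
            localHeaderOffsets.put(name, localOffset);
            pos += 46 + nameLen + extraLen + commentLen;
        }
    }

    /** Expected O(1) existence check. */
    boolean contains(String name) {
        return localHeaderOffsets.containsKey(name);
    }

    /** Offset of the entry's local file header, or -1 if the entry is absent. */
    long localHeaderOffset(String name) {
        Long offset = localHeaderOffsets.get(name);
        return offset == null ? -1L : offset.longValue();
    }

    int size() {
        return localHeaderOffsets.size();
    }
}
```

The index only pays off if it is kept alongside the archive bytes and reused; building it for a single lookup costs the same O(n) as a plain scan.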
