Question

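In bash, you can concatenate gzipped files and the result is a valid gzipped file. As far as I can tell, I have always been able to treat these "concatenated" gzipped files as normal gzipped files (example code below):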

echo 'Hello world!' > hello.txt
echo 'Howdy world!' > howdy.txt
gzip hello.txt 
gzip howdy.txt

cat hello.txt.gz howdy.txt.gz > greetings.txt.gz

gunzip greetings.txt.gz

cat greetings.txt

which outputs

Hello world!
Howdy world!

However, when I try to read the same file with Perl's core IO::Uncompress::Gunzip module, it does not get past the first original file. The result is:

./my_zcat greetings.txt.gz
Hello world!

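To reproduce this without temporary files, here is a minimal sketch that builds the two gzip files in memory with IO::Compress::Gzip (the compression counterpart of IO::Uncompress::Gunzip), concatenates them the way `cat` does, and reads the result back. The helper name `read_all_lines` is hypothetical, introduced only for this illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Two independent gzip files in memory, like hello.txt.gz and howdy.txt.gz.
my ($hello_txt, $howdy_txt) = ("Hello world!\n", "Howdy world!\n");
my ($hello_gz, $howdy_gz);
gzip(\$hello_txt => \$hello_gz) or die "gzip failed: $GzipError\n";
gzip(\$howdy_txt => \$howdy_gz) or die "gzip failed: $GzipError\n";

# Byte-for-byte concatenation, as `cat hello.txt.gz howdy.txt.gz` does.
my $greetings_gz = $hello_gz . $howdy_gz;

# Hypothetical helper: decompress a buffer line by line, passing any extra
# constructor options (none by default) through to IO::Uncompress::Gunzip.
sub read_all_lines {
    my ($buffer_ref, @options) = @_;
    my $z = IO::Uncompress::Gunzip->new($buffer_ref, @options)
        or die "gunzip failed: $GunzipError\n";
    my $text = '';
    while (defined(my $line = $z->getline)) {
        $text .= $line;
    }
    $z->close;
    return $text;
}

# With the default options, reading stops at the end of the first file.
print read_all_lines(\$greetings_gz);
```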
Here is the code for my_zcat:

#!/usr/bin/env perl
use strict;
use warnings;
use v5.10;

use IO::Uncompress::Gunzip qw($GunzipError);

my $file_name = shift;

my $fh = IO::Uncompress::Gunzip->new($file_name) or die $GunzipError;

while (defined(my $line = readline $fh))
{
    print $line;
}

If I fully decompress the files before creating a new gzipped file, I don't have this problem:

zcat hello.txt.gz howdy.txt.gz | gzip > greetings_via_zcat.txt.gz
./my_zcat greetings_via_zcat.txt.gz
Hello world!
Howdy world!

So what is the difference between greetings.txt.gz and greetings_via_zcat.txt.gz, and why does IO::Uncompress::Gunzip work correctly with greetings_via_zcat.txt.gz?

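Based on this answer to another question, I'm guessing that IO::Uncompress::Gunzip gets confused by the metadata between the files. However, since greetings.txt.gz is a valid gzip file, I would expect IO::Uncompress::Gunzip to handle it correctly.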

My current workaround is piping from zcat (which of course doesn't help Windows users much):

#!/usr/bin/env perl
use strict;
use warnings;
use v5.10;

my $file_name = shift;

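# List-form open runs zcat directly, so $file_name is not parsed by a shell.
open(my $fh, '-|', 'zcat', $file_name) or die "Cannot run zcat: $!";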

while (defined(my $line = readline $fh))
{
    print $line;
}

Answer 1

This is covered explicitly in the IO::Compress FAQ section Dealing with concatenated gzip files. Basically, you just need to include the MultiStream option when you construct the IO::Uncompress::Gunzip object.

Here is the definition of the MultiStream option:

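MultiStream => 0|1

If the input file/buffer contains more than one compressed data stream, this option will uncompress the whole lot as a single data stream.

Defaults to 0.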
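To make the "multiple compressed data streams" concrete: the gzip format (RFC 1952) defines a file as a series of members, so greetings.txt.gz holds two members while greetings_via_zcat.txt.gz, recompressed in one pass, holds one. The sketch below counts members by leaving MultiStream at 0 and stepping from one member to the next with the object's nextStream method. The helper name `count_gzip_members` is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: count gzip members in a file name or scalar ref.
# With MultiStream left at 0, each member ends with read() returning 0,
# and nextStream() returns 1 if another member follows, 0 if not.
sub count_gzip_members {
    my ($source) = @_;
    my $z = IO::Uncompress::Gunzip->new($source)
        or die "gunzip failed: $GunzipError\n";
    my $members = 0;
    my $status;
    for ($status = 1; $status > 0; $status = $z->nextStream) {
        $members++;
        # Drain the current member; the data itself is discarded.
        my $buffer;
        1 while ($status = $z->read($buffer)) > 0;
        die "read failed\n" if $status < 0;
    }
    die "nextStream failed\n" if $status < 0;
    $z->close;
    return $members;
}

printf "%s: %d member(s)\n", $_, count_gzip_members($_) for @ARGV;
```

Run against the two files from the question, it should report 2 members for greetings.txt.gz and 1 for greetings_via_zcat.txt.gz, which is exactly the difference MultiStream papers over.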

So your code needs this change:

my $fh = IO::Uncompress::Gunzip->new($file_name, MultiStream => 1) or die $GunzipError;
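For reference, a complete my_zcat with that change might look like the sketch below. It is restructured (an assumption, not part of the original answer) around a hypothetical helper `zcat_to`, which accepts either a file name or a reference to an in-memory buffer so it can be exercised without files; the `/usr/bin/env` shebang path is also an assumption about the system.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: copy every line of a (possibly concatenated) gzip
# source to the filehandle $out. $source is a file name or a scalar ref.
sub zcat_to {
    my ($source, $out) = @_;
    # MultiStream => 1 keeps reading after the first gzip member ends.
    my $z = IO::Uncompress::Gunzip->new($source, MultiStream => 1)
        or die "gunzip failed: $GunzipError\n";
    while (defined(my $line = $z->getline)) {
        print {$out} $line;
    }
    $z->close;
}

# Usage: ./my_zcat greetings.txt.gz [more.gz ...]
zcat_to($_, \*STDOUT) for @ARGV;
```

Because this stays inside Perl's core modules, it also avoids the dependency on an external zcat, which is what made the piping workaround unhelpful on Windows.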

IO::Uncompress::Gunzip stops after the first "original" gzipped file inside a "concatenated" gzipped file
