I'm using Digest::MD5 to compute the MD5 of a data stream: gzipped files (3000 of them, to be exact) that are too big to fit in RAM. So I'm doing this:
```perl
use Digest::MD5 qw(md5_base64);
my ($filename) = @_;    # this is in a sub
my $ctx = Digest::MD5->new;
my $openme = $filename;    # usually, it's a plain file
$openme = "gunzip -c '$filename' |" if $filename =~ /\.gz$/;    # it's gzipped
open(FILE, $openme) or die "Cannot open $openme: $!";    # gunzip writes to STDOUT
binmode(FILE);
$ctx->addfile(*FILE);    # passing the filehandle
close(FILE);
```
This works. `addfile` neatly absorbs gunzip's output and gives the correct MD5.
However, I'd really like to know the size of the data set (in this case, the uncompressed "file").
I could add another line:

```perl
$size = 0 + `gunzip -c very/big-file.gz | wc -c`;
```

but that would mean reading the file twice.
Is there any way to extract the number of bytes consumed by Digest::MD5? I tried capturing the result, `$result = $ctx->addfile(*FILE);`, and ran Data::Dumper on both `$result` and `$ctx`, but found nothing interesting.
Edit: the files are usually not compressed. I added code to show what I'm actually doing.
Answer 0 (score: 3)
I would do it all in Perl, without relying on an external program for the decompression:
```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw/say/;
use IO::Uncompress::Gunzip qw/$GunzipError/;
use Digest::MD5;

my $filename = shift or die "Missing gzip filename!\n";
my $md5 = Digest::MD5->new;

# Allow for reading both gzip format files and uncompressed files.
# This is the default behavior, but might as well be explicit about it.
my $z = IO::Uncompress::Gunzip->new($filename, Transparent => 1)
    or die "Unable to open $filename: $GunzipError\n";

my $len = 0;
while ((my $blen = $z->read(my $block)) > 0) {
    $len += $blen;
    $md5->add($block);
}
die "There was an error reading the file: $GunzipError\n" unless $z->eof;

say "Total uncompressed length: $len";
say "MD5: ", $md5->hexdigest;
```
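Since the question's code lives in a sub and mostly sees plain files, the same loop can be packaged as a helper that returns both the digest and the byte count. This is a minimal sketch, not the answerer's code: the name `md5_and_size` is hypothetical, it returns `b64digest` to match the question's use of `md5_base64`, and it relies on `Transparent => 1` so that non-gzip files are read unchanged. The demo part writes a gzipped copy and a plain copy of some sample data to a temporary directory and hashes both.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;
use IO::Uncompress::Gunzip qw($GunzipError);
use IO::Compress::Gzip qw(gzip $GzipError);
use File::Temp qw(tempdir);

# Hypothetical helper: returns (base64 MD5, byte count) of the
# decompressed contents of $filename. With Transparent => 1 a file
# that is not gzip-compressed is read as-is, so plain files work too.
sub md5_and_size {
    my ($filename) = @_;
    my $z = IO::Uncompress::Gunzip->new($filename, Transparent => 1)
        or die "Unable to open $filename: $GunzipError\n";
    my $ctx = Digest::MD5->new;
    my $len = 0;
    # read() returns the number of uncompressed bytes, 0 at EOF, < 0 on error
    while ((my $n = $z->read(my $block)) > 0) {
        $len += $n;
        $ctx->add($block);
    }
    die "Error reading $filename: $GunzipError\n" unless $z->eof;
    $z->close;
    return ($ctx->b64digest, $len);
}

# Demo with sample data (12 bytes x 10_000 = 120_000 bytes).
my $dir  = tempdir(CLEANUP => 1);
my $data = "hello world\n" x 10_000;

my $gz = "$dir/sample.gz";
gzip(\$data => $gz) or die "gzip failed: $GzipError\n";

my $plain = "$dir/sample.txt";
open my $out, '>:raw', $plain or die "Cannot write $plain: $!";
print {$out} $data;
close $out or die "Cannot close $plain: $!";

my ($gz_md5,    $gz_len)    = md5_and_size($gz);
my ($plain_md5, $plain_len) = md5_and_size($plain);
print "gz:    $gz_md5 ($gz_len bytes)\n";
print "plain: $plain_md5 ($plain_len bytes)\n";
```

Both calls should report the same digest and length, because the gzipped file is decompressed before hashing.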
If you want to use `gunzip` instead of the core IO::Uncompress::Gunzip module, you can do something similar, using `read` to fetch a chunk of data at a time:
```perl
#!/usr/bin/perl
use warnings;
use strict;
use autodie; # So we don't have to explicitly check for i/o related errors
use feature qw/say/;
use Digest::MD5;

my $filename = shift or die "Missing gzip filename!\n";
my $md5 = Digest::MD5->new;

# Note use of lexical file handle and safer version of opening a pipe
# from a process that eliminates shell shenanigans. Also uses the :raw
# perlio layer instead of calling binmode on the handle (which has the
# same effect)
open my $z, "-|:raw", "gunzip", "-c", $filename;
# Non-compressed version
# open my $z, "<:raw", $filename;

my $len = 0;
while ((my $blen = read($z, my $block, 4096)) > 0) {
    $len += $blen;
    $md5->add($block);
}
close $z;    # with autodie, this dies if gunzip exited with a non-zero status

say "Total uncompressed length: $len";
say "MD5: ", $md5->hexdigest;
```
Answer 1 (score: 2)
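You can read the contents yourself, feed them to `$ctx->add($data)`, and keep a running count of how much data has gone through. Whether you add all the data in a single call or across multiple calls makes no difference to the underlying algorithm. The documentation says: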
All these lines will have the same effect on the state of the `$md5` object:

```perl
$md5->add("a"); $md5->add("b"); $md5->add("c");
$md5->add("a")->add("b")->add("c");
$md5->add("a", "b", "c");
$md5->add("abc");
```
which means you can feed the data in one chunk at a time.
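To check that claim without any files, here is a small sketch: it reads some sample data (made up for the demo) through an in-memory filehandle in 4096-byte chunks, feeds each chunk to `add` while counting bytes, and compares the result with hashing the whole string in one `md5_hex` call. The chunk size is an arbitrary choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Sample data standing in for the decompressed stream: 100_000 bytes.
my $data = join '', map { chr($_ % 256) } 0 .. 99_999;

# Read through an in-memory filehandle in fixed-size chunks, feeding
# each chunk to Digest::MD5 and keeping a running byte count.
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my $ctx = Digest::MD5->new;
my $len = 0;
while ((my $n = read($fh, my $chunk, 4096)) > 0) {
    $len += $n;
    $ctx->add($chunk);
}
close $fh;

my $chunked = $ctx->hexdigest;
my $whole   = md5_hex($data);    # the same data hashed in a single call

print "bytes read:   $len\n";
print "chunked MD5:  $chunked\n";
print "one-shot MD5: $whole\n";
print $chunked eq $whole ? "digests match\n" : "digests differ\n";
```

The byte count comes for free from the return value of `read`, which is exactly the number `addfile` does not expose.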