I'm trying to calculate incremental md5 digests for all files in a deep directory tree, but I can't reuse the digests that have already been calculated.
Here is my test code:
#!/usr/bin/env perl
use 5.014;
use warnings;
use Digest::MD5;
use Path::Tiny;
# create some test-files in the tempdir
my @filenames = qw(a b);
my $testdir = Path::Tiny->tempdir;
$testdir->child($_)->spew($_) for @filenames; #create 2 files
dirmd5($testdir, @filenames);
exit;
sub dirmd5 {
    my ($dir, @files) = @_;
    my $dirctx = Digest::MD5->new;    # the md5 for the whole directory
    for my $fname (@files) {
        # calculate the md5 for one file
        my $filectx = Digest::MD5->new;
        my $fd = $dir->child($fname)->openr_raw;
        $filectx->addfile($fd);
        close $fd;
        say "md5 for $fname : ", $filectx->clone->hexdigest;

        # want to somehow "add" the above file md5 to the directory md5.
        # this does not work - even though $filectx isn't reset (note the "clone" above)
        #$dirctx->add($filectx);

        # re-adding the file as below works,
        # but it calculates the md5 again,
        # i.e. for each file the calculation is done two times:
        # once for the file alone (above)
        # and a second time for the directory.
        # too bad in the case of many large files ;(
        # especially if I want to calculate the md5sum of whole directory trees
        $fd = $dir->child($fname)->openr_raw;
        $dirctx->addfile($fd);
        close $fd;
    }
    say "md5 for dir: ", $dirctx->hexdigest;
}
The above prints:
md5 for a : 0cc175b9c0f1b6a831c399e269772661
md5 for b : 92eb5ffee6ae2fec3ad71c777531578f
md5 for dir: 187ef4436122d1cc2f40dc2b92f0eba0
This is correct, but unfortunately an inefficient way to do it (see the comments). Reading the docs, I haven't found a way to reuse an already-calculated md5, e.g. something like the $dirctx->add($filectx); above. Maybe it just isn't possible.
Does any checksumming scheme exist that allows reusing already-calculated checksums in some way, so that I can calculate the checksum/digest of whole directory trees without computing the digest of each file multiple times?
Ref: trying to solve this question
Answer 0 (score: 2)
No. Nothing relates MD5(initial data) and MD5(new data) to MD5(initial data + new data), because the position of data in the stream matters as well as its value. Otherwise it wouldn't be a very useful error check, since aba, aab and baa would all have the same checksum.
If the files are small enough, you could read each file into memory and use that copy to add the data to both digests. That avoids reading from mass storage twice:
#!/usr/bin/env perl
use 5.014;
use warnings 'all';
use Digest::MD5;
use Path::Tiny;
# create some test-files in the tempdir
my @filenames = qw(a b);
my $testdir = Path::Tiny->tempdir;
$testdir->child($_)->spew($_) for @filenames; # create 2 files
dirmd5($testdir, @filenames);
sub dirmd5 {
    my ($dir, @files) = @_;
    my $dir_ctx = Digest::MD5->new;    # the md5 for the whole directory
    for my $fname ( @files ) {
        my $data = $dir->child($fname)->slurp_raw;
        # calculate the md5 for one file
        my $file_md5 = Digest::MD5->new->add($data)->hexdigest;
        say "md5 for $fname : $file_md5";
        $dir_ctx->add($data);
    }
    my $dir_md5 = $dir_ctx->hexdigest;
    say "md5 for dir: $dir_md5";
}
If the files are large, then the only optimization left is to avoid reopening the same file, instead rewinding it back to the beginning before the second read:
#!/usr/bin/env perl
use 5.014;
use warnings 'all';
use Digest::MD5;
use Path::Tiny;
use Fcntl ':seek';
# create some test-files in the tempdir
my @filenames = qw(a b);
my $testdir = Path::Tiny->tempdir;
$testdir->child($_)->spew($_) for @filenames; # create 2 files
dirmd5($testdir, @filenames);
sub dirmd5 {
    my ($dir, @files) = @_;
    my $dir_ctx = Digest::MD5->new;    # The digest for the whole directory
    for my $fname ( @files ) {
        my $fh = $dir->child($fname)->openr_raw;
        # The digest for just the current file
        my $file_md5 = Digest::MD5->new->addfile($fh)->hexdigest;
        say "md5 for $fname : $file_md5";
        seek $fh, 0, SEEK_SET;
        $dir_ctx->addfile($fh);
    }
    my $dir_md5 = $dir_ctx->hexdigest;
    say "md5 for dir: $dir_md5";
}
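One further option, not part of the answer above: if the *definition* of the directory digest is yours to choose, you can hash the per-file digests instead of the raw data, Merkle-tree style. Each file's data is then read and hashed exactly once, and the already-computed file digests are reused directly. A sketch using the question's file contents "a" and "b" as literal strings:

```shell
perl -MDigest::MD5=md5_hex -E '
    my @file_md5 = map { md5_hex($_) } "a", "b";       # one digest per file
    say "md5 for dir: ", md5_hex(join "", @file_md5);  # digest of the digests
'
```

The resulting directory digest naturally differs from the MD5 of the concatenated file data, but it serves just as well as a fingerprint as long as it is computed the same way every time.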