Question

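I am trying to find the differences between two directories in Perl. I would like to optimize the code so that it runs efficiently, and I am also not sure how to ignore certain files (for example, those with a .txt or .o extension).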

The code I have so far is:

use strict;
use warnings;
use Parallel::ForkManager;
use File::Find;
use List::MoreUtils qw(uniq);

my $dir1 = "/path/to/dir/first";
my $dir2 = "/path/to/dir/second";
my @comps = ('abc');
my (%files1, %files2);
my $workernum = 500; 
my $pm = new Parallel::ForkManager($workernum);
my @common = ();
my @differ = ();
my @only_in_first = ();
my @only_in_second = ();

foreach my $comp (@comps) {
    find( sub { -f  ($files1{$_} = $File::Find::name) }, "$dir1");
    find( sub { -f  ($files2{$_} = $File::Find::name) }, "$dir2");
    my @all = uniq(keys %files1, keys %files2);
    for my $file (@all) {
        my $pid = $pm->start and next; # do the fork
        my $result;
        if ($files1{$file} && $files2{$file}) { # file exists in both dirs
            $result = qx(/usr/bin/diff -q $files1{$file} $files2{$file});
            if ($result =~m/^Common subdirectories/) {
                push (@common, $result);
            } else {
                push (@differ, $result);
            }
        } elsif ($files1{$file}) { 
            push (@only_in_first, $file);
        } else {
            push (@only_in_second, $file);
        }
        $pm->finish; # do the exit in child process
    }
}

Answer 1

The diff utility has a -r switch that makes it recurse into subdirectories.

Isn't that enough?

Answer 2

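Yes, diff -r does what your code does. However, diff -r will not do it with 500 worker processes. Then again, diff -r is probably fast enough that it does not need 500 parallel processes.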
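As a rough illustration of the diff -r approach, here is a minimal sketch of calling it from Perl. The helper name diff_dirs is hypothetical, and the sketch assumes a diff that supports -q and -x (GNU diff does; -x excludes files whose base name matches the pattern). It relies on diff's exit status (0 for no differences, 1 for differences, 2 for trouble) instead of matching on the output text.

```perl
use strict;
use warnings;

# Hypothetical helper: run "diff -rq" on two directories.
# Returns (exit_status, output_lines); dies if diff could not run or failed.
# Assumes the installed diff understands -q and -x (GNU diff does).
sub diff_dirs {
    my ($dir1, $dir2, @exclude) = @_;

    # Build the command as a list so no shell quoting is involved.
    my @cmd = ('diff', '-rq', (map { ('-x', $_) } @exclude), $dir1, $dir2);

    open(my $fh, '-|', @cmd) or die "cannot run diff: $!";
    my @lines = <$fh>;
    close($fh);    # waits for diff and sets $?

    die "diff did not exit normally" if $? == -1 or $? & 127;
    my $status = $? >> 8;
    die "diff reported trouble (exit status $status)" if $status > 1;

    return ($status, @lines);
}

# Example call (paths taken from the question):
# my ($status, @report) =
#     diff_dirs('/path/to/dir/first', '/path/to/dir/second', '*.txt', '*.o');
# print $status ? @report : "directories match\n";
```

Because diff is looked up through PATH and invoked without a shell, file names containing spaces or shell metacharacters are passed through unchanged.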

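Notes:

"$var" is rarely needed; it is better to write just $var.
You use 2 hashes for the comparison, yet still call uniq() on an array built from the keys of both hashes, which wastes memory and CPU cycles.
What diff -q does can easily be done in Perl itself, or can at least be sped up easily by calling stat() on both files and comparing their sizes before forking. If the files are small, doing the comparison in Perl is fine.
If you really want to fork diff -q, at least check $?, since there may be problems, for example with locating or executing it. In fact, checking the exit code is sufficient; there is no need to grep stdout/stderr.
For simplicity, use the diff found in PATH rather than an absolute path.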
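
The notes above can be sketched in plain Perl roughly as follows. This is one possible arrangement, not a definitive implementation: the helper names list_files and compare_dirs are hypothetical, files are keyed by path relative to each directory (an assumption that differs from the question's code, which keys by base name), sizes are compared via stat() before any file is read, and contents are compared with the core File::Compare module. The $skip_re argument is an assumed way to ignore files such as .txt or .o.

```perl
use strict;
use warnings;
use File::Find;
use File::Compare;    # compare() returns 0 if equal, 1 if different, -1 on error
use File::Spec;

# Hypothetical helper: relative path => full path for every plain file
# under $dir, skipping paths that match $skip_re.
sub list_files {
    my ($dir, $skip_re) = @_;
    my %files;
    find({
        no_chdir => 1,    # $_ holds the full path, same as $File::Find::name
        wanted   => sub {
            return unless -f $_;
            return if $_ =~ $skip_re;
            $files{ File::Spec->abs2rel($_, $dir) } = $_;
        },
    }, $dir);
    return \%files;
}

# Hypothetical helper: classify files as same, differ, or present on one side only.
sub compare_dirs {
    my ($dir1, $dir2, $skip_re) = @_;
    my $f1 = list_files($dir1, $skip_re);
    my $f2 = list_files($dir2, $skip_re);
    my (@same, @differ, @only1);

    for my $rel (sort keys %$f1) {
        if (!exists $f2->{$rel}) { push @only1, $rel; next; }
        my ($p1, $p2) = ($f1->{$rel}, $f2->{$rel});

        # Different sizes imply different contents, so no read is needed.
        if ((stat $p1)[7] != (stat $p2)[7]) { push @differ, $rel; next; }

        my $cmp = compare($p1, $p2);
        die "cannot compare $p1 and $p2: $!" if $cmp < 0;
        if ($cmp) { push @differ, $rel } else { push @same, $rel }
    }

    # No uniq() needed: walk the second hash once for the leftovers.
    my @only2 = grep { !exists $f1->{$_} } sort keys %$f2;

    return {
        same          => \@same,
        differ        => \@differ,
        only_in_first => \@only1,
        only_in_second => \@only2,
    };
}

# Example call (paths taken from the question):
# my $r = compare_dirs('/path/to/dir/first', '/path/to/dir/second',
#                      qr/\.(?:txt|o)\z/);
# print "differs: $_\n" for @{ $r->{differ} };
```

Everything runs in a single process here, so the result arrays are filled directly; whether forking is worth adding on top depends on how large the files are.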

Optimizing a recursive search for differences between two directories in Perl
