Question

I have two files that I need to compare to find the matching and non-matching data. I am running into two problems:

Problem 1: one of the hashes I am using captures only the second 'num' row. I tried:

push @{hash1{name1}},$x1,$y1,$x2,$y2

but it still returns only the second 'num' row.

File 1:

name    foo
num     111 222 333 444
name    jack
num     999 111 222 333
num     333 444 555 777

File 2:

name    jack
num     999 111 222 333
num     333 444 555 777
name    foo
num     666 222 333 444

This is my code:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $input1=$ARGV[0];
my $input2=$ARGV[1];

my %hash1;
my %hash2;
my $name1;
my $name2;
my $x1;
my $x2;
my $y2;
my $y1;

open my $fh1,'<', $input1 or die "Cannot open file : $!\n";
while (<$fh1>)
{   
    chomp;
    if(/^name\s+(\S+)/)
    {   
        $name1 = $1; 
    }   
    if(/^num\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)/)
    {   
        $x1 = $1; 
        $y1 = $2; 
        $x2 = $3; 
        $y2 = $4; 
    }
    $hash1{$name1}=[$x1,$y1,$x2,$y2];
}   
close $fh1;
print Dumper (\%hash1);

open my $fh2,'<', $input2 or die "Cannot open file : $!\n";
while (<$fh2>)
{   
    chomp;
    if(/^name\s+(\S+)/)
    {
        $name2 = $1; 
    }
    if(/^num\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)/)
    {
        $x1 = $1; 
        $y1 = $2; 
        $x2 = $3;
        $y2 = $4;
    }

    $hash2{$name2}=[$x1,$y1,$x2,$y2];

}
close $fh2;
print Dumper (\%hash2);

My output:

$VAR1 = {
          'jack' => [
                      '333',
                      '444',
                      '555',
                      '777'
                    ],
          'foo' => [
                     '111',
                     '222',
                     '333',
                     '444'
                   ]
        };
$VAR1 = {
          'jack' => [
                      '333',
                      '444',
                      '555',
                      '777'
                    ],
          'foo' => [
                     '666',
                     '222',
                     '333',
                     '444'
                   ]
        };

My expected output:

$VAR1 = {
          'jack' => [ 
                      '999',
                      '111',
                      '222',
                      '333',
                      '333',
                      '444',
                      '555',
                      '777'
                    ],
          'foo' => [
                     '111',
                     '222',
                     '333',
                     '444'
                   ]
        };
$VAR1 = {
          'jack' => [ 
                      '999',
                      '111',
                      '222',
                      '333',
                      '333',
                      '444',
                      '555',
                      '777'
                    ],
          'foo' => [
                     '666',
                     '222',
                     '333',
                     '444'
                   ]
        };

Problem 2: I am trying to use this foreach loop to match keys and values and print them in table format. I tried this:

print "Name\tx1\tX1\tY1\tX2\tY2\n"
foreach my $k1(keys %hash1)
{
    foreach my  $k2 (keys %hash2)
    {
        if($hash1{$name1} == $hash2{$name2})
        {
            print "$name1,$x1,$y1,$x2,$y2"
        }
    }
}

But I got:

"my" variable %hash2 masks earlier declaration in same scope at script.pl line 67.
"my" variable %hash1 masks earlier declaration in same scope at script.pl line 69.
"my" variable $name1 masks earlier declaration in same scope at script.pl line 69.
"my" variable %hash2 masks earlier declaration in same statement at script.pl line 69.
"my" variable $name2 masks earlier declaration in same scope at script.pl line 69.
syntax error at script.pl line 65, near "$k1("
Execution of script.pl aborted due to compilation errors.

The matching output I want:

Name     x1   y1   x2   y2
jack     999  111  222  333
         333  444  555  777

Answer 1

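One immediate error is that you assign to a hash element with $hash2{$name2}=[...], which overwrites whatever was previously stored under that key. That is why your output shows only the second set of numbers for jack. You need to push onto that arrayref instead. Some remarks on the code follow further below.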
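To illustrate the difference between assigning to a hash element and pushing onto its arrayref, here is a minimal sketch. The sample rows are jack's two 'num' lines from the question; the hash names %assigned and %pushed are made up for this illustration.

```perl
use strict;
use warnings;

# jack's two 'num' rows from the question's input.
my @rows = ([999, 111, 222, 333], [333, 444, 555, 777]);

# Assignment: each iteration replaces the arrayref stored under the key,
# so only the last row survives.
my %assigned;
$assigned{jack} = [@$_] for @rows;

# push: each iteration appends to the arrayref (created on first use),
# so all rows accumulate.
my %pushed;
push @{ $pushed{jack} }, @$_ for @rows;

print "assigned: @{ $assigned{jack} }\n";   # assigned: 333 444 555 777
print "pushed:   @{ $pushed{jack} }\n";     # pushed:   999 111 222 333 333 444 555 777
```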

Here is basic (but working) code. Please note the omitted checks and implement them.

use warnings;
use strict;
use feature 'say';

my ($f1, $f2) = @ARGV;
die "Usage: $0 file1 file2\n"  if not $f1 or not $f2;

my $ds1 = read_file($f1);
my $ds2 = read_file($f2);

compare_data($ds1, $ds2);

sub compare_data {
    my ($ds1, $ds2) = @_;    
    # Add: check whether one has more keys; work with the longer one
    foreach my $k (sort keys %$ds1) {
        if (not exists $ds2->{$k}) {
            say "key $k does not exist in dataset 2";
            next;
        }   
        # Add tests: do both datasets have the same "ref" type here?
        # If those are arrayrefs, as expected, are they the same size?

        my @data = @{$ds1->{$k}};
        foreach my $i (0..$#data) {
            if ($data[$i] ne $ds2->{$k}->[$i]) {
                say "differ for $k: $data[$i] vs $ds2->{$k}->[$i]";
            }
        }   
    }
}

sub read_file {
    my ($file) = @_; 
    open my $fh, '<', $file or die "Can't open $file: $!";
    my (%data, $name);
    while (<$fh>) {
        my @fields = split;
        if ($fields[0] eq 'name') {
            $name = $fields[1];
            next;
        }
        elsif ($fields[0] eq 'num') {
            push @{$data{$name}}, @fields[1..$#fields];
        }
    }   
    return \%data;
}

I leave coding the desired print format as an exercise. The above prints

differ for foo: 111 vs 666
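For the table the question asks for, one possible approach is sketched below. It is only an illustration under stated assumptions: the helper name matching_table is hypothetical, the data is hard-coded in the shape that read_file returns (name => flat arrayref), and each name is assumed to hold a multiple of four values.

```perl
use strict;
use warnings;

# Sample data in the shape read_file returns (assumed here, not read from files).
my $ds1 = { foo  => [111, 222, 333, 444],
            jack => [999, 111, 222, 333, 333, 444, 555, 777] };
my $ds2 = { foo  => [666, 222, 333, 444],
            jack => [999, 111, 222, 333, 333, 444, 555, 777] };

# Hypothetical helper: builds a table of the names whose values all match.
sub matching_table {
    my ($d1, $d2) = @_;
    my $fmt = "%-8s %-4s %-4s %-4s %-4s\n";
    my $out = sprintf $fmt, qw(Name x1 y1 x2 y2);
    foreach my $name (sort keys %$d1) {
        next unless exists $d2->{$name};
        my @a = @{ $d1->{$name} };
        my @b = @{ $d2->{$name} };
        # Values come from split, so they contain no spaces and the
        # space-joined strings can be compared as a whole.
        next unless "@a" eq "@b";
        my $label = $name;
        # Take four values at a time; print the name on the first row only.
        while (my @row = splice @a, 0, 4) {
            $out .= sprintf $fmt, $label, @row;
            $label = '';
        }
    }
    return $out;
}

print matching_table($ds1, $ds2);
```

With this sample data only jack is listed, since foo differs in its first value.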

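Note the comments in the code about adding tests. When you descend into data structures to compare them, you need to check that they carry the same type of data at each level (see ref) and that they are the same size (so that you don't try to read past the end of an array). Once you have this working, it is worth looking for modules that do this kind of work.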
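The checks described above could look roughly like this. The helper shape_problem is hypothetical (it is not part of the code above); it assumes the values being compared are expected to be arrayrefs and uses only the built-in ref.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an empty string when both values are
# arrayrefs of equal length, otherwise a short description of the mismatch.
sub shape_problem {
    my ($x, $y) = @_;
    my ($rx, $ry) = (ref $x, ref $y);   # ref returns '' for a non-reference
    return "different types: '$rx' vs '$ry'" if $rx ne $ry;
    return "not an arrayref: '$rx'"           if $rx ne 'ARRAY';
    return "different sizes: " . @$x . " vs " . @$y if @$x != @$y;
    return '';
}

print shape_problem([1, 2], [1, 2, 3]), "\n";   # different sizes: 2 vs 3
```

In compare_data, such a check would run once per key before the element-by-element loop, skipping the key (with a message) when it returns a non-empty string.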

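I compare the data (in the arrayrefs) as strings, with ne in the code above, since it is not stated that the values are numbers. If they are, use the numeric operators instead (!= in place of ne, == in place of eq).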
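A small sketch of why the choice matters; the two values are made up for illustration and do not come from the question's files.

```perl
use strict;
use warnings;

# Two strings that differ as text but are equal as numbers.
my ($s, $t) = ('007', '7');

print "eq: ", ($s eq $t ? "same" : "differ"), "\n";   # eq: differ
print "==: ", ($s == $t ? "same" : "differ"), "\n";   # ==: same
```

Note that == on strings that do not look like numbers triggers a warning under use warnings, which is another reason to stay with string comparison unless the data is known to be numeric.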

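A full code review would take us too far, but here are a few remarks:

When you find yourself needing such a long list of variables, think "collection" and reconsider the data structure chosen for the problem. Note that in the example above I did not need a single scalar variable for the data (I use one for temporary storage of the name).
Picking strings apart with a regex is a big part of text analysis, when suitable. Get familiar with other approaches as well. For this case, see split.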
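As a brief comparison of the two approaches on one line of the input, here is a sketch. The variable names are made up for this illustration; the regex is the one used in the question.

```perl
use strict;
use warnings;

my $line = "num     999 111 222 333";

# Regex with one capture group per value: tied to exactly four values.
my @by_regex = $line =~ /^num\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)/;

# split with the special ' ' pattern breaks the string on any run of
# whitespace and works for any number of values.
my ($tag, @by_split) = split ' ', $line;

print "$tag: @by_split\n";   # num: 999 111 222 333
```

Both produce the same four values here, but the split version needs no change if a row later carries more or fewer numbers.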

How to push different rows of values into a hash and compare them with a foreach loop
