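Adding unique elements to a Perl array, determined by a regex

Time: 2013-05-22 15:52:07

Tags: regex arrays perl uniqueidentifier

I am writing a Perl script that analyzes error messages and decides whether each one is unique. An error counts as unique based on the line it occurs on. A typical error message looks like this: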

RT Warning: No condition matches in 'unique case' statement.
    "/user/foo/project", line 218, for ..

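Many of these error messages contain more than one number in the string I am capturing. What I want to do is grab the first number that appears after "line", and add it to an array only if that value is not already in the array. This is what I have so far: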

my $path = RT Warning: No condition matches in 'unique case' statement.
    "/user/foo/project", line 218
$path =~ m/(\d+)/;
print("Error occurs on line $1\n"); 
if(grep(/^$1$/, @RTarray))
{
    print("Not unique.\n");
}
else
{
    push(@RTarray, $1); 
    print("Found a unique error!\n");
}

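So, obviously I am not checking that the number comes after the keyword "line", because I am not sure how to do that given the way I am currently handling the regex. I also suspect I am not adding elements to my array correctly. Please help!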

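1 answer:

Answer 0 (score: 2)

You should use a hash. Its keys are unique by construction, so you do not even need to check for duplicates yourself.

Here is an example: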

my %seen;

while (my $line = <$fh>) {

  if ($line =~ m/line (\d+)/) {
    my $ln = $1;
    if ( ! $seen{$ln}++ ) { 
      # this will check first and then increment. If it was encountered before,
      # it will already contain a true value, and thus the block will be skipped.
      # if it has not been encountered before, it will go into the block and...

      # do various operations on the line number
    }
  }

}

Your %seen now contains every line number that had an error, along with how many times each one was reported:

print Dumper \%seen;

$VAR1 = {
  10 => 1,
  255 => 5,
  1337 => 1,
}

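This tells us there is one error on line 10 and one on line 1337. By the definition in your code, those are unique. The five errors on line 255 are not unique, because that line shows up five times in the log.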
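To make that reading of the counts concrete, here is a minimal sketch that splits the keys of %seen into line numbers reported once and line numbers reported repeatedly. The hash contents are copied from the dump above; the array names @unique and @repeated are made up for illustration.

```perl
use strict;
use warnings;

# Sample counts copied from the Dumper output above.
my %seen = (10 => 1, 255 => 5, 1337 => 1);

# Line numbers that were reported exactly once, in numeric order.
my @unique   = sort { $a <=> $b } grep { $seen{$_} == 1 } keys %seen;

# Line numbers that were reported more than once.
my @repeated = sort { $a <=> $b } grep { $seen{$_} > 1 } keys %seen;

print "Unique: @unique\n";      # prints "Unique: 10 1337"
print "Repeated: @repeated\n";  # prints "Repeated: 255"
```

The explicit numeric sort is there because hash keys come back in no particular order.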


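If you want to get rid of some of them, use delete to remove the whole key/value pair, $foo{$1}-- to decrement the count, or something like delete $foo{$1} unless --$foo{$1} to decrement and remove the key once the count reaches zero, all in one line.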
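A small sketch of those three options, assuming the same sample counts as in the dump above. It uses %seen and literal keys in place of %foo and $1 so that it runs on its own.

```perl
use strict;
use warnings;

# Sample counts, assumed for illustration.
my %seen = (10 => 1, 255 => 5, 1337 => 1);

# Remove an entry outright, whatever its count.
delete $seen{1337};

# Decrement a count; the key would stay even if it reached zero.
$seen{255}--;                          # 255 is now 4

# Decrement, and drop the key once the count hits zero.
delete $seen{10} unless --$seen{10};   # 10 went from 1 to 0, so it is deleted

print join(", ", map { "$_ => $seen{$_}" } sort { $a <=> $b } keys %seen), "\n";
# prints "255 => 4"
```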


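Edit: I have looked at your code. Really, the only things missing were the word "line" in the regex and the quotes around the string. Did you actually try it? It works. :)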

my @RTarray;

while (my $line = <DATA>) {
  next unless $line =~ m/line (\d+)/; # skip input without "line N" so $1 is never stale
  print("Error occurs on line $1\n"); 
  if( grep { $_ eq $1 } @RTarray ) { # this eq is the same as your regex, just faster
    print("Not unique.\n");
  } else {
    print "Found a unique error in line $1!\n";
    push @RTarray, $1; 
  }
}

__DATA__
RT Warning: No condition matches in 'unique case' statement. "/user/foo/project", line 218, for
RT Warning: No condition matches in 'unique case' statement. "/user/foo/project", line 3, for
RT Warning: No condition matches in 'unique case' statement. "/user/foo/project", line 44, for
RT Warning: No condition matches in 'unique case' statement. "/user/foo/project", line 218, for
RT Warning: No condition matches in 'unique case' statement. "/user/foo/project", line 7, for
RT Warning: No condition matches in 'unique case' statement. "/user/foo/project", line 7, for
RT Warning: No condition matches in 'unique case' statement. "/user/foo/project", line 7, for

This will print:

Error occurs on line 218
Found a unique error in line 218!
Error occurs on line 3
Found a unique error in line 3!
Error occurs on line 44
Found a unique error in line 44!
Error occurs on line 218
Not unique.
Error occurs on line 7
Found a unique error in line 7!
Error occurs on line 7
Not unique.
Error occurs on line 7
Not unique.

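I think that is correct. I had 218 in there twice and 7 three times, and it caught both.

All I did was replace your string, which was missing its quotes, with a loop over a filehandle so I could test it on several lines. I also fixed your regex, which was missing the word line, although this particular error message does not even need that.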
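If you still want the line numbers in an array, the two approaches can be combined: the hash does the duplicate check and the array keeps the first-seen order. This is only a sketch. The @log lines are shortened, made-up stand-ins for real log input, and @RTarray is the array name from the question.

```perl
use strict;
use warnings;

# Shortened sample input, assumed for illustration.
my @log = (
    q{"/user/foo/project", line 218, for},
    q{"/user/foo/project", line 3, for},
    q{"/user/foo/project", line 218, for},
    q{"/user/foo/project", line 7, for},
);

my (%seen, @RTarray);
for my $line (@log) {
    next unless $line =~ m/line (\d+)/;    # ignore input with no "line N"
    push @RTarray, $1 unless $seen{$1}++;  # keep only the first occurrence
}

print "@RTarray\n";   # prints "218 3 7"
```

The hash lookup avoids scanning the whole array with grep on every line, which starts to matter once the log is large.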