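Perl: process and store unique values in arrays

Date: 2016-08-18 11:18:19

Tags: perl

Below are the contents of a log file. I am reading the file and grouping its entries by the JIRA string.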

JIRA: COM-1234
Program:Development
Reviewer:John Wick 
Description:Genral fix
rev:r345676
------------------------------------------
JIRA:COM-1234
Program:Development
Reviewer:None
Description:Updating Received 
rev:r909276
------------------------------------------
JIRA: COM-6789
Program:Testing
Reviewer:Balise Mat
Description:Audited
rev:r876391
------------------------------------------
JIRA: COM-6789
Program:Testing
Reviewer:Chan Joe
Description:SO hwat 
rev:r698392
------------------------------------------
JIRA: COM-6789
Program:Testing
Reviewer:Chan Joe
Description:Paid the Due
rev:r327896
------------------------------------------

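My requirement is to iterate over each unique JIRA value (COM-1234, COM-6789, and so on) and store the details that immediately follow it in separate arrays, like this:

(for COM-1234)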

@prog = Development;
@Reviewer = John Wick;
@Description = Genral fix;
@rev = r345676;

(for COM-6789)

@prog = Testing;
@Reviewer = Balise Mat;
@Description = Audited;
@rev = r876391;

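If a JIRA value repeats, e.g. COM-1234 appears 2 times and COM-6789 appears 3 times, the details that immediately follow each occurrence (i.e. the values of the keys 'Program', 'Reviewer', ...) should still be pushed to the corresponding arrays.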

(COM-1234)

@prog = Development;
@Reviewer = None;
@Description = Updating Received ;
@rev = r909276;

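I am new to Perl. I can manage to get only the unique values, but I am not sure how to push the values that follow into separate arrays. Any input would be very helpful. Thanks.

My incomplete code: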

#!/usr/bin/perl
use warnings;
use Data::Dumper;

$/ = "%%%%";
open (AFILE, "<", "D:\\mine\\out.txt");
    while (<AFILE>)
    {
     @temp = split(/-{20,}/, $_);
    }
close (AFILE);

my %jiraHash;
for ($i=0; $i<=@temp; $i++) {
      if (($temp[$i] =~ /(((JIRA|SVN)\s{0,1}:(\s{0,2}[A-Za-z0-9-\s]{4,9}),
          {0,1}\s{0,2}){1,5})\nProgram\s{0,1}:\s{0,2}Development/) || 
          ($temp[$i] =~ /(((JIRA|SVN):(\s{0,2}[A-Za-z0-9-\s]{4,9}),
          {0,1}\s{0,2}){1,5})\nProgram\:\s{0,2}Testing/))    {

            $jiraId = $2;
            $jiraId =~ s/JIRA\s*\://;
            $temp[$i] =~ s/\w{3,}\s?:\s?//g;
            #print "==>$jiraId\n";
            $jiraHash{$jiraId}  = $temp[$i];

        } else {
            #print "NOT\n";
        }   
}
print Dumper(%jiraHash);

I plan to display an HTML report in the following format:

Program: Development
FOR ID:  COM-1234

Revision    Reviewer    Comment
r345676     John Wick   Genral fix

Revision    Reviewer    Comment
r909276     None        Updating Received 

Program: Testing
FOR ID: COM-6789

Revision    Reviewer    Comment
r876391     Balise Mat  Audited

Revision    Reviewer    Comment
r698392     Chan Joe    SO hwat 

Revision    Reviewer    Comment
r327896     Chan Joe    Paid the Due

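3 Answers:

Answer 0 (score: 4)

It sounds like this data should be in a database.

But parsing it into a data structure is relatively simple. Here I have gone for a hash where the keys are the Jira identifiers and each value is a reference to an array of hash references. Each referenced hash contains the details of one of the records.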

#!/usr/bin/perl

use strict;
use warnings;
use 5.010;

use Data::Dumper;

my @records = do {
  local $/ = '------------------------------------------';
  <>;
};

chomp @records;

my %jira;

foreach (@records) {
  next unless /\S/;

  my %rec = /^(\w+):\s*(.+?)$/mg;
  push @{$jira{$rec{JIRA}}}, \%rec;
}

say Dumper \%jira;

When you run it on the given data, you get this output:

$VAR1 = {
          'COM-6789' => [
                          {
                            'Program' => 'Testing',
                            'JIRA' => 'COM-6789',
                            'rev' => 'r876391',
                            'Reviewer' => 'Balise Mat',
                            'Description' => 'Audited'
                          },
                          {
                            'Program' => 'Testing',
                            'JIRA' => 'COM-6789',
                            'rev' => 'r698392',
                            'Reviewer' => 'Chan Joe',
                            'Description' => 'SO hwat '
                          },
                          {
                            'Program' => 'Testing',
                            'JIRA' => 'COM-6789',
                            'rev' => 'r327896',
                            'Reviewer' => 'Chan Joe',
                            'Description' => 'Paid the Due'
                          }
                        ],
          'COM-1234' => [
                          {
                            'Program' => 'Development',
                            'JIRA' => 'COM-1234',
                            'rev' => 'r345676',
                            'Reviewer' => 'John Wick ',
                            'Description' => 'Genral fix'
                          },
                          {
                            'Program' => 'Development',
                            'JIRA' => 'COM-1234',
                            'rev' => 'r909276',
                            'Reviewer' => 'None',
                            'Description' => 'Updating Received '
                          }
                        ]
        };

From there, displaying the data is relatively simple:

foreach my $j (keys %jira) {
  say "JIRA: $j";
  foreach (@{$jira{$j}}) {
    say "Program: $_->{Program}";
    say "Revision: $_->{rev}";
    # etc...
  }
}
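
The display loop above could be extended toward the report layout from the question. What follows is only a minimal sketch: `format_report` is a hypothetical helper name, it produces plain text rather than HTML, and it assumes that all records of one JIRA ID share the same Program value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: render a structure shaped like %jira
# (ID => [ \%record, ... ]) in the plain-text layout from the question.
sub format_report {
    my ($jira) = @_;
    my $out = '';
    foreach my $id (sort keys %$jira) {
        my $records = $jira->{$id};
        # Assumption: every record of one ID has the same Program.
        $out .= "Program: $records->[0]{Program}\n";
        $out .= "FOR ID:  $id\n\n";
        foreach my $rec (@$records) {
            $out .= sprintf "%-12s%-12s%s\n", 'Revision', 'Reviewer', 'Comment';
            # Hash slice: pick the three columns in display order.
            $out .= sprintf "%-12s%-12s%s\n\n", @{$rec}{qw(rev Reviewer Description)};
        }
    }
    return $out;
}

# Small sample in the same shape as the Dumper output above.
my %jira = (
    'COM-1234' => [
        { JIRA => 'COM-1234', Program => 'Development', rev => 'r345676',
          Reviewer => 'John Wick', Description => 'Genral fix' },
        { JIRA => 'COM-1234', Program => 'Development', rev => 'r909276',
          Reviewer => 'None', Description => 'Updating Received' },
    ],
);

print format_report(\%jira);
```

Sorting the keys keeps the report order stable between runs, since Perl does not guarantee any particular hash order.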

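Answer 1 (score: 2)

Since your data is nicely structured as one line per data item, and since Perl processes input line by line by default, I would suggest doing that instead of messing with $/ or regexps to split the input into records. This does require you to remember the JIRA issue ID from the first line of each record, but that's simple: just store it in a variable declared outside the loop, like this: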

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my %records;
my $jiraID;
while (<>) {
    chomp;
    if (/^JIRA:\s*(.*)/ and not defined $jiraID) {  # \s* also accepts "JIRA:COM-1234"
        $jiraID = $1;
        $records{$jiraID} = {};  # wipe out any old data for this ID
    } elsif (/^(Program|Reviewer|Description|rev):(.*)/ and defined $jiraID) {
        $records{$jiraID}{$1} = $2;
    } elsif (/^-{20,}$/) {
        undef $jiraID;  # end of record
    } else {
        die qq(Unexpected input line "$_");
    }
}

print Dumper(\%records);

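The code above reads its input from any files supplied as command-line arguments, or from standard input if there are none, using the <> default input operator. If you wish to read from a specific filehandle that you opened yourself, you can of course supply one.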
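
As a rough illustration of reading from a filehandle you opened yourself, the sketch below wraps the same kind of loop in a sub that takes the handle as an argument. The name `parse_records` is hypothetical, and the in-memory sample stands in for a real file so that the sketch runs on its own; with a real log you would open the file path instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the line-by-line loop, reading from the handle
# passed in rather than from <>. Keeps only the last record per ID.
sub parse_records {
    my ($fh) = @_;
    my (%records, $jiraID);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^JIRA:\s*(.*)/ and not defined $jiraID) {
            $jiraID = $1;
            $records{$jiraID} = {};
        } elsif ($line =~ /^(Program|Reviewer|Description|rev):(.*)/ and defined $jiraID) {
            $records{$jiraID}{$1} = $2;
        } elsif ($line =~ /^-{20,}$/) {
            undef $jiraID;    # end of record
        } else {
            die qq(Unexpected input line "$line");
        }
    }
    return \%records;
}

# Assumption: an in-memory sample instead of a real file. For a file,
# use: open my $fh, '<', $path or die "Cannot open $path: $!";
my $sample = <<'END';
JIRA: COM-1234
Program:Development
Reviewer:None
Description:Updating Received
rev:r909276
------------------------------------------
END

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my $records = parse_records($fh);
close $fh;

print "$_ => $records->{$_}{rev}\n" for sort keys %$records;
```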

Note that the code above stores only the last record for each ID. If you want to store all of them in an array, replace the line:

        $records{$jiraID} = {};  # wipe out any old data for this ID

with:

        push @{$records{$jiraID}}, {};  # start new record for this ID

and change the line:

        $records{$jiraID}{$1} = $2;

to:

        $records{$jiraID}[-1]{$1} = $2;
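
To see what those two replacement lines do in isolation, here is a small sketch with hard-coded values instead of parsed input. `push` creates the array reference on first use (autovivification), and the index `[-1]` always addresses the record that was pushed most recently.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %records;

# First record for this ID: push autovivifies the array reference.
push @{ $records{'COM-1234'} }, {};
$records{'COM-1234'}[-1]{rev} = 'r345676';

# Second record for the same ID: [-1] now refers to the new, empty hash.
push @{ $records{'COM-1234'} }, {};
$records{'COM-1234'}[-1]{rev} = 'r909276';

# Both records survive, in input order.
my @revs = map { $_->{rev} } @{ $records{'COM-1234'} };
print scalar(@revs), " records: @revs\n";
```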

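P.S. The regexps in the code above are based on your sample data. If your real data contains other kinds of lines (or variations, e.g. in the amount of whitespace), you will need to adjust them to match. I've coded the script to die if it sees anything unexpected, so it should be easy to tell if that happens.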
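
One possible way to loosen the matching is sketched below. `parse_field` is a hypothetical helper, and the set of variations it tolerates (spaces around the colon, trailing whitespace) is an assumption about what real data might contain.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split "Key : value  " into (key, value), allowing
# optional whitespace around the colon and trimming trailing whitespace.
# Returns an empty list when the line is not a known field.
sub parse_field {
    my ($line) = @_;
    return unless $line =~ /^\s*(JIRA|Program|Reviewer|Description|rev)\s*:\s*(.*?)\s*$/;
    return ($1, $2);
}

for my $line ('JIRA:COM-1234', 'JIRA: COM-1234', 'Reviewer : John Wick ', '-----') {
    my ($key, $value) = parse_field($line);
    if (defined $key) {
        print "[$key] [$value]\n";
    } else {
        print "no match: $line\n";
    }
}
```

Trimming at parse time also avoids values such as 'John Wick ' keeping their trailing space.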

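Update: Based on the sample output you posted while I was writing this answer, it seems you want to group the data by both the JIRA and Program lines. That's also easy to do, for example like this: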

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper qw(Dumper);

my %records;
my ($jiraID, $progID);  # parentheses needed so that "my" declares both
while (<>) {
    chomp;
    if (/^JIRA:\s*(.*)/ and not defined $jiraID) {
        $jiraID = $1;
    } elsif (/^Program:\s*(.*)/ and defined $jiraID and not defined $progID) {
        $progID = $1;
        push @{$records{$jiraID}{$progID}}, {};  # start new record for these IDs
    } elsif (/^(Reviewer|Description|rev):(.*)/ and defined $progID) {
        $records{$jiraID}{$progID}[-1]{$1} = $2;
    } elsif (/^-{20,}$/) {
        undef $jiraID;  # end of record
        undef $progID;  # undef takes a single argument, so reset each separately
    } else {
        die qq(Unexpected input line "$_");
    }
}

print Dumper(\%records);

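Note that I have grouped the output data structure first by JIRA ID and then by program, but of course these could easily be swapped (or even combined into a single hash key, if you prefer).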
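
As a sketch of the single-key variant, the two IDs can be joined into one string and split apart again when reading. `add_record` is a hypothetical helper, and the "/" separator is an assumption: it only works if neither the JIRA ID nor the program name can contain that character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: file one parsed record under a combined
# "JIRA-ID/Program" key instead of two nested hash levels.
sub add_record {
    my ($records, $jiraID, $progID, $fields) = @_;
    push @{ $records->{"$jiraID/$progID"} }, $fields;
    return;
}

my %records;
add_record(\%records, 'COM-1234', 'Development',
    { rev => 'r345676', Reviewer => 'John Wick', Description => 'Genral fix' });
add_record(\%records, 'COM-1234', 'Development',
    { rev => 'r909276', Reviewer => 'None', Description => 'Updating Received' });
add_record(\%records, 'COM-6789', 'Testing',
    { rev => 'r876391', Reviewer => 'Balise Mat', Description => 'Audited' });

for my $key (sort keys %records) {
    # Recover the two parts; the limit of 2 splits on the first "/" only.
    my ($jiraID, $progID) = split m{/}, $key, 2;
    printf "%s (%s): %d record(s)\n", $jiraID, $progID, scalar @{ $records{$key} };
}
```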

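Answer 2 (score: 1)

This does not handle the final output, but it may be a simpler way to store a list of lists for each hash element (ticket ID), with some sample output at the end. It isn't in the format you want, but getting there should be easy: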

use strict;

my (%jira, @values, $ticket_id);

open my $IN, '<', 'jira.txt' or die;
while (<$IN>) {
  chomp;
  my ($key, $val) = split /:\s*/, $_, 2;  # limit 2 keeps any colons inside the value

  if ($key eq 'JIRA') {
    if (@values) {
      push @{$jira{$ticket_id}}, [ @values ];
      @values = ();
    }
    $ticket_id = $val;

  } elsif ($key eq 'Program') {
    $values[0] = $val;
  } elsif ($key eq 'Reviewer') {
    $values[1] = $val;
  } elsif ($key eq 'Description') {
    $values[2] = $val;
  } elsif ($key eq 'rev') {
    $values[3] = $val;
  }
}
close $IN;

push @{$jira{$ticket_id}}, [ @values ];

while (my ($ticket, $ref) = each %jira) {
  print "$ticket =>\n";
  foreach my $line_ref (@$ref) {
    print join("\t", @$line_ref), "\n";  # parentheses keep "\n" out of the joined list
  }
}

Sample output:

COM-1234 =>
Development     John Wick       Genral fix      r345676
Development     None    Updating Received       r909276
COM-6789 =>
Testing Balise Mat      Audited r876391
Testing Chan Joe        SO hwat         r698392
Testing Chan Joe        Paid the Due    r327896