Perl: load rows into an array based on consecutive field values

Asked: 2018-04-20 00:15:08

Tags: arrays perl csv

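I'm trying to write a script that goes through each line of a file and looks at the second field. If the second field matches the second field of the following line, the entire row should be pushed into an array. As soon as the second field differs from the previous line's value, the script should stop pushing rows into the array and print the array. In the sample below, most rows whose second field starts with PUSA would be skipped, while the rows whose second field contains WMCE would be pushed into the array.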

15:15:07.705 "PUSA17122100vx1m" STE 
15:15:08.709 "PUSA17122100w9sn" STE 
15:50:25.244 "PUSA171221014uk8" STE 
15:50:26.509 "PUSA171221014vpo" STE 
15:50:26.750 "PUSA171221015j7w" STE 
13:58:34.518 "WMCEQ42PRD_NX:EQ-58661535-751d143b2002:171221"  STE 
16:05:31.310 "WMCEQ42PRD_NX:EQ-58661535-751d143b2002:171221"  STE 
16:05:31.310 "WMCEQ42PRD_NX:EQ-58661535-751d143b2002:171221"  STE 
16:05:34.938 "WMCEQ42PRD_NX:EQ-58661535-751d143b2002:171221"  STE 
16:03:35.805 "WMCEQ42PRD_NX:EQ-58661535-751d143b2002:171221"  EOM 
16:03:36.420 "WMCEQ42PRD_NX:EQ-58661535-751d143b2002:171221"  EOM 
15:47:40.061 "PUSA171221015gtm"  STE 
15:47:41.460 "PUSA171221015mmi"  STE 
15:47:45.635 "PUSA17122101536p"  STE 
10:35:50.524 "PUSA171221007k8z"  STE 
10:40:11.406 "PUSA171221007vwl"  STE 
13:51:04.820 "PUSS171221000jpu"  STE 
14:42:50.589 "PUSS17122100193k"  STE 
09:49:53.111 "PUSA171221002a7g"  STE 
13:58:34.562 "WMCEQ42PRD_NX:EQ-58661583-a62e3e5ad011:171221"  STE 
16:05:31.302 "WMCEQ42PRD_NX:EQ-58661583-a62e3e5ad011:171221"  STE 
16:05:31.302 "WMCEQ42PRD_NX:EQ-58661583-a62e3e5ad011:171221"  STE 
16:05:34.931 "WMCEQ42PRD_NX:EQ-58661583-a62e3e5ad011:171221"  STE 
16:03:36.396 "WMCEQ42PRD_NX:EQ-58661583-a62e3e5ad011:171221"  EOM 
16:03:35.859 "WMCEQ42PRD_NX:EQ-58661583-a62e3e5ad011:171221"  EOM 
15:15:06.747 "PUSA17122100w7fw"  STE 
15:15:08.348 "PUSA17122100vrv8"  STE 
15:15:08.542 "PUSA17122100vzhu"  STE 

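Here is what I have so far. I tried saving the second field value (@$row[1]) and then matching it against the value on the next row, but the arrays I end up with only ever contain two rows.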

#!/usr/bin/perl
use strict ;
use warnings ;

use Text::CSV   ;
use Time::Local ;

my $file = $ARGV[0] ;

open my $fh, "<", $file or die "$file: $!" ;

my $csv = Text::CSV->new ({
    binary    => 1,
    auto_diag => 1,
    });

# So far this only parses each line and prints its fields
while (my $row = $csv->getline ($fh)) {
    print "@$row\n" ;
}

2 Answers:

Answer 0 (score: 0)

If I understand correctly, you want to print rows whenever two or more consecutive rows have the same second field.

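My solution does not require you to look ahead to the next row. Instead, rows are stored in a temporary array, so the current row only has to be compared with the last stored row. On a mismatch, or at the end of the input, the temporary array is flushed to the output if it contains more than one (matching) row.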

#!/usr/bin/perl

use strict;
use warnings;
use Text::CSV   ;
use Time::Local ;

my $file = $ARGV[0] ;

open my $fh, "<", $file or die "$file: $!" ;

my $csv = Text::CSV->new({ binary    => 1,
                           auto_diag => 1,
                         });
my @temp_row_storage = ();

while (my $row = $csv->getline($fh)) {
    if (@temp_row_storage  and                    # if we have stored rows; and
        $temp_row_storage[-1][1] ne $row->[1]) {  # the last stored row differs
                                                  # from the current row
        # => print the stored rows if there are at least 2 matching ones
        print(map { "@$_\n" } @temp_row_storage)  if  @temp_row_storage >= 2;

        @temp_row_storage = ();  # empty - these were already printed
    }

    # always store the current row
    push(@temp_row_storage, $row);
}

# cleanup: print the last batch of rows if there were at least 2 matching ones
print(map { "@$_\n" } @temp_row_storage)  if  @temp_row_storage >= 2;

Input CSV file:

15:15:07.705,"PUSA17122100vx1m",STE
15:15:08.709,"PUSA17122100w9sn",STE
15:50:25.244,"PUSA171221014uk8",STE
15:50:26.509,"PUSA171221014vpo",STE
15:50:26.750,"PUSA171221015j7w",STE
13:58:34.518,"WMCEQ42PRD_NX:EQ-58661535-751d143b2002:171221",STE
16:05:31.310,"WMCEQ42PRD_NX:EQ-58661535-751d143b2002:171221",STE
16:05:31.310,"WMCEQ42PRD_NX:EQ-58661535-751d143b2002:171221",STE
16:05:34.938,"WMCEQ42PRD_NX:EQ-58661535-751d143b2002:171221",STE
16:03:35.805,"WMCEQ42PRD_NX:EQ-58661535-751d143b2002:171221",EOM
16:03:36.420,"WMCEQ42PRD_NX:EQ-58661535-751d143b2002:171221",EOM
15:47:40.061,"PUSA171221015gtm",STE
15:47:41.460,"PUSA171221015mmi",STE
15:47:45.635,"PUSA17122101536p",STE
10:35:50.524,"PUSA171221007k8z",STE
10:40:11.406,"PUSA171221007vwl",STE
13:51:04.820,"PUSS171221000jpu",STE
14:42:50.589,"PUSS17122100193k",STE
09:49:53.111,"PUSA171221002a7g",STE
13:58:34.562,"WMCEQ42PRD_NX:EQ-58661583-a62e3e5ad011:171221",STE
16:05:31.302,"WMCEQ42PRD_NX:EQ-58661583-a62e3e5ad011:171221",STE
16:05:31.302,"WMCEQ42PRD_NX:EQ-58661583-a62e3e5ad011:171221",STE
16:05:34.931,"WMCEQ42PRD_NX:EQ-58661583-a62e3e5ad011:171221",STE
16:03:36.396,"WMCEQ42PRD_NX:EQ-58661583-a62e3e5ad011:171221",EOM
16:03:35.859,"WMCEQ42PRD_NX:EQ-58661583-a62e3e5ad011:171221",EOM
15:15:06.747,"PUSA17122100w7fw",STE
15:15:08.348,"PUSA17122100vrv8",STE
15:15:08.542,"PUSA17122100vzhu",STE

Output:

13:58:34.518 WMCEQ42PRD_NX:EQ-58661535-751d143b2002:171221 STE
16:05:31.310 WMCEQ42PRD_NX:EQ-58661535-751d143b2002:171221 STE
16:05:31.310 WMCEQ42PRD_NX:EQ-58661535-751d143b2002:171221 STE
16:05:34.938 WMCEQ42PRD_NX:EQ-58661535-751d143b2002:171221 STE
16:03:35.805 WMCEQ42PRD_NX:EQ-58661535-751d143b2002:171221 EOM
16:03:36.420 WMCEQ42PRD_NX:EQ-58661535-751d143b2002:171221 EOM
13:58:34.562 WMCEQ42PRD_NX:EQ-58661583-a62e3e5ad011:171221 STE
16:05:31.302 WMCEQ42PRD_NX:EQ-58661583-a62e3e5ad011:171221 STE
16:05:31.302 WMCEQ42PRD_NX:EQ-58661583-a62e3e5ad011:171221 STE
16:05:34.931 WMCEQ42PRD_NX:EQ-58661583-a62e3e5ad011:171221 STE
16:03:36.396 WMCEQ42PRD_NX:EQ-58661583-a62e3e5ad011:171221 EOM
16:03:35.859 WMCEQ42PRD_NX:EQ-58661583-a62e3e5ad011:171221 EOM
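
The sample in the question is separated by whitespace rather than commas, so the same buffering idea can also be applied without Text::CSV. The following is only a minimal sketch and not part of the original answer: the helper name group_runs and its optional minimum-size argument are made up for illustration, and it assumes that no field contains embedded whitespace. Lines with fewer than two fields are skipped and do not end a run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): collect runs of
# consecutive lines that share the same second whitespace-separated field.
# Returns a list of array refs; each run holds at least $min lines.
sub group_runs {
    my ($lines, $min) = @_;
    $min //= 2;
    my (@runs, @current, $current_key);

    for my $line (@$lines) {
        my (undef, $key) = split ' ', $line;   # second field; assumes no embedded whitespace
        next unless defined $key;              # skip blank or one-field lines

        # A different key ends the current run: keep it only if long enough
        if (@current && $key ne $current_key) {
            push @runs, [@current] if @current >= $min;
            @current = ();
        }
        push @current, $line;
        $current_key = $key;
    }
    push @runs, [@current] if @current >= $min;   # flush the final run
    return @runs;
}

my @lines = (
    '15:50:26.750 "PUSA171221015j7w" STE',
    '13:58:34.518 "WMCEQ42PRD_NX:EQ-58661535-751d143b2002:171221"  STE',
    '16:05:31.310 "WMCEQ42PRD_NX:EQ-58661535-751d143b2002:171221"  STE',
    '15:47:40.061 "PUSA171221015gtm"  STE',
);

for my $run (group_runs(\@lines)) {
    print "$_\n" for @$run;
}
```

With these four sample lines, only the two WMCE lines form a run of at least two, so they are the only ones printed. Returning the runs as separate arrays instead of printing them immediately keeps each group available for further processing, which is closer to what the question describes.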

Edit:

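If you want to keep (that is, put back) the quotes around the second field, you can use either of the following two print statements: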

print(map { qq!$_->[0] "$_->[1]" $_->[2]\n! } @temp_row_storage)
    if  @temp_row_storage >= 2;

print(map { sprintf(qq!%s "%s" %s\n!, @$_) } @temp_row_storage)
    if  @temp_row_storage >= 2;

In both cases, however, you have to know that every row contains exactly three fields for the implementation to work reliably.
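
If the number of fields is not known in advance, one option is to re-quote only the second field and join whatever fields the row happens to have. This is a minimal sketch under that assumption, not part of the original answer: format_row is a hypothetical helper name (it is not a Text::CSV method), rows are assumed to be array refs as returned by getline, and quotes embedded inside a field are not escaped.

```perl
use strict;
use warnings;

# Hypothetical helper: join a row's fields with single spaces, putting
# double quotes back around the second field when the row has one.
sub format_row {
    my ($row) = @_;
    my @fields = @$row;                            # copy, so the caller's row is untouched
    $fields[1] = qq{"$fields[1]"} if @fields > 1;  # re-quote the second field only
    return join(' ', @fields) . "\n";
}

print format_row([ '15:15:07.705', 'PUSA17122100vx1m', 'STE' ]);
print format_row([ '15:15:07.705', 'PUSA17122100vx1m', 'STE', 'extra' ]);
```

In the script above it could replace the map body, for example print(map { format_row($_) } @temp_row_storage) if @temp_row_storage >= 2;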

Answer 1 (score: -1)

Try this. If it is not what you want, please clarify your question.

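use strict ;
use warnings ;

# Lexical filehandle, three-argument open, and an error check
open my $in, "<", $ARGV[0] or die "$ARGV[0]: $!" ;

my $prev_key = "";   # second field of the previous line
while (my $row = <$in>) {
    my @rec = split(" ", $row);
    # Print the line unless it repeats the previous line's second field
    print $row unless $prev_key ne "" && $rec[1] eq $prev_key;
    $prev_key = $rec[1];
}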

Output:

15:15:07.705 "PUSA17122100vx1m" STE
15:15:08.709 "PUSA17122100w9sn" STE
15:50:25.244 "PUSA171221014uk8" STE
15:50:26.509 "PUSA171221014vpo" STE
15:50:26.750 "PUSA171221015j7w" STE
13:58:34.518 "WMCEQ42PRD_NX:EQ-58661535-751d143b2002:171221"  STE
15:47:40.061 "PUSA171221015gtm"  STE
15:47:41.460 "PUSA171221015mmi"  STE
15:47:45.635 "PUSA17122101536p"  STE
10:35:50.524 "PUSA171221007k8z"  STE
10:40:11.406 "PUSA171221007vwl"  STE
13:51:04.820 "PUSS171221000jpu"  STE
14:42:50.589 "PUSS17122100193k"  STE
09:49:53.111 "PUSA171221002a7g"  STE
13:58:34.562 "WMCEQ42PRD_NX:EQ-58661583-a62e3e5ad011:171221"  STE
15:15:06.747 "PUSA17122100w7fw"  STE
15:15:08.348 "PUSA17122100vrv8"  STE
15:15:08.542 "PUSA17122100vzhu"  STE