Question

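I have a file like the one below, where the lines starting with a number are the IDs of my samples and the lines that follow are the data.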

10001;02/07/98;;PI;M^12/12/59^F^^SP^09/12/55
;;;;;M1|F1|SP1;9;9;9;9;9;9;;D16S539
;;;;;M1|F1|SP1;9;9;9;9;9;9;;D7S820
;;;;;M1|F1|SP1;9;9;9;9;9;9;;D13S317
;;;;;M1|F1|SP1;9;9;9;9;9;9;;D5S818
10002;02/07/98;;RJ;F^20/04/86^SP^
;;;;;F1|SP1;;;12;10;12;11;;D10S212
;;;;;F1|SP1;;;8;8;10;8;;D7S820
;;;;;F1|SP1;;;12;11;14;11;;D13S317
;;;;;F1|SP1;;;13;12;13;8;;D5S818

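For the lines containing data, I want to test whether fields 6-11 are the same, because I only want the data when they are not equal (in the first case they are all '9').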

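So I thought about splitting the lines and storing them as arrays, then comparing the arrays with the ~~ operator. But how do I do that if I read the file in a while loop and the array is redefined on every line? Or maybe there is a better way to do this.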

Thanks in advance!

Here is some pseudocode to illustrate what I want to do:

open FILE, $ARGV[0] or die $!;
while (<FILE>) {
    chomp;
    my @fields = split /;/;
    if ($fields[0] eq '') {
        if @fields[6 .. 11] is not equal to @fields[6 .. 11] in all the next lines {
            do my calculation;
        }
    }
}

Answer 1

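Am I right in saying that the data really represents two records? If so, you want to accumulate the lines of a full record before processing it.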

my @super_rec;
while (<>) {
    chomp;
    my @fields = split /;/;
    if ($fields[0] ne '') {
       process_rec(\@super_rec) if @super_rec;
       @super_rec = \@fields;
    } else {
       push @super_rec, \@fields;
    }
}

process_rec(\@super_rec) if @super_rec;

Once a full record is collected, your question can be answered.

sub process_rec {
    my ($super_rec) = @_;
    my ($rec, @subrecs) = @$super_rec;

    my $do_calc = 0;
    for my $i (1..$#subrecs) {
        if (  $subrecs[0][ 6] ne $subrecs[$i][ 6]
           || $subrecs[0][ 7] ne $subrecs[$i][ 7]
           || $subrecs[0][ 8] ne $subrecs[$i][ 8]
           || $subrecs[0][ 9] ne $subrecs[$i][ 9]
           || $subrecs[0][10] ne $subrecs[$i][10]
           || $subrecs[0][11] ne $subrecs[$i][11]
        ) {
           $do_calc = 1;
           last;
        }
    }

    if ($do_calc) {
       ...
    }
}
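The six-way comparison above can also be written over a list of indices. The following is only a sketch: the helper name fields_differ is made up for illustration, and it assumes the lines were split on ';', so that ';' can safely be used to join the selected fields into one comparison key.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): returns 1 if any
# sub-record differs from the first one in any of the given zero-based
# field indices, and 0 otherwise.
sub fields_differ {
    my ($subrecs, @idx) = @_;
    return 0 unless @$subrecs;

    # Build one comparison key per row. Joining on ';' is safe only under
    # the assumption that the rows were produced by split /;/.
    my $key_of = sub {
        my ($rec) = @_;
        return join ';', map { defined $_ ? $_ : '' } @{$rec}[@idx];
    };

    my $first = $key_of->($subrecs->[0]);
    for my $i (1 .. $#{$subrecs}) {
        return 1 if $key_of->($subrecs->[$i]) ne $first;
    }
    return 0;
}

# Usage with two data lines taken from the question
my @rows = map { [ split /;/ ] } (
    ';;;;;F1|SP1;;;12;10;12;11;;D10S212',
    ';;;;;F1|SP1;;;8;8;10;8;;D7S820',
);
print fields_differ(\@rows, 6 .. 11) ? "differ\n" : "same\n";
```

Inside process_rec this would reduce the loop to something like my $do_calc = fields_differ(\@subrecs, 6 .. 11); and changing the range of fields then only means changing the index list.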

Answer 2

I am assuming you want to compare the data across lines, not within a single line. If I am wrong, please ignore the rest of my answer.

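The way I would do it is to rejoin fields 6 through 11 into a string. Save the data of the first line as $firstdata, and compare the data of each subsequent line, as $nextdata, against it. Every time the data does not match, you increment the $differences counter. When you reach an ID line, check whether the previous $differences is greater than zero and, if so, do your calculation (you may need to save the ID line and the other fields in some other variables). Then reinitialize the $differences and $firstdata variables.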

my $firstdata = "";
my $nextdata = "";
my $differences = 0;
open FILE, $ARGV[0] or die $!;
while (<FILE>) {
    chomp;
    my @fields = split /;/;
    if ($fields[0] eq '') {
        $nextdata = join(';', @fields[6..11]);
        if ($firstdata && ($nextdata ne $firstdata)) {
            $differences++;
        } else {
            $firstdata = $nextdata;
        }
    } else {
        if ($differences) {
            # do your calculation for previous ID
        }
        $firstdata = "";
        $differences = 0;
    }
}
if ($differences) {
    # do your calculation one last time for the last ID
}
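To make the bookkeeping concrete, here is a self-contained sketch of the same idea wrapped in a function. The name ids_with_differences is an assumption made for illustration. It takes the lines as a list instead of reading a file, and it returns the IDs whose data lines do not all agree in fields 6 to 11.

```perl
use strict;
use warnings;

# Hypothetical function: returns the IDs whose data lines differ from the
# first data line of the same record in fields 6..11 (zero-based).
sub ids_with_differences {
    my @lines = @_;
    my ($id, $first, $differences, @hits);

    # Remember the current ID if any of its data lines differed.
    my $flush = sub { push @hits, $id if defined($id) && $differences };

    for my $line (@lines) {
        chomp(my $copy = $line);
        next if $copy eq '';                 # skip blank lines
        my @fields = split /;/, $copy, -1;   # -1 keeps trailing empty fields
        if ($fields[0] ne '') {              # ID line: close the previous record
            $flush->();
            ($id, $first, $differences) = ($fields[0], undef, 0);
        } else {                             # data line
            my $data = join ';', @fields[6 .. 11];
            if    (!defined $first) { $first = $data }
            elsif ($data ne $first) { $differences++ }
        }
    }
    $flush->();   # the last record has no ID line after it
    return @hits;
}

my @sample = (
    "10001;02/07/98;;PI;M^12/12/59^F^^SP^09/12/55\n",
    ";;;;;M1|F1|SP1;9;9;9;9;9;9;;D16S539\n",
    ";;;;;M1|F1|SP1;9;9;9;9;9;9;;D7S820\n",
    "10002;02/07/98;;RJ;F^20/04/86^SP^\n",
    ";;;;;F1|SP1;;;12;10;12;11;;D10S212\n",
    ";;;;;F1|SP1;;;8;8;10;8;;D7S820\n",
);
print join(',', ids_with_differences(@sample)), "\n";   # prints 10002
```

Using undef rather than an empty string as the "no data line seen yet" marker means a record whose first data line is entirely empty in those fields is still compared correctly.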

Answer 3

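Here is a way to do it with a regex. If the indices are fixed at 6 to 11 and known to be only those, this may be less efficient than the other approaches, because it scans the whole string: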

open FILE, $ARGV[0] or die $!;
while (<FILE>) {
    chomp;
    my $num;          # first numeric field seen on this line
    my $same = 1;
    # /g steps through every match in turn; the lookahead leaves the
    # closing ';' unconsumed so that it can also open the next field
    while (/;(\d+)(?=;)/g) {

       if (!defined $num) { $num = $1; }
       elsif ($1 != $num) { $same = 0; last; }
    }

    if ($same) {
       print "All digits same\n";
    }
}

Answer 4

Using the Text::CSV_XS module, you can do something like this:

use strict;
use warnings;
use Text::CSV_XS;
use feature 'say';

my $csv = Text::CSV_XS->new({
        sep_char    => ";",
        binary      => 1,
    });

my %data;
my @hdrs;  # store initial order of headers
my $hdr;
while (my $row = $csv->getline(*DATA)) {
    if ($row->[0] =~ /^\d+$/) {
        $csv->combine(@$row) or die "Cannot combine: " .
            $csv->error_diag();
        $hdr = $csv->string();   # recreate the header 
        push @hdrs, $hdr;        # save list of headers
    } else {
        push @{ $data{$hdr} }, [ @{$row}[6..11] ];
    }
}

for (@hdrs) {
    say "$_\n   arrays are: " . (is_identical($data{$_}) ? "same":"diff");
}

sub is_identical {
    my $last;
    for (@{$_[0]}) {         # argument is two-dimensional array
        $last //= $_;
        return 0 unless ( @$_ ~~ @$last );
    }
    return 1;                # default = all arrays were identical
}


__DATA__
10001;02/07/98;;PI;M^12/12/59^F^^SP^09/12/55
;;;;;M1|F1|SP1;9;9;9;9;9;9;;D16S539
;;;;;M1|F1|SP1;9;9;9;9;9;9;;D7S820
;;;;;M1|F1|SP1;9;9;9;9;9;9;;D13S317
;;;;;M1|F1|SP1;9;9;9;9;9;9;;D5S818
10002;02/07/98;;RJ;F^20/04/86^SP^
;;;;;F1|SP1;;;12;10;12;11;;D10S212
;;;;;F1|SP1;;;8;8;10;8;;D7S820
;;;;;F1|SP1;;;12;11;14;11;;D13S317
;;;;;F1|SP1;;;13;12;13;8;;D5S818

Output:

10001;02/07/98;;PI;M^12/12/59^F^^SP^09/12/55
   arrays are: same
10002;02/07/98;;RJ;F^20/04/86^SP^
   arrays are: diff
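One caveat about the ~~ comparison: smartmatch has been flagged experimental since Perl 5.18 and deprecated since 5.38, so under use warnings the code above prints a warning on newer perls. Below is a sketch of a replacement that does not use it. The names arrays_equal and rows_identical are made up for illustration, and the comparison is a plain string comparison, which is an assumption about what "identical" should mean here.

```perl
use strict;
use warnings;

# Hypothetical helper: two array refs are equal when they have the same
# length and every pair of elements is string-equal.
sub arrays_equal {
    my ($x, $y) = @_;
    return 0 unless @$x == @$y;
    for my $i (0 .. $#{$x}) {
        return 0 if $x->[$i] ne $y->[$i];
    }
    return 1;
}

# Hypothetical stand-in for is_identical: true if every row of the
# two-dimensional array equals the first row.
sub rows_identical {
    my ($rows) = @_;
    my $first = $rows->[0];
    for my $row (@$rows) {
        return 0 unless arrays_equal($row, $first);
    }
    return 1;   # also the result for an empty list of rows
}

print rows_identical([ [9, 9, 9], [9, 9, 9] ]) ? "same\n" : "diff\n";
print rows_identical([ [12, 10],  [8, 8]    ]) ? "same\n" : "diff\n";
```

Because the elements are compared with ne, values such as "8" and "8.0" count as different. If the fields should be compared numerically, != would be the operator to use instead.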

How do I compare two lines of a file read in a while loop (to see whether they are equal or not)?
