Perl, based on column values

Date: 2017-01-02 01:09:57

Tags: regex perl

I want to parse a file containing TAB-separated columns and fill in the blank cells based on the first and second columns.

Input:

abcd/xyz/deft3    19-12-20167    9:00    1    45.87    74
        10:00    99    167    42
        11:00    99    167.02    42
        12:00    167    167    42
        13:00    99    167    42
        14:00    167    167    42
    20-12-20167    0:00    1    45.87    74
        1:00    99    167    42
        2:00    99    167.02    42
        3:00    167    167    42
        4:00    99    167    42
        5:00    167    167    42
qerty/azer/uuui1    19-12-20167    0:00    1    45.87    74
        1:00    99    167    42
        2:00    99    167.02    42
        3:00    167    167    42
        4:00    99    167    42
        5:00    167    167    42
    20-12-20167    0:00    1    45.87    74
        1:00    99    167    42
        2:00    99    167.02    42
        3:00    167    167    42
        4:00    99    167    42
        5:00    167    167    42

Expected output:

abcd/xyz/deft3    19-12-20167    9:00    1    45.87    74
abcd/xyz/deft3    19-12-20167    10:00    99    167    42
abcd/xyz/deft3    19-12-20167    11:00    99    167.02    42
abcd/xyz/deft3    19-12-20167    12:00    167    167    42
abcd/xyz/deft3    19-12-20167    13:00    99    167    42
abcd/xyz/deft3    19-12-20167    14:00    167    167    42
abcd/xyz/deft3    20-12-20167    0:00    1    45.87    74
abcd/xyz/deft3    20-12-20167    1:00    99    167    42
abcd/xyz/deft3    20-12-20167    2:00    99    167.02    42
abcd/xyz/deft3    20-12-20167    3:00    167    167    42
abcd/xyz/deft3    20-12-20167    4:00    99    167    42
abcd/xyz/deft3    20-12-20167    5:00    167    167    42
qerty/azer/uuui1    19-12-20167    0:00    1    45.87    74
qerty/azer/uuui1    19-12-20167    1:00    99    167    42
qerty/azer/uuui1    19-12-20167    2:00    99    167.02    42
qerty/azer/uuui1    19-12-20167    3:00    167    167    42
qerty/azer/uuui1    19-12-20167    4:00    99    167    42
qerty/azer/uuui1    19-12-20167    5:00    167    167    42
qerty/azer/uuui1    20-12-20167    0:00    1    45.87    74
qerty/azer/uuui1    20-12-20167    1:00    99    167    42
qerty/azer/uuui1    20-12-20167    2:00    99    167.02    42
qerty/azer/uuui1    20-12-20167    3:00    167    167    42
qerty/azer/uuui1    20-12-20167    4:00    99    167    42
qerty/azer/uuui1    20-12-20167    5:00    167    167    42

Here is my code:

use strict;
use warnings;

while ( <> ) {
    chomp;
    my @lines = split /\t/;
    my $name;

    next unless ( /^([a-zA-Z0-9\_\-]+)\/([a-zA-Z0-9\_\-]+)\/([a-zA-Z0-9\_\-]+)\s+([0-9\_\-]+)\s+/ );

    $name = $3;

    print join( ",",
        $name = $1, $lines[0], $lines[1], $lines[2],
        $lines[3], $lines[4], $lines[5], $lines[6] )
            . "\n";
}

Can someone help me solve this?

1 answer:

Answer 0: (score: 1)

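You need to remember the last full line of data. Since you only need to remember a single value per column, here is a solution that keeps one saved value for each column. Depending on your needs, you may only need to save the values of the first two columns, in which case two scalars would do the same job, but I'm not making that assumption here: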

use strict;
use warnings;

my @last_full_line;
while (<DATA>)
{
    chomp;
    my @fields = split /\t/;

    # Remember every non-empty field, and fill every empty field
    # with the last non-empty value seen in the same column.
    for my $i (0..$#fields)
    {
        $last_full_line[$i] = $fields[$i] if length $fields[$i];
        $fields[$i] = $last_full_line[$i] unless length $fields[$i];
    }

    print join(',', @fields), "\n";
}


__END__
abcd/xyz/deft3  19-12-20167 9:00    1   45.87   74
                10:00   99  167 42
                11:00   99  167.02  42
                12:00   167 167 42
                13:00   99  167 42
                14:00   167 167 42
        20-12-20167 0:00    1   45.87   74
                1:00    99  167 42
                2:00    99  167.02  42
                3:00    167 167 42
                4:00    99  167 42
                5:00    167 167 42
qerty/azer/uuui1    19-12-20167 0:00    1   45.87   74
                1:00    99  167 42
                2:00    99  167.02  42
                3:00    167 167 42
                4:00    99  167 42
                5:00    167 167 42
        20-12-20167 0:00    1   45.87   74
                1:00    99  167 42
                2:00    99  167.02  42
                3:00    167 167 42
                4:00    99  167 42
        5:00    167 167 42
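
For comparison, the two-scalar variant that the answer mentions but does not show could look roughly like the sketch below. It assumes that only the first two columns (name and date) are ever blank and that the input really is TAB-separated. The helper name `fill_down` and the three sample rows are hypothetical and used only for illustration, and the output is joined with tabs (rather than commas) to match the expected output in the question.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original answer: takes a list of
# TAB-separated rows and returns them with blank name/date cells filled in.
sub fill_down {
    my @rows = @_;
    my ($name, $date) = ('', '');    # the two remembered scalars
    my @out;

    for my $row (@rows) {
        next unless $row =~ /\S/;    # skip blank lines

        # A limit of -1 keeps trailing empty fields; leading empty
        # fields are always preserved by split on a TAB.
        my @fields = split /\t/, $row, -1;

        # Update the remembered values only when the cell is non-empty.
        $name = $fields[0] if length $fields[0];
        $date = $fields[1] if length $fields[1];

        # Overwrite the first two columns with the remembered values.
        @fields[0, 1] = ($name, $date);
        push @out, join("\t", @fields);
    }
    return @out;
}

# Assumed sample rows, modelled on the input in the question.
my @sample = (
    "abcd/xyz/deft3\t19-12-20167\t9:00\t1\t45.87\t74",
    "\t\t10:00\t99\t167\t42",
    "\t20-12-20167\t0:00\t1\t45.87\t74",
);

print "$_\n" for fill_down(@sample);
```

To process a real file, the same helper can be fed chomped lines, for example `chomp(my @lines = <>); print "$_\n" for fill_down(@lines);`. Compared with the per-column array in the answer, this version is shorter, but it will silently leave any blank cell beyond the second column untouched.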