Perl script for processing a text file

Asked: 2015-12-14 15:20:41

Tags: perl file-handling text-processing

I have a file which, in the ideal case, has the following format:

Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_Identifier = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_Node = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580

Ideally, each data set starts with a "Status_" line, ends with a "RawCaptureTimeStamp" line, and is separated from the next one by 2 new lines.

The problem arises in the non-ideal case, where the file may look like this:

1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"

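As shown above, the first and last data sets are invalid. I need some logic that removes these unwanted data sets from the original file and saves it again. I tried a few things in Perl, but they all failed. Please help. The code I use reads the file and checks whether it starts with Status; if it does not, it keeps reading until it reaches RawCaptureTimeStamp.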

while( my $line = <$cap_1>){
    if($. == 1 && $line !~ /^Status/){ #check if first line doesn't begin with status
            while($line = <$cap_1>){#if not read till the occurence of RawCaptureTimeStamp
            if($line =~/^RawCaptureTimeStamp/){
                $. = $.+1;
                last;
            }
        }
        $line = <$cap_1>; 
        if (eof()){ #After reading till raw capture timestamp, check for EOF
            last;
        }
    }
}

4 answers:

Answer 0: (score: 2)

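I would just read the file in paragraph mode (by setting $/ to "" rather than "\n\n", as Jonathan Leffler commented on your question) and check each paragraph for consistency.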

The three newlines have to be put back at the end of each block, because PerlIO normalises them to two in this mode.

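It looks like the problem is that the data may be truncated at either end, so I require ten digits for the timestamp, which covers dates from 2001 to 2286.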

use strict;
use warnings 'all';

local $/ = ''; # Separate reads by one or more blank lines

while ( <> ) {

    next unless /^Status.+\nStatus/ and /^RawCaptureTimeStamp = \d{10}/m;
    s/\s*\z/\n\n\n/;

    print;
}

Output (using the faulty sample data set)

Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580
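
The difference between the two settings of $/ mentioned in this answer can be seen in a small experiment. The following is only a sketch: the helper name read_records and the sample text are made up for illustration, and the data is read from an in-memory string rather than from the real file.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text into records using the given
# input record separator, reading from an in-memory filehandle.
sub read_records {
    my ($text, $separator) = @_;
    local $/ = $separator;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @records = <$fh>;
    close $fh;
    return @records;
}

# Two blocks separated by two blank lines (three newline characters).
my $text = "a = 1\nb = 2\n\n\n" . "c = 3\nd = 4\n";

# Paragraph mode: the whole run of newlines counts as one terminator,
# and the first record comes back ending in exactly "\n\n".
my @para = read_records($text, '');

# Literal "\n\n": the third newline is left at the start of the next record.
my @literal = read_records($text, "\n\n");

for my $record (@para, @literal) {
    (my $visible = $record) =~ s/\n/\\n/g;   # make the newlines visible
    print "[$visible]\n";
}
```

This is why the script above appends "\n\n\n" itself when it wants to keep the original spacing, and why a script using "\n\n" has to trim leading whitespace from each record.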

Answer 1: (score: 0)

#! /usr/bin/perl
use warnings;
use strict;

$_ = q();
# Skip the invalid beginning; give up if the input has no Status_ line at all.
while (defined($_ = <>)) { last if /^Status_/ }
exit unless defined $_;

my $block = $_;

while (<>) {
    if (/^RawCaptureTimeStamp/) {  # End of block: print it, start gathering a new one.
        print $block, $_;
        $block = q();

    } else {                       # Inside of a block.
        $block .= $_;
    }
}

The last block is not printed if it does not end properly.
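
The script above only skips a damaged block at the very beginning of the file. If damaged blocks could also appear in the middle, a stricter line-by-line variant might look like the sketch below. It assumes that blocks are always separated by at least one blank line; the helper name complete_blocks and the inline sample lines are hypothetical and only serve the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the complete blocks found in @lines.
# A block is kept when it starts with a Status_ line and ends with a
# RawCaptureTimeStamp line; a blank line discards an unfinished block.
sub complete_blocks {
    my @lines = @_;
    my @blocks;
    my ($block, $in_block) = ('', 0);

    for my $line (@lines) {
        if ($line =~ /^\s*$/) {                 # separator: drop any partial block
            ($block, $in_block) = ('', 0);
        }
        elsif (!$in_block && $line =~ /^Status_/) {
            ($block, $in_block) = ($line, 1);   # a new block starts here
        }
        elsif ($in_block) {
            $block .= $line;
            if ($line =~ /^RawCaptureTimeStamp/) {
                push @blocks, $block;           # the block is complete
                ($block, $in_block) = ('', 0);
            }
        }
        # Any other line belongs to a block without a Status_ start: skip it.
    }
    return @blocks;
}

# Small made-up sample: truncated head, one good block, truncated tail.
my @sample = map { "$_\n" } (
    '3 = "456"', 'RawCaptureTimeStamp = 1450091580', '', '',
    'Status_ArsFlag = ""', 'Status_NodeAlias = ""', '1 = "NNMi"',
    'RawCaptureTimeStamp = 1450091580', '', '',
    'Status_ArsFlag = ""', '1 = "NNMi"',
);

# Blocks are joined with two blank lines, as in the question's file.
print join("\n\n", complete_blocks(@sample));
```

In a real script the sample array would be replaced by the lines read from the capture file.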

Answer 2: (score: 0)

This works, I believe:

#!/usr/bin/env perl
use strict;
use warnings;

$/ = "\n\n";

while (<>)
{
    s/^\s+//;
    s/\s+$//;
    print "\n[[", $_, "]]\n"
        if (m/^Status_\w+ .*Status_\w+ /ms && m/^RawCaptureTimeStamp /m);
}

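Setting $/ this way reads up to a double newline (or EOF), which effectively reads one paragraph at a time. The if condition looks for two Status_ elements and a RawCaptureTimeStamp; you can refine these conditions to make them stricter if needed. The s modifier allows .* to match embedded newlines; the m modifier turns on multi-line mode, so ^ also matches at the start of each line. As written, for example, a RawCaptureTimeStamp followed by other lines would still be accepted.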
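
The effect of the two modifiers can be checked in isolation. The sketch below applies the patterns from the script above, with and without each modifier, to a shortened made-up paragraph.

```perl
use strict;
use warnings;

# A shortened, made-up paragraph with embedded newlines.
my $para = "Status_ArsFlag = \"\"\n"
         . "Status_NodeAlias = \"\"\n"
         . "RawCaptureTimeStamp = 1450091580";

# Without /s, "." stops at a newline, so ".*" cannot reach the second Status_ line.
my $without_s = $para =~ /^Status_\w+ .*Status_\w+ /  ? 1 : 0;

# With /s, ".*" may run across the embedded newline.
my $with_s    = $para =~ /^Status_\w+ .*Status_\w+ /s ? 1 : 0;

# Without /m, "^" matches only at the very start of the string.
my $without_m = $para =~ /^RawCaptureTimeStamp /  ? 1 : 0;

# With /m, "^" also matches right after each embedded newline.
my $with_m    = $para =~ /^RawCaptureTimeStamp /m ? 1 : 0;

print "without /s: $without_s, with /s: $with_s\n";   # without /s: 0, with /s: 1
print "without /m: $without_m, with /m: $with_m\n";   # without /m: 0, with /m: 1
```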

Sample data copied from the question:

Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_Identifier = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_Node = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"

Sample output:

[[Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580]]

[[Status_Identifier = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580]]

[[Status_Node = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580]]

[[Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580]]

Answer 3: (score: 0)

Use Perl's paragraph mode:

#!/usr/bin/perl -w

use strict;

local $/ = "";

while (my $para = <DATA>) {
    print $para if ($para =~ /^Status_.*RawCaptureTimeStamp/s);
}

__DATA__
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
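
The question also asks for the cleaned data to be saved back to the original file, which none of the scripts here do directly. One possible way is sketched below, under several assumptions: the helper name clean_file is hypothetical, the file is small enough to hold its valid blocks in memory, a sibling file with the suffix .tmp may be created and overwritten, and a block counts as valid when it starts with Status_ and its last line is a RawCaptureTimeStamp line.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the complete blocks of the file at $path,
# write them back to the same path, and return how many blocks were kept.
sub clean_file {
    my ($path) = @_;

    open my $in, '<', $path or die "Cannot read $path: $!";
    my @good;
    {
        local $/ = '';                          # paragraph mode
        while (my $para = <$in>) {
            $para =~ s/\s+\z//;                 # trim the trailing newlines
            push @good, $para
                if $para =~ /\AStatus_/
                && $para =~ /^RawCaptureTimeStamp = \d+\z/m;
        }
    }
    close $in;

    # Write to a temporary name first, then rename it over the original,
    # so a failure while writing does not destroy the input.
    my $tmp = "$path.tmp";                      # assumed to be free to use
    open my $out, '>', $tmp or die "Cannot write $tmp: $!";
    print {$out} join("\n\n\n", @good), (@good ? "\n" : '');
    close $out or die "Cannot close $tmp: $!";
    rename $tmp, $path or die "Cannot rename $tmp to $path: $!";

    return scalar @good;
}

# Usage: perl clean.pl capture.txt   (both names are placeholders)
clean_file($ARGV[0]) if @ARGV;
```

The blocks are joined with three newlines so that the rewritten file keeps the two blank lines between data sets that the question describes.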