Ideally, I have a file in the following format:
Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_Identifier = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_Node = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580
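In the ideal case, each data set begins with a "Status_" line, ends with a "RawCaptureTimeStamp" line, and is separated from the next one by 2 blank lines.
The problem arises in the non-ideal case, where the file may look like this: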
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
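As shown above, the first and last data sets are invalid. I need logic that removes these unwanted data sets from the original file and saves it again. I have tried several things in Perl, but all of them failed. Please help.
This is the code I use to read the file. It checks whether the file begins with Status and, if it does not, reads on until it reaches RawCaptureTimeStamp.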
while ( my $line = <$cap_1> ) {
    if ( $. == 1 && $line !~ /^Status/ ) {    # check if first line doesn't begin with Status
        while ( $line = <$cap_1> ) {          # if not, read till the occurrence of RawCaptureTimeStamp
            if ( $line =~ /^RawCaptureTimeStamp/ ) {
                $. = $. + 1;
                last;
            }
        }
        $line = <$cap_1>;
        if ( eof() ) {    # after reading till RawCaptureTimeStamp, check for EOF
            last;
        }
    }
}
Answer 0 (score: 2)
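I would simply read the file in paragraph mode (setting $/ to "" rather than "\n\n", as Jonathan Leffler suggested in response to your question) and check each paragraph for consistency.
Three newlines have to be substituted at the end of each block, because PerlIO normalises them to two in this mode.
It looks like the problem is that the data may be truncated at either end, so I have required ten digits for the timestamp, which covers dates from 2001 to 2286.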
use strict;
use warnings 'all';

local $/ = '';    # Separate reads by one or more blank lines

while ( <> ) {
    next unless /^Status.+\nStatus/ and /^RawCaptureTimeStamp = \d{10}/m;
    s/\s*\z/\n\n\n/;
    print;
}
Output (using the faulty sample data set)
Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580
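As a quick check of the ten-digit requirement, the smallest and largest ten-digit values can be converted to dates. This is only an illustrative sketch: the helper name epoch_to_date is made up for the example, and it assumes the timestamps are seconds since the Unix epoch, interpreted in UTC.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format an epoch value (seconds, UTC) as YYYY-MM-DD.
sub epoch_to_date {
    my ($epoch) = @_;
    return strftime( '%Y-%m-%d', gmtime $epoch );
}

print epoch_to_date(1_000_000_000), "\n";    # smallest ten-digit value: 2001-09-09
print epoch_to_date(9_999_999_999), "\n";    # largest ten-digit value:  2286-11-20
print epoch_to_date(1_450_091_580), "\n";    # timestamp from the sample: 2015-12-14
```

A timestamp cut off at the end of the file has fewer than ten digits, so \d{10} rejects it.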
Answer 1 (score: 0)
#! /usr/bin/perl
use warnings;
use strict;

my $block;
while (<>) {                         # Skip the invalid beginning.
    next unless /^Status_/;
    $block = $_;
    last;
}
exit unless defined $block;          # No Status_ line at all: stop instead of reading on.

while (<>) {
    if (/^RawCaptureTimeStamp/) {    # End of block: print it, start gathering a new one.
        print $block, $_;
        $block = q();
    } else {                         # Inside of a block.
        $block .= $_;
    }
}
The last block is not printed if it does not end properly.
Answer 2 (score: 0)
This works, I believe:
#!/usr/bin/env perl
use strict;
use warnings;

$/ = "\n\n";

while (<>)
{
    s/^\s+//;
    s/\s+$//;
    print "\n[[", $_, "]]\n"
        if (m/^Status_\w+ .*Status_\w+ /ms && m/^RawCaptureTimeStamp /m);
}
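Setting $/ to "\n\n" makes each read stop at a double newline (or EOF), effectively reading one paragraph at a time. The if condition looks for two Status_ elements and one RawCaptureTimeStamp; you can refine these conditions to make them stricter as needed. The s modifier allows .* to match embedded newlines; the m modifier turns on multi-line mode, so ^ matches at the start of any line. It would be OK for RawCaptureTimeStamp to be followed by other lines, for example.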
Sample data copied from the question:
Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_Identifier = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_Node = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
Sample output:
[[Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580]]

[[Status_Identifier = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580]]

[[Status_Node = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580]]

[[Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580]]
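One way the conditions could be tightened is sketched below. The sub name is_complete_block is hypothetical, and the rules it enforces (two Status_ lines at the very top, a ten-digit RawCaptureTimeStamp as the very last line) are assumptions drawn from the sample data, not requirements stated in the question.

```perl
use strict;
use warnings;

# Hypothetical stricter check: the paragraph must start with two Status_ lines
# and its last line must be a RawCaptureTimeStamp with exactly ten digits.
sub is_complete_block {
    my ($para) = @_;
    $para =~ s/\A\s+//;    # ignore leading blank lines
    $para =~ s/\s+\z//;    # ignore trailing blank lines
    return $para =~ /\AStatus_\w+ = .*\nStatus_\w+ = /
        && $para =~ /^RawCaptureTimeStamp = \d{10}\z/m
        ? 1
        : 0;
}
```

Unlike the looser test above, this version rejects a paragraph in which other lines follow RawCaptureTimeStamp, so use it only if that is really what the data guarantees.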
Answer 3 (score: 0)
Use Perl's paragraph mode:
#!/usr/bin/perl -w
use strict;

local $/ = "";

while (my $para = <DATA>) {
    print $para if ($para =~ /^Status_.*RawCaptureTimeStamp/s);
}
__DATA__
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
2 = "ASB"
3 = "456"
RawCaptureTimeStamp = 1450091580


Status_ArsFlag = ""
Status_NodeAlias = ""
OID1 = ".1.3.6.1.4.1.11.2.17.19.2.2.1"
1 = "NNMi"
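The question also asks for the cleaned data to be saved back to the original file, which none of the snippets above do on their own. A minimal sketch of that step follows, under several assumptions: the sub name resave_valid_blocks is made up, data sets are separated by blank lines as in the question, and it is acceptable to write a temporary file next to the original and rename it into place.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: rewrite $path so that only complete data sets remain.
# Returns the number of data sets that were kept.
sub resave_valid_blocks {
    my ($path) = @_;

    open my $in, '<', $path or die "Cannot read $path: $!";

    # Write to a temporary file in the same directory, then rename it over the original.
    my ( $out, $tmp ) = tempfile( DIR => dirname($path) );

    my $kept = 0;
    {
        local $/ = '';    # paragraph mode
        while ( my $para = <$in> ) {
            # Must start with Status_ and end with a RawCaptureTimeStamp line.
            next unless $para =~ /\AStatus_.*^RawCaptureTimeStamp = \d+\s*\z/ms;
            $para =~ s/\s*\z/\n\n\n/;    # restore the two blank separator lines
            print {$out} $para;
            $kept++;
        }
    }

    close $in;
    close $out or die "Cannot write $tmp: $!";
    rename $tmp, $path or die "Cannot replace $path: $!";
    return $kept;
}
```

It would be called as resave_valid_blocks('capture.txt'), where the file name is only a placeholder. Note that the replacement file gets the permissions File::Temp gives its temporary files, so they may need to be reset afterwards.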