My data looks like the sample below; the actual file is thousands of lines long.
Event_time Cease_time
Object_of_reference
-------------------------- --------------------------
----------------------------------------------------------------------------------
Apr 5 2010 5:54PM NULL
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900
Apr 5 2010 5:55PM Apr 5 2010 6:43PM
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900
Apr 5 2010 5:58PM NULL
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=BULAGA
Apr 5 2010 5:58PM Apr 5 2010 6:01PM
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=BULAGA
Apr 5 2010 6:01PM NULL
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=BULAGA
Apr 5 2010 6:03PM NULL
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=KAPKWAI_900
Apr 5 2010 6:03PM Apr 5 2010 6:04PM
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=KAPKWAI_900
Apr 5 2010 6:04PM NULL
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=KAPKWAI_900
Apr 5 2010 6:03PM Apr 5 2010 6:03PM
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=BULAGA
Apr 5 2010 6:03PM NULL
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=BULAGA
Apr 5 2010 6:03PM Apr 5 2010 7:01PM
SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSCC1,BssFunction=
BSS_ManagedFunction,BtsSiteMgr=BULAGA
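As you can see, each file has a header that describes the fields (event start time, event cease time, affected element). The header is followed by rows of dashes.
My problem is that the data contains many entries where the cease time is NULL, i.e. the event is still active. All such entries must go: for every alarm whose cease time is NULL, the start time, the cease time (NULL in this case) and the element itself must be deleted from the file.
In the remaining data, all the text from the word SubNetwork up to and including BtsSiteMgr= must also go, along with the header and the dashes.
The final output should look like this: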
Apr 5 2010 5:55PM Apr 5 2010 6:43PM
LUGALAMBO_900
Apr 5 2010 5:58PM Apr 5 2010 6:01PM
BULAGA
Apr 5 2010 6:03PM Apr 5 2010 6:04PM
KAPKWAI_900
Apr 5 2010 6:03PM Apr 5 2010 6:03PM
BULAGA
Apr 5 2010 6:03PM Apr 5 2010 7:01PM
BULAGA
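Below is the Perl script I have written. It already takes care of the header, the dashes and the NULL entries, but I have not managed to delete the lines that follow each NULL entry so as to produce the output above.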
#!/usr/bin/perl
use strict;
use warnings;
$^I=".bak"; #Backup the file before messing it up.
open (DATAIN,"<george_perl.txt")|| die("can't open datafile: $!"); # Read in the data
open (DATAOUT,">gen_results.txt")|| die("can't open datafile: $!"); #Prepare for the writing
while (<DATAIN>) {
s/Event_time//g;
s/Cease_time//g;
s/Object_of_reference//g;
s/\-//g; #Preceding 4 statements are for cleaning out the headers
my $theline=$_;
if ($theline =~ /NULL/){
next;
next if $theline =~ /SubN/;
}
else{
print DATAOUT $theline;
}
}
close DATAIN;
close DATAOUT;
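Please point out any modifications I need to make to the script so that it produces the required output.
Answer 0 (score: 1)
This looks like a good candidate for a little input record separator ($/) trickery. The idea is to change $/ so that each read returns one whole record rather than the default single line.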
use strict;
use warnings;
$^I = '.bak';
open my $dataIn, '<', 'george_perl.txt' or die "Can't open data file: $!";
open my $dataOut, '>', 'gen_results.txt' or die "Can't open output file: $!";
{
local $/ = "\n\t"; # Records have leading tabs
while ( my $record = <$dataIn> ) {
# Skip header & records that contain 'NULL'
next if $record =~ /NULL|Event_time/;
# Strip out the unwanted yik-yak
$record =~ s/SubNetwork.*BtsSiteMgr=//s;
# Print record to output file
print $dataOut $record;
}
}
close $dataIn;
close $dataOut;
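As a small, self-contained illustration of the record-separator idea, the sketch below reads from an in-memory string instead of george_perl.txt. The helper name split_records and the shortened sample text are made up for the demonstration, and it assumes that only the first line of each record starts with a tab, which may not match the real file.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text into records, using "\n\t" as the
# input record separator. Assumes only a record's first line starts with a tab.
sub split_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Can't open in-memory file: $!";
    local $/ = "\n\t";      # restored automatically when the sub returns
    my @records;
    while ( my $record = <$fh> ) {
        chomp $record;      # chomp strips the current value of $/
        push @records, $record;
    }
    close $fh;
    return @records;
}

# Made-up sample: two records of three lines each.
my $sample = join '',
    "\tApr 5 2010 5:54PM NULL\n",
    "SubNetwork=ONRM_RootMo,BssFunction=\n",
    "BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900\n",
    "\tApr 5 2010 5:55PM Apr 5 2010 6:43PM\n",
    "SubNetwork=ONRM_RootMo,BssFunction=\n",
    "BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900\n";

my @records = split_records($sample);
print scalar(@records), " records read\n";
```

Each element of @records holds one complete three-line entry, so a single test for NULL is enough to keep or drop the whole entry.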
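Note the following:
- The three-argument form of open is used (the two-argument form is what you showed).
- The local keyword and the extra curly braces limit the change to $/ to the enclosing block.
- The trailing s in s/SubNetwork.*BtsSiteMgr=//s lets the match span multiple lines.
Answer 1 (score: 1)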
Your data arrives in sets of 3 lines, so one approach is to organize the parsing that way:
use strict;
use warnings;
# Ignore header junk.
while (<>){
last unless /\S/;
}
until (eof) {
# Read in a set of 3 lines.
my @lines;
push @lines, scalar <> for 1 .. 3;
# Filter and clean.
next if $lines[0] =~ /\sNULL\s/;
$lines[2] =~ s/.+BtsSiteMgr=//;
print @lines[0,2];
}
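The same three-lines-per-record idea can also be sketched as a function over a list of lines, which makes it easy to try on a small sample. The name filter_triplets is hypothetical, and the sketch assumes that every record is exactly three lines long and that header lines can be recognized by their text or by consisting only of dashes and whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the lines of the file, returns the cleaned lines.
sub filter_triplets {
    my @lines = @_;

    # Drop header lines and lines made only of dashes and whitespace.
    @lines = grep { !/Event_time|Object_of_reference/ && /[^-\s]/ } @lines;

    my @out;
    while ( my @group = splice @lines, 0, 3 ) {
        last if @group < 3;                 # ignore a truncated trailing record
        next if $group[0] =~ /\bNULL\b/;    # alarm still active: drop the record
        $group[2] =~ s/^.*BtsSiteMgr=//;    # keep only the site name
        push @out, @group[ 0, 2 ];
    }
    return @out;
}

# Shortened sample based on the data in the question.
my @sample = (
    "Event_time Cease_time\n",
    "Object_of_reference\n",
    "-------------------------- --------------------------\n",
    "Apr 5 2010 5:54PM NULL\n",
    "SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=\n",
    "BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900\n",
    "Apr 5 2010 5:55PM Apr 5 2010 6:43PM\n",
    "SubNetwork=ONRM_RootMo,SubNetwork=AXE,ManagedElement=BSJN1,BssFunction=\n",
    "BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900\n",
);

print filter_triplets(@sample);
```

Filtering the header by pattern rather than by a blank separator line is a design choice made for this sketch; if the real file has a blank line after the header, the loop in the answer works as written.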
Answer 2 (score: 0)
s/^.*NULL\r?\n.*\r?\n.*\r?\n//mg;
should filter out the lines that end with NULL, together with the two lines that follow each of them.
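A substitution like this only sees several lines at once if the whole file is held in a single string. The sketch below shows it applied that way; the helper name drop_null_records and the sample text are made up, and it assumes there is no trailing whitespace after NULL in the real data.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every record whose first line ends in NULL.
# $text must hold the whole file as one string, for example read with
#   my $text = do { local $/; <$fh> };
sub drop_null_records {
    my ($text) = @_;
    # /m lets ^ match at the start of every line; /g repeats the removal.
    $text =~ s/^.*NULL\r?\n.*\r?\n.*\r?\n//mg;
    return $text;
}

# Made-up sample: one active alarm followed by one ceased alarm.
my $content = join '',
    "Apr 5 2010 5:54PM NULL\n",
    "SubNetwork=ONRM_RootMo,BssFunction=\n",
    "BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900\n",
    "Apr 5 2010 5:55PM Apr 5 2010 6:43PM\n",
    "SubNetwork=ONRM_RootMo,BssFunction=\n",
    "BSS_ManagedFunction,BtsSiteMgr=LUGALAMBO_900\n";

print drop_null_records($content);
```

The header, the dashes and the SubNetwork prefix would still need to be removed in separate steps.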