我很难以一致的格式放置数组列。我有以下输出:
Mon,Jun,25,14:39:29,2012,971,29,0,25,0,0,0,4,Mon,Jun,25,14:39:29,2012,25,mod_was_ap22_http.c
Mon,Jun,25,14:40:29,2012,972,28,0,25,0,0,0,3,Mon,Jun,25,14:40:29,2012,3,mod_sm22.cpp,22,mod_was_ap22_http.c
Mon,Jun,25,14:41:29,2012,973,27,0,24,0,0,0,3,Mon,Jun,25,14:41:29,2012,24,mod_was_ap22_http.c
Mon,Jun,25,14:42:29,2012,974,26,0,20,0,0,0,6,Mon,Jun,25,14:42:29,2012,1,mod_sm22.cpp,19,mod_was_ap22_http.c
Mon,Jun,25,14:43:29,2012,971,29,0,26,0,0,0,3,Mon,Jun,25,14:43:29,2012,2,mod_sm22.cpp,24,mod_was_ap22_http.c
Mon,Jun,25,14:44:30,2012,957,43,0,41,0,0,0,2,Mon,Jun,25,14:44:30,2012,1,mod_sm22.cpp,40,mod_was_ap22_http.c
Mon,Jun,25,14:45:30,2012,963,37,0,35,0,0,0,2,Mon,Jun,25,14:45:30,2012,2,mod_sm22.cpp,32,mod_was_ap22_http.c
Mon,Jun,25,14:46:30,2012,972,28,0,24,1,1,0,2,Mon,Jun,25,14:46:30,2012,24,mod_was_ap22_http.c,1,ApacheModule.cpp
Mon,Jun,25,14:47:30,2012,961,39,1,37,0,0,0,1,Mon,Jun,25,14:47:30,2012,37,mod_was_ap22_http.c,1,ApacheModule.cpp
Mon,Jun,25,14:48:30,2012,968,32,0,30,0,0,0,2,Mon,Jun,25,14:48:30,2012,30,mod_was_ap22_http.c
Mon,Jun,25,14:49:30,2012,972,28,0,25,0,0,0,3,Mon,Jun,25,14:49:30,2012,1,mod_sm22.cpp,24,mod_was_ap22_http.c
我想要显示的列: DAYOFWEEK,月,日,时,年,RDY,BSY,RD,WR,嘉,日志,DNS,CLS,请 AP22,SM22,ApacheModule
目前粗体列不是那个顺序(其余的都是正确的)。每行与该格式不一致。该行有时首先是ap22,有时首先是sm22,有时没有或全部是三个模块。模块前面的数字与模块有关。如何将数据移动到一致的格式?
请注意,每行中的第二个日期mod_was_http.c,mod_sm22.cpp和ApacheModule.cpp将在最终数组中删除。
到目前为止,这是我的代码:
# This program parses a error log for necessary information and outputs in CSV format.
# chunks of your input to ignore, see below...
my %ignorables = map { $_ => 1 } qw([notice mpmstats: rdy bsy rd wr ka log dns cls bsy: in);
# 3-arg open is safer than 2, lexical my $fh better than a global FH glob
open my $error_fh, '<', 'iset_error_log';
sub findLines {
my($item,@result)=("");
# Iterates over the lines in the file, putting each into $_
while (<$error_fh>) {
# Select only those fields that have the word 'notice'
if (/\[notice/) {
# Place those lines with the word 'rdy' on the next line
if (/\brdy\b/){
push @result,"$item\n";
$item="";
}
else {
$item.=",";
}
# Split the line into fields, separated by spaces, skip the %ignorables
my @line = grep { not defined $ignorables{$_} } split /\s+/;
# More cleanup
s/|^\[|notice|[]]//g for @line; # remove unnecessary elements from the array
# Output the line.
@line = join(",", @line);
s/,,/,/g for @line;
map $item.=$_, @line;
}
}
@result
}
my @array = &findLines;
foreach $line (@array){
print $line; #This is where I would like to organize the lines if possible.
}
我的输入文件如下所示:
[Mon Jun 25 07:51:17 2012] [notice] mpmstats: rdy 990 bsy 10 rd 0 wr 7 ka 0 log 0 dns 0 cls 3
[Mon Jun 25 07:51:17 2012] [notice] mpmstats: bsy: 2 in mod_sm22.cpp, 5 in mod_was_ap22_http.c
[Mon Jun 25 08:08:17 2012] [notice] mpmstats: rdy 974 bsy 26 rd 1 wr 24 ka 0 log 0 dns 0 cls 1
[Mon Jun 25 08:08:17 2012] [notice] mpmstats: bsy: 1 in mod_sm22.cpp, 23 in mod_was_ap22_http.c, 1 in ApacheModule.cpp Mon,Jun,25,14:38:29,2012,962,38,0,36,0,0,0,2,Mon,Jun,25,14:38:29,2012,3,mod_sm22.cpp,33,mod_was_ap22_http.c
[Mon Jun 25 21:54:41 2012] [notice] mpmstats: rdy 999 bsy 1 rd 0 wr 0 ka 0 log 0 dns 0 cls 1
[Mon Jun 25 21:55:41 2012] [notice] mpmstats: rdy 999 bsy 1 rd 0 wr 0 ka 0 log 0 dns 0 cls 1
[Mon Jun 25 21:59:41 2012] [notice] mpmstats: rdy 999 bsy 1 rd 0 wr 1 ka 0 log 0 dns 0 cls 0
[Mon Jun 25 21:59:41 2012] [notice] mpmstats: bsy: 1 in mod_was_ap22_http.c
[Mon Jun 25 22:00:41 2012] [notice] mpmstats: rdy 999 bsy 1 rd 0 wr 1 ka 0 log 0 dns 0 cls 0
[Mon Jun 25 22:00:41 2012] [notice] mpmstats: bsy: 1 in mod_was_ap22_http.c
[Mon Jun 25 22:03:41 2012] [notice] mpmstats: rdy 998 bsy 2 rd 0 wr 2 ka 0 log 0 dns 0 cls 0
[Mon Jun 25 22:03:41 2012] [notice] mpmstats: bsy: 2 in mod_was_ap22_http.c
[Mon Jun 25 22:08:42 2012] [notice] mpmstats: rdy 998 bsy 2 rd 0 wr 2 ka 0 log 0 dns 0 cls 0
[Mon Jun 25 22:08:42 2012] [notice] mpmstats: bsy: 2 in mod_was_ap22_http.c
[Mon Jun 25 22:21:42 2012] [notice] mpmstats: rdy 999 bsy 1 rd 0 wr 1 ka 0 log 0 dns 0 cls 0
[Mon Jun 25 22:21:42 2012] [notice] mpmstats: bsy: 1 in mod_was_ap22_http.c
[Mon Jun 25 22:22:42 2012] [notice] mpmstats: rdy 999 bsy 1 rd 0 wr 1 ka 0 log 0 dns 0 cls 0
[Mon Jun 25 22:22:42 2012] [notice] mpmstats: bsy: 1 in mod_was_ap22_http.c
[Mon Jun 25 22:31:42 2012] [notice] mpmstats: rdy 999 bsy 1 rd 0 wr 0 ka 0 log 0 dns 0 cls 1
[Mon Jun 25 22:32:42 2012] [notice] mpmstats: rdy 999 bsy 1 rd 0 wr 1 ka 0 log 0 dns 0 cls 0
[Mon Jun 25 22:32:42 2012] [notice] mpmstats: bsy: 1 in mod_was_ap22_http.c
[Mon Jun 25 23:06:43 2012] [notice] mpmstats: rdy 999 bsy 1 rd 0 wr 1 ka 0 log 0 dns 0 cls 0
[Mon Jun 25 23:06:43 2012] [notice] mpmstats: bsy: 1 in mod_was_ap22_http.c
答案 0 :(得分:0)
在使用连接将它们重新转换为一行文本之前,您可能希望在仍然进行拆分时对列重新排序。
你只需要进行交换。
# 0 ,1 ,2 ,3 ,4 ,5 ,6 ,7 ,8 ,9 ,10 ,11 ,12 ,13 ,14 ,15
# DayOfWeek,Month,Day,Time,Year,Rdy,Bsy,Rd,Wr,Ka,Log,Dns,Cls,AP22,SM22,ApacheModule
#
# Sometimes the last 2 fields are missing and 13 comes before 14 and 15 in the
# input, so fix that.
if (@line < 16) {
push @line, '', ''; # or whatever you want for blanks
}
@line = @line[0..12,14,15,13]; # rearrange the array
此外,如果使用空字符串(s/,,/,/g
)作为空字段,则正则表达式''
将打破此问题。缺少最后一个字段的短线将重新缺少错过正确的13和14字段。
根据此处和之前提出的问题类型,我强烈建议您获取Modern Perl(可供下载或购买)或Learning Perl的副本,以便更好地掌握语言整个。我最近阅读了很多前者,并且很喜欢它,并从后者的早期版本中获得了我最初的Perl知识。