Suppose I have a large log file that looks like this:
[2016-11-11 16:41:06.062] <sid:111> start1
[2016-11-11 16:41:06.062] <sid:111> op <555>
[2016-11-11 16:41:06.063] <sid:111> op <666>
[2016-11-11 16:41:07.124] <sid:222> start1
[2016-11-11 16:41:07.125] <sid:111> end
[2016-11-11 16:41:07.123] <sid:222> op <777>
[2016-11-11 16:41:08.333] <sid:333> start2
[2016-11-11 16:41:08.352] <sid:333> op <888>
[2016-11-11 16:41:08.352] <sid:333> op <999>
[2016-11-11 16:41:09.062] <sid:333> end
[2016-11-11 16:41:09.100] <sid:222> op <222>
[2016-11-11 16:41:09.100] <sid:222> op <333>
[2016-11-11 16:41:09.100] <sid:222> end
Suppose I need the operation numbers of every session that begins with start1:
<sid:111> <555>
<sid:111> <666>
<sid:222> <777>
<sid:222> <222>
<sid:222> <333>
How can I do this with awk (or anything else), given that sessions run concurrently and the lines of one session are not necessarily contiguous?
I have tried the following awk script:
awk '
BEGIN {
    seen_start = 0;
    seen_end = 1;
}
!seen_start && seen_end && $0 ~ /start1/ {
    match($0, "(<sid:[a-f0-9]+>) start1", m);
    sid = m[1];
    seen_start = 1;
    seen_end = 0;
}
seen_start && !seen_end && $0 ~ sid && $0 ~ /op/ {
    match($0, "op (<[0-9]+>)", m);
    print sid, m[1];
}
seen_start && !seen_end && $0 ~ sid && $0 ~ /end/ {
    seen_start = 0;
    seen_end = 1;
}
' test
However, this misses any session that starts in the middle of another one. The output is only:
<sid:111> <555>
<sid:111> <666>
Thanks
Answer 0 (score: 2)
awk one-liner
awk -F '[:<>]' '/start1/ {a[$5]; next} /end/ {delete a[$5]; next} /op/ && $5 in a {print $5, $7}' test
Explanation
awk -F '[:<>]' '                    # split on :, < or >
    /start1/ {a[$5]; next}          # note that the session has started
    /end/    {delete a[$5]; next}   # note that the session has ended
    /op/ && $5 in a {               # print only if the session has started
        print $5, $7
    }
'
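Because the one-liner splits on :, < and >, it prints bare numbers such as 111 555 rather than the <sid:111> <555> form shown in the question. If the exact format matters, a sketch along the following lines may help. It uses the same start/end bookkeeping but relies on awk's default whitespace splitting, and it assumes every line has the layout shown in the question (date, time, <sid:...>, event, optional <number>). The function name ops_of_start1_sessions and the array name active are made up for illustration.

```shell
# Hypothetical helper: prints "<sid:N> <op>" for sessions opened with start1.
# Reads the named files, or standard input when no file is given.
ops_of_start1_sessions() {
    awk '
        # Whitespace-split fields: $3 = "<sid:NNN>", $4 = event, $5 = "<NNN>"
        $4 == "start1" { active[$3] = 1;    next }   # session opened by start1
        $4 == "end"    { delete active[$3]; next }   # session closed, stop tracking
        $4 == "op" && ($3 in active) { print $3, $5 }
    ' "$@"
}

# Usage on a few lines taken from the log in the question
ops_of_start1_sessions <<'EOF'
[2016-11-11 16:41:06.062] <sid:111> start1
[2016-11-11 16:41:06.062] <sid:111> op <555>
[2016-11-11 16:41:07.124] <sid:222> start1
[2016-11-11 16:41:07.125] <sid:111> end
[2016-11-11 16:41:07.123] <sid:222> op <777>
[2016-11-11 16:41:08.333] <sid:333> start2
[2016-11-11 16:41:08.352] <sid:333> op <888>
EOF
```

Comparing $4 exactly also avoids accidental matches: a pattern such as /op/ would fire on any line that merely contains the letters "op" somewhere.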
Answer 1 (score: 0)
I ended up using Perl:
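#!/usr/bin/perl
use strict;
use warnings;

my %hash;    # sessions opened with start1 that have not ended yet
while (<>) {
    if (/(<sid:[a-f0-9]+>) start1/) {
        $hash{$1} = 1;
    }
    elsif (/(<sid:[a-f0-9]+>) op (<[0-9]+>)/) {
        # print the operation only if its session is being tracked
        if (exists $hash{$1}) {
            print "$1 $2\n";
        }
    }
    elsif (/(<sid:[a-f0-9]+>) end/) {
        # delete on a missing key is a no-op, so no exists check is needed
        delete $hash{$1};
    }
}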