Question

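I want to write an efficient awk script that takes a file like the excerpt below and prints one particular line (for example, the line starting with "Time (UTC):") from each matching record. I'm sure there is a better way to do this than the way I've done it in the past.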

Example file (sorry, I don't know how to put blank lines in a code box, so they are represented by "BLANK LINE"):

Processor: Some_Proc
Capsule abortion no 32
Time (UTC): Fri Jun 15 06:25:10 2012
CapsuleId: 1704167
CapsuleName: SomeAppProc
Reason: Assertion "Reason1"  
BLANK LINE
Processor: Some_Proc
Capsule abortion no 33
Time (UTC): Fri Jun 15 06:25:10 2012
CapsuleId: 1704168
CapsuleName: SomeAppProc
Reason: Assertion "Reason2"  
BLANK LINE
Processor: Some_Proc
Capsule abortion no 34
Time (UTC): Fri Jun 15 06:25:10 2012
CapsuleId: 1704168
CapsuleName: SomeAppProc
Reason: Assertion "Reason1"

My previous code (sorry, I don't know how to preserve indentation on this forum; I tried 8 spaces, but that didn't work):

BEGIN {
    RS=""  #Each record is a "paragraph"
    FS="\n" #Each field is a line
}

/Reason1/ {
    # print $3  would work if it always shows up on the third line
    # but the following for loop should find it if it's on a different line
    for (i=1;i<=NF;i++) {
        if ($i ~ /^Time.*/) {
            print $i
            next
        }
    }
}

Is there a more efficient way to print that line if it doesn't always appear in the same position within the record?

Thanks

Answer 1

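That looks like a good solution to me; I would approach the problem the same way. I would use break instead of next, though, since what you want is to stop the loop once the line has been found. In awk, next also ends the loop, but it does so by abandoning the current record altogether, so any rules further down the script are skipped for that record. break leaves only the for loop: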

for (i=1;i<=NF;i++) {
    if ($i ~ /^Time.*/) {
        print $i
        break
    }
}
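
For reference, here is a minimal sketch of the whole script with break in place, wrapped in a shell function so it can be run directly. The function name print_time_for_reason and the awk variable reason are illustrative and not part of the original post. The sketch assumes records are separated by truly empty lines, because paragraph mode (RS="") does not treat whitespace-only lines as separators.

```shell
# Hypothetical helper: print the "Time (UTC):" line of every paragraph
# that contains the given reason string.
# Usage: print_time_for_reason REASON [FILE...]
print_time_for_reason() {
    reason=$1
    shift
    awk -v reason="$reason" '
    BEGIN {
        RS = ""     # paragraph mode: records are separated by empty lines
        FS = "\n"   # each field is one line of the record
    }
    index($0, reason) {                  # literal substring match on the record
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^Time \(UTC\):/) {
                print $i
                break                    # stop scanning this record
            }
        }
    }' "$@"
}
```

Calling print_time_for_reason Reason1 logfile (where logfile is a placeholder name) prints one Time line per matching paragraph. Like the /Reason1/ pattern in the question, the substring test would also match a longer string such as "Reason10".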

Answer 2

How about something like this?

BEGIN { reset(); }
END { reset(); }
$0 == "" { reset(); }
/^Reason:/ && $3 == "\"Reason1\"" { found = 1; }
/^Time \(UTC\):/ { time = $0; }

function reset() {
  if (found) { print time; }
  found = 0;
  time = "(unknown)";
}

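Then just use the default record separator (newline). This records the Time and Reason lines as they are read, and prints the time at the end of each matching record.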
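
The same line-by-line idea can be sketched as a shell function that takes the reason as a parameter. This is a variation, not the answerer's exact code: the names print_time_by_line and emit are made up for illustration, and the separator test uses NF == 0 instead of $0 == "", on the assumption that separator lines in the real file might contain stray whitespace.

```shell
# Hypothetical helper: line-oriented version that remembers the last
# Time line and prints it when a matching record ends.
# Usage: print_time_by_line REASON [FILE...]
print_time_by_line() {
    reason=$1
    shift
    awk -v reason="$reason" '
    function emit() {                    # print the saved time if the record matched
        if (found) print time
        found = 0
        time = "(unknown)"
    }
    BEGIN { emit() }                     # initialise state before the first record
    NF == 0 { emit(); next }             # empty or whitespace-only line ends a record
    /^Reason:/ && index($0, reason) { found = 1 }
    /^Time \(UTC\):/ { time = $0 }
    END { emit() }                       # the last record has no trailing blank line
    ' "$@"
}
```

A matching record that has no Time line prints "(unknown)" rather than being skipped silently, which is the behaviour of the script above as well.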

Awk: efficiently print the matching line in a matching paragraph
