Question

I'm trying to parse some log files.

This is what they look like:

node_name:  na2-devdb-cssx
run_id:     3c3424f3-8a62-4f4c-b97a-2096a2afc070
start_time: 2015-06-26T21:00:44Z
status:     failure

node_name:  eu1-devsx
run_id:     f5ed13a3-1f02-490f-b518-97de9649daf5
start_time: 2015-06-26T21:00:34Z
status:     success

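I need to get the blocks whose last line says "failure".

Ideally the timestamp would be taken into account as well, for example only timestamps like "2015-06-26T2*".

This is what I have tried so far: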

sed -e '/node_name/./failure/p' /file

sed -n '/node_name/./failure/p' /file

awk '/node_name/,/failure/' file

sed -e 's/node_name\(.*\)failure/\1/' file

None of them work for me. They just dump everything, not only the failures... For example:

[root@localhost chef-repo-zilliant]# sed -n '/node_name/,/failure/p' /tmp/run.txt | head
node_name:  eu1-devdb-linc
run_id:     e49fe64d-567d-4627-a10d-477e17fb6016
start_time: 2015-06-28T20:59:55Z
status:     success

node_name:  eu1-devjs1
run_id:     c6c7f668-b912-4459-9d56-94d1e0788802
start_time: 2015-06-28T20:59:53Z
status:     success

I'm not sure why it doesn't work. It seems like all of these approaches should work...

Thanks in advance.

Answer 1

One way with GNU sed:

sed -n ':a;/^./{H;n;ba;};x;/2015-06-26T21/{/failure$/p;};' file.txt

Details:

:a;           # define the label "a"
/^./ {        # condition: when a line is not empty
    H;        # append it to the hold space
    n;        # load the next line into the pattern space
    ba;       # go to label "a"
};

x;                 # swap hold space and pattern space
/2015-06-26T21/ {  # condition: if the needed date is in the block
    /failure$/ p;  # condition: if the block ends with "failure" then print
};
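
One caveat with the loop above, as far as I can tell: GNU sed's `n` quits when there is no next line, so a matching block at the very end of a file that lacks a trailing blank line never reaches `x`. Below is a minimal sketch of a variant that avoids `n` by testing for the last line with `$!`. The function name `failed_blocks_sed` and the looser `2015-06-26T2` prefix (taken from the question) are my own choices, not part of the answer above.

```shell
# Hypothetical helper: print blocks that end in "failure" and contain
# the date prefix. Assumes blocks are separated by empty lines.
failed_blocks_sed() {
    sed -n '
        # non-empty line: append it to the hold space and,
        # unless it is the last line of input, start the next cycle
        /^./{H;$!d;}
        # empty line or last line: fetch the collected block
        x
        /2015-06-26T2/{/failure$/p;}
    ' "$1"
}
```

Called as `failed_blocks_sed file.txt`. Each printed block starts with an empty line, because `H` always puts a newline in front of the line it appends.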

Answer 2

Using grep:

grep -oPz '\bnode_name:(?:(?!\n\n)[\s\S])*?2015-06-26T2(?:(?!\n\n)[\s\S])*?\bfailure\b' file

The key part here is (?:(?!\n\n)[\s\S])*?, which lazily matches any character, zero or more times, without ever crossing a blank line.
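
A practical note on the output: with `-z`, grep reads and writes NUL-terminated records, so each match printed by `-o` ends in a NUL byte rather than a newline and consecutive matches run together on a terminal. Here is a small sketch that converts the NULs back to newlines, assuming a GNU grep built with PCRE support (`-P`); the wrapper name `failed_blocks_grep` is hypothetical.

```shell
# Hypothetical wrapper around the command above.
# Assumes GNU grep with -P support and an input file without NUL bytes.
failed_blocks_grep() {
    grep -oPz '\bnode_name:(?:(?!\n\n)[\s\S])*?2015-06-26T2(?:(?!\n\n)[\s\S])*?\bfailure\b' "$1" |
        tr '\0' '\n'   # turn each NUL terminator into a newline
}
```

Called as `failed_blocks_grep file`.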

Answer 3

I noticed you tried awk, although you only tagged the question with sed, so I'll add an awk solution as well.

You can play with the built-in variables that control how records and fields are split, like:

awk '
    BEGIN { RS = ""; FS = OFS = "\n"; ORS = "\n\n" } 
    $NF ~ /failure/ && $(NF-1) ~ /2015-06-26T2/ { print }
' infile

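RS = "" switches awk to paragraph mode, so records are separated by blank lines. FS = "\n" makes each line of a block its own field (OFS is set to match in case a record gets rebuilt), and ORS = "\n\n" prints a blank line after each record so the output looks like the original input.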

It yields:

node_name:  na2-devdb-cssx
run_id:     3c3424f3-8a62-4f4c-b97a-2096a2afc070
start_time: 2015-06-26T21:00:44Z
status:     failure
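
If the date should not be hard-coded, the same paragraph-mode idea can take the prefix as a parameter. This is only a sketch: the function name `failed_blocks_awk` and the awk variable `ts` are made up for illustration, and it uses `index()` instead of a regex match so that the prefix is compared literally.

```shell
# Hypothetical helper: $1 = timestamp prefix (e.g. 2015-06-26T2), $2 = log file.
# Assumes, like the command above, that status is the last line of a block
# and start_time is the line before it.
failed_blocks_awk() {
    awk -v ts="$1" '
        # paragraph mode: blank-line-separated records, one field per line
        BEGIN { RS = ""; FS = "\n"; ORS = "\n\n" }
        # last line must report failure; the line before it must contain the prefix
        $NF ~ /^status:[ \t]*failure[ \t]*$/ && index($(NF-1), ts) > 0 { print }
    ' "$2"
}
```

Called as `failed_blocks_awk 2015-06-26T2 infile`.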

Sed between some exact text blocks

3 answers: