Question

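I'm looking for a command-line way (on SunOS) to extract XML messages that contain a particular string from a log file.

For example, the log file might contain XML messages in the following format: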

<message>
    <body>
        <tags> uniqueId="123456" </tags>
    </body>
</message>

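alongside other timestamped log lines. There may be several XML messages containing the same ID, because the same record may have been run several times.

To extract the XML I currently use the following awk command: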

nawk '$0~s{for(c=NR-b;c<=NR+a;c++)r[c]=1}{q[NR]=$0}END{for(c=1;c<=NR;c++)if(r[c])print q[c]}' b=4 a=15 s="someUniqueId" file

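My problem is that this pulls out a fixed number of lines. The XML can vary in length, though, and I'm struggling to find a way to modify the command so that it finds the unique ID, then pulls all lines back up to "<message>" and all lines down to "</message>".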

Answer 1

This might work in an ideal world (if I've understood your question correctly):

$ cat file
<message>
    <body>
        <tags> uniqueId="123455" </tags>
    </body>
</message>
<message>
    <body>
        <tags> uniqueId="123456" </tags>      # the one we want
    </body>
</message>
<message>
    <body>
        <tags> uniqueId="123457" </tags>
    </body>
</message>

awk:

$ awk '
{ 
    b=b ORS $0                            # buffer records
}
/<message>/ {                             
    b=$0                                  # reset buffer
} 
/<\/message>/ && b~/uniqueId="123456"/ {  # if condition met at the end marker
    print b                               # output buffer
}' file

Output:

<message>
    <body>
        <tags> uniqueId="123456" </tags>      # the one we wanted
    </body>
</message>
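
The ID is hardcoded in the pattern above. As a minimal sketch of a parameterized variant, the function below takes the ID as an awk variable. It assumes each message starts on a line containing "<message>" and ends on a line containing "</message>", as in the sample. The names extract_msg, id, inmsg and buf are chosen here for illustration and are not part of the original answer. It skips log lines that fall outside a message and uses index() so the ID is matched as a literal string rather than as a regular expression. On SunOS you would probably need nawk or /usr/xpg4/bin/awk in place of awk.

```shell
# extract_msg ID FILE: print every <message>...</message> block
# that contains uniqueId="ID" (hypothetical helper name)
extract_msg() {
    awk -v id="$1" '
    /<message>/   { inmsg = 1; buf = "" }        # start of a block: reset buffer
    inmsg         { buf = buf $0 ORS }           # collect lines only inside a block
    /<\/message>/ {
        if (inmsg && index(buf, "uniqueId=\"" id "\""))
            printf "%s", buf                     # emit the block if it holds the ID
        inmsg = 0                                # ignore lines until the next block
    }' "$2"
}
```

Usage would look like: extract_msg 123456 file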

Answer 2

You could also try Perl:

perl -0777 -ne ' while( m{(<message>(.+?)</message>)}sg ) 
     { $x=$1; if($x=~/uniqueId="123456"/) { print "$x\n" }} ' edman.txt

Using @James's input:

$ cat edman.txt
<message>
    <body>
        <tags> uniqueId="123455" </tags>
    </body>
</message>
<message>
    <body>
        <tags> uniqueId="123456" </tags>      # the one we want
    </body>
</message>
<message>
    <body>
        <tags> uniqueId="123457" </tags>
    </body>
</message>

$ perl -0777 -ne ' while( m{(<message>(.+?)</message>)}sg ) 
    { $x=$1; if($x=~/uniqueId="123456"/) { print "$x\n" }} ' edman.txt
<message>
    <body>
        <tags> uniqueId="123456" </tags>      # the one we want
    </body>
</message>
$
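
Both approaches print every message that contains the ID. Since the question mentions that the same ID can appear in several messages when a record is re-run, here is a minimal awk sketch that keeps only the last matching block. It assumes the most recent message is the one that appears last in the file. The names last_msg, id, buf and last are hypothetical, picked for this example. On SunOS, nawk or /usr/xpg4/bin/awk would likely be needed instead of awk.

```shell
# last_msg ID FILE: print only the last <message> block
# that contains uniqueId="ID" (hypothetical helper name)
last_msg() {
    awk -v id="$1" '
    /<message>/   { buf = "" }                   # new block: reset buffer
                  { buf = buf $0 ORS }           # append every line to the buffer
    /<\/message>/ && index(buf, "uniqueId=\"" id "\"") {
        last = buf                               # remember the newest match
    }
    END           { printf "%s", last }' "$2"
}
```

Usage would look like: last_msg 123456 file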

Pull lines between two strings in a log file, with a third string in between
