awk command to grep data between two multi-line patterns based on a condition

Posted: 2013-02-04 08:02:54

Tags: regex unix sed awk grep

Sample.xml:

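`
<tester>
   <level1 id="1"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 1</level3>
   <level4>lvl4 of id 1</level4>
   <level5> </level5>
</tester>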

<tester>
   <level1 id="2"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 2 </level3>
   <level4> lvl4 of id 2</level4>
   <level5> </level5>
</tester>

<tester>
   <level1 id="3"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 3</level3>
   <level4>lvl4 of id 3</level4>
   <level5> </level5>
</tester>

<tester>
   <level1 id="2"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 2 2nd occurance</level3>
   <level4>lvl4 of id 2 2nd occurance</level4>
   <level5> </level5>
</tester>

`
For the sample.xml above, I need to extract the level3 and level4 tags only when the level1 tag has id 2. For example, when I search for id = 2, I should get the following output:
<level3>lvl3 of id 2 </level3>
<level4> lvl4 of id 2</level4>

<level3>lvl3 of id 2 2nd occurance</level3>
<level4>lvl4 of id 2 2nd occurance</level4>

3 answers:

Answer 0 (score: 2)

Using sed:

sed -n '/<tester>/{n;/<level1[ ]*id="2"/{n;n;N;p}}' input

Explanation:

sed                  # execute sed
-n                   # do not print unless explicitly stated
/<tester>/           # if this line contains <tester>
{                    # then 
n;                   # replace the pattern space with the next line (<level1 ...>)
/<level1[ ]*id="2"/  # if this line contains <level1, optional spaces, id="2"
{                    # then
n;n;                 # skip <level1> and <level2>; the pattern space now holds <level3>
N;                   # append the next line (<level4>) to the pattern space
p                    # print the pattern space (<level3> and <level4>)
}                    # end if
}                    # end if
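
To try the command, it can be wrapped in a small shell function that takes the id as an argument. This is a sketch, not part of the original answer: the function name `extract_sed` and the file `sample.xml` are hypothetical, and it keeps the answer's assumption that every `<tester>` block has exactly the sample's layout (`<level1>` on the line right after `<tester>`, `<level3>` and `<level4>` two and three lines below it). The `;` before each `}` is there because some non-GNU sed implementations reject a `}` placed directly after a command.

```shell
# Hypothetical helper: extract_sed ID FILE
# Same logic as the sed command above, with the id taken from $1.
# Relies on fixed line offsets inside each <tester> block.
extract_sed() {
    sed -n "/<tester>/{n;/<level1[ ]*id=\"$1\"/{n;n;N;p;};}" "$2"
}

# Write a copy of the sample to a hypothetical file sample.xml
cat > sample.xml <<'EOF'
<tester>
   <level1 id="2"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 2 </level3>
   <level4> lvl4 of id 2</level4>
   <level5> </level5>
</tester>

<tester>
   <level1 id="3"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 3</level3>
   <level4>lvl4 of id 3</level4>
   <level5> </level5>
</tester>

<tester>
   <level1 id="2"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 2 2nd occurance</level3>
   <level4>lvl4 of id 2 2nd occurance</level4>
   <level5> </level5>
</tester>
EOF

# Print the level3/level4 lines of every block whose level1 has id="2"
extract_sed 2 sample.xml
```

Because the match depends on line positions, an extra or reordered line inside a `<tester>` block would make it print the wrong lines.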

Answer 1 (score: 0)

I'd recommend an XML parser such as xmlstarlet. That said, it can be done with awk; here is one way. Run it like this:

awk -f script.awk file

Contents of script.awk:

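/<tester>/ {                  # start of a block: reset state
    r=""                      # output collected for this block
    f=1                       # we are inside a <tester> block
}

f && /<level1 id="2">/ {      # this block has the wanted id
    g=1
}

g && /<level[34]>/ {          # keep the level3/level4 lines
    sub(/^[ \t]+/, "")        # strip leading indentation
    r = r $0 ORS
}

/<\/tester>/ {                # end of a block
    if (g && r) {             # print only if the id matched
        print r               # print adds ORS, leaving a blank separator line
    }
    f=g=0                     # reset for the next block
}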

Result:

<level3>lvl3 of id 2 </level3>
<level4> lvl4 of id 2</level4>

<level3>lvl3 of id 2 2nd occurance</level3>
<level4>lvl4 of id 2 2nd occurance</level4>

Alternatively, as a one-liner:

awk '/<tester>/ { r=""; f=1 } f && /<level1 id="2">/ { g=1 } g && /<level[34]>/ { sub(/^[ \t]+/, ""); r = r $0 ORS } /<\/tester>/ { if (g && r) print r; f=g=0 }' file
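
The script hard-codes `id="2"` in a regex. A sketch of the same state machine that takes the id from the command line with awk's standard `-v` option could look like this; `extract_awk` and `sample.xml` are hypothetical names. It matches the `<level1>` tag with `index()` as a literal string, so the id is not interpreted as a regex, and like the original it assumes the tag is written exactly as `<level1 id="N">`.

```shell
# Hypothetical helper: extract_awk ID FILE
# Same logic as script.awk, but the id comes from -v and is matched
# literally with index() rather than as part of a regex.
extract_awk() {
    awk -v id="$1" '
        /<tester>/ { r = ""; f = 1 }                  # new block: reset
        f && index($0, "<level1 id=\"" id "\">") {     # wanted id in this block
            g = 1
        }
        g && /<level[34]>/ {                          # keep level3/level4 lines
            sub(/^[ \t]+/, "")                        # drop indentation
            r = r $0 ORS
        }
        /<\/tester>/ {                                # end of block
            if (g && r) print r                       # print adds a blank separator
            f = g = 0
        }
    ' "$2"
}

# Write a copy of the sample to a hypothetical file sample.xml
cat > sample.xml <<'EOF'
<tester>
   <level1 id="2"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 2 </level3>
   <level4> lvl4 of id 2</level4>
   <level5> </level5>
</tester>

<tester>
   <level1 id="3"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 3</level3>
   <level4>lvl4 of id 3</level4>
   <level5> </level5>
</tester>

<tester>
   <level1 id="2"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 2 2nd occurance</level3>
   <level4>lvl4 of id 2 2nd occurance</level4>
   <level5> </level5>
</tester>
EOF

extract_awk 2 sample.xml
```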

Answer 2 (score: 0)

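Setting RS to the empty string puts awk into paragraph mode, where each group of lines separated by blank lines becomes a single record, which is convenient for block-structured input like this. I believe this does what you want: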

awk '/id="2"/{print ""; n = split( $0,a,"\n" ); for( i = 1; i <= n; i++ )
    if( match( a[i], "level[34]" )) print(a[i])}' RS= input
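
A self-contained sketch of the paragraph-mode approach, with the id passed in via `-v` (the names `extract_para` and `sample.xml` are hypothetical). It loops over the `split()` result with a counted index instead of `for (i in a)`: POSIX leaves the traversal order of `for (i in a)` unspecified, so only the counted loop guarantees that `<level3>` is printed before `<level4>`. It assumes the `<tester>` blocks are separated by empty lines, as in the sample.

```shell
# Hypothetical helper: extract_para ID FILE
# Paragraph mode (RS = ""): each blank-line-separated <tester> block
# is read as a single record.
extract_para() {
    awk -v id="$1" '
        BEGIN { RS = "" }
        index($0, "id=\"" id "\"") {          # record contains the wanted id
            n = split($0, lines, "\n")         # one array element per line
            print ""                           # blank line before each block
            for (i = 1; i <= n; i++)           # counted loop keeps line order
                if (lines[i] ~ /level[34]/) print lines[i]
        }
    ' "$2"
}

# Write a copy of the sample to a hypothetical file sample.xml
cat > sample.xml <<'EOF'
<tester>
   <level1 id="2"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 2 </level3>
   <level4> lvl4 of id 2</level4>
   <level5> </level5>
</tester>

<tester>
   <level1 id="3"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 3</level3>
   <level4>lvl4 of id 3</level4>
   <level5> </level5>
</tester>

<tester>
   <level1 id="2"> test point </level1>
   <level2> </level2>
   <level3>lvl3 of id 2 2nd occurance</level3>
   <level4>lvl4 of id 2 2nd occurance</level4>
   <level5> </level5>
</tester>
EOF

extract_para 2 sample.xml
```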