Awk or Sed: grep data across multiple lines in XML

Date: 2015-05-17 08:40:11

Tags: regex xml awk sed

I am trying to grep and display data from an XML file using Awk or Sed, but I have hit a dead end...

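In detail, I want to know how to do the following: (1) get the value of the 'mt' tag, (2) parse only the 'moid' tags that contain 'Source = _SYSTEM', (3) get the value of 'Host=' and the value of the 'r' tag on the next line, (4) print the value of the 'mt' tag, (5) print each 'Host=' value together with its 'r' value, (6) sum the values of all 'Host=' entries and print the total.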

The problem is that the XML contains many tags spread over many lines.
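The multi-line part (steps 2 and 3) can be handled in awk by keeping a little state between lines: remember the host when a <moid> line with Source = _SYSTEM is seen, then take the value from the next <r> line. This is a minimal sketch, assuming one tag per line as in the XML below; the function name system_pairs is hypothetical and not from the original post.

```shell
#!/bin/sh
# Hypothetical helper: system_pairs (illustration only).
# Assumes one tag per line, as in the question's XML.
system_pairs() {
    awk '
        # A <moid> whose Source is _SYSTEM: remember its host name.
        /<moid>.*Source = _SYSTEM/ {
            host = $0
            sub(/.*Host=/, "", host)   # drop everything up to "Host="
            sub(/,.*/, "", host)       # drop ", Source = _SYSTEM</moid>"
            want = 1                   # the next <r> belongs to this host
            next
        }
        # The <r> line right after such a <moid>: print host and value.
        want && /<r>/ {
            val = $0
            sub(/.*<r>/, "", val)      # drop the opening tag
            sub(/<\/r>.*/, "", val)    # drop the closing tag
            print host ": " val
            want = 0
        }
    '
}

# Two <mv> entries taken from the question; only the _SYSTEM one is printed.
system_pairs <<'EOF'
    <mv>
        <moid>Host=super2.stackoverflow.com, Source = Source2</moid>
        <r>1</r>
    </mv>
    <mv>
        <moid>Host=super2.stackoverflow.com, Source = _SYSTEM</moid>
        <r>2</r>
    </mv>
EOF
```

For this excerpt it prints super2.stackoverflow.com: 2. The <r> belonging to Source2 is skipped because want is only set by a _SYSTEM <moid>.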

This is the XML I want to parse:

<?xml version="1.0"?>
<neid>
<neun></neun>
<nedn>element=home</nedn>
</neid>
<mi>
    <mts>20150517032500.0+0200</mts>
    <gp>300</gp>
    <mt>Name1</mt>
    <mv>
        <moid>Host=super1.stackoverflow.com, Source = Source1</moid>
        <r>1</r>
    </mv>
    <mv>
        <moid>Host=super2.stackoverflow.com, Source = Source2</moid>
        <r>1</r>
    </mv>
    <mv>
        <moid>Host=super2.stackoverflow.com, Source = _SYSTEM</moid>
        <r>2</r>
    </mv>
    <mv>
        <moid>Host=super2.stackoverflow.com, Source = Source3</moid>
        <r>1</r>
    </mv>
    <mv>
        <moid>Host=super1.stackoverflow.com, Source = _SYSTEM</moid>
        <r>2</r>
    </mv>
    <mv>
        <moid>Host=super1.stackoverflow.com, Source = Source4</moid>
        <r>1</r>
    </mv>
</mi>
<mi>
    <mts>20150517032500.0+0200</mts>
    <gp>300</gp>
    <mt>Name2</mt>
    <mv>
        <moid>Host=super1.stackoverflow.com, Source = Source1</moid>
        <r>11</r>
    </mv>
    <mv>
        <moid>Host=super2.stackoverflow.com, Source = Source2</moid>
        <r>11</r>
    </mv>
    <mv>
        <moid>Host=super2.stackoverflow.com, Source = _SYSTEM</moid>
        <r>22</r>
    </mv>
    <mv>
        <moid>Host=super2.stackoverflow.com, Source = Source3</moid>
        <r>11</r>
    </mv>
    <mv>
        <moid>Host=super1.stackoverflow.com, Source = _SYSTEM</moid>
        <r>22</r>
    </mv>
    <mv>
        <moid>Host=super1.stackoverflow.com, Source = Source4</moid>
        <r>11</r>
    </mv>
</mi>

Expected result:

Name1:
   super1.stackoverflow.com: 2
   super2.stackoverflow.com: 2
   TOTAL: 4

Name2:
   super1.stackoverflow.com: 22
   super2.stackoverflow.com: 22
   TOTAL: 44
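The layout above can also be produced by awk alone in a single pass. This is a minimal sketch, assuming one tag per line and that each <r> directly follows its <moid>; the function name system_report is hypothetical. Hosts are sorted by name inside each <mt> block because that is the order the expected result shows, and values for the same host within a block are summed.

```shell
#!/bin/sh
# Hypothetical helper: system_report (illustration only).
# Uses only POSIX awk features (no gawk extensions such as asorti).
system_report() {
    awk '
        # Print the finished <mt> block: sorted hosts, then TOTAL.
        function flush(   i, j, tmp, total) {
            if (name == "") return
            # insertion sort of the n host names collected for this block
            for (i = 2; i <= n; i++) {
                tmp = hosts[i]
                for (j = i - 1; j >= 1 && hosts[j] > tmp; j--)
                    hosts[j + 1] = hosts[j]
                hosts[j + 1] = tmp
            }
            if (blocks++) print ""      # blank line between blocks
            print name ":"
            total = 0
            for (i = 1; i <= n; i++) {
                print "   " hosts[i] ": " sum[hosts[i]]
                total += sum[hosts[i]]
            }
            print "   TOTAL: " total
            for (i = 1; i <= n; i++) delete sum[hosts[i]]   # reset per-block state
            n = 0
        }
        /<mt>/ {
            flush()                     # a new name closes the previous block
            name = $0
            sub(/.*<mt>/, "", name); sub(/<\/mt>.*/, "", name)
            next
        }
        /<moid>.*Source = _SYSTEM/ {
            host = $0
            sub(/.*Host=/, "", host); sub(/,.*/, "", host)
            want = 1                    # the next <r> belongs to this host
            next
        }
        want && /<r>/ {
            val = $0
            sub(/.*<r>/, "", val); sub(/<\/r>.*/, "", val)
            if (!(host in sum)) hosts[++n] = host   # first time seen in this block
            sum[host] += val
            want = 0
        }
        END { flush() }                 # print the last block
    '
}
```

Usage: system_report < file.txt. On the XML from the question this should print the two blocks shown above, with three-space indentation and TOTAL lines.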

UPD: I am required to use Awk or Sed because, unfortunately, I cannot use xmllint, xmlstarlet or anything similar (I am not allowed to install them on the host).

Many thanks in advance!

1 answer:

Answer 0 (score: 0)

Assuming the file structure is exactly as shown above and does not change, the following should do the trick:

sed -n -e 's/ *<mt>\(.*\)<\/mt>/\1:/p;/<moid>..*Source = _SYSTEM/{N;s/\n//g;s/.*Host=\(.*\), Source = _SYSTEM.*<r>\(.*\)<\/r>/\1:\2/p}' file.txt |
    awk -F":" -v x=0 '{if(NR==1){print $0;next};if($2==""){print "TOTAL:" x "\n" $0;x=0;} else {x=x+$2;print $0;}}END{print "TOTAL:" x}'

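Sed strips out everything except the contents of the <mt> tags and, for each <moid> containing Source = _SYSTEM, the host name together with the value of the <r> tag on the following line (N joins the two lines). Awk then reads that output, prints each line, keeps a running sum of the values, and prints a TOTAL line whenever a new <mt> name starts and once more at the end. Note that the hosts come out in file order (super2 before super1 for this input) and without the indentation shown in the expected result.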
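To see what the awk stage receives, the sed stage can be run on its own. The sed expression below is the answer's, unchanged; only the wrapper function name sed_stage (hypothetical) and the short here-doc excerpt from the question are added for illustration. The expression was written for GNU sed; some other sed implementations may require a ; before the closing }.

```shell
#!/bin/sh
# The answer's sed stage, unchanged, wrapped in a hypothetical function
# so its intermediate output can be inspected separately.
sed_stage() {
    sed -n -e 's/ *<mt>\(.*\)<\/mt>/\1:/p;/<moid>..*Source = _SYSTEM/{N;s/\n//g;s/.*Host=\(.*\), Source = _SYSTEM.*<r>\(.*\)<\/r>/\1:\2/p}'
}

# Excerpt of the first <mi> block from the question.
sed_stage <<'EOF'
    <mt>Name1</mt>
    <mv>
        <moid>Host=super2.stackoverflow.com, Source = Source2</moid>
        <r>1</r>
    </mv>
    <mv>
        <moid>Host=super2.stackoverflow.com, Source = _SYSTEM</moid>
        <r>2</r>
    </mv>
EOF
```

For this excerpt it prints Name1: followed by super2.stackoverflow.com:2. In the awk stage, a line with nothing after the colon (such as Name1:) has an empty $2 and therefore starts a new block, while every other line is added to the running sum.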