How to iterate over the items in XML output and print different values for each item using bash / Linux shell commands

Date: 2016-03-02 22:59:15

Tags: xml bash soap

I have the following XML output (generated by making a SOAP call to a WSDL with curl):

<?xml version="1.0"?>
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
  <env:Header/>
  <env:Body>
    <ns1:getNodesResponse xmlns:ns1="http://node.sdk.nms.ov.hp.com/">
      <return>
        <item>
          <created>2013-04-22T12:48:06.676Z</created>
          <deviceCategory>com.hp.ov.nms.devices.switchrouter</deviceCategory>
          <deviceDescription>Cisco Nexus C7018 DataCenter Switch</deviceDescription>
          <deviceFamily>com.hp.ov.nms.devices.cisconexus7000seriesswitches</deviceFamily>
          <deviceModel>ciscoNexusC7018</deviceModel>
          <deviceVendor>com.hp.ov.nms.devices.cisco</deviceVendor>
          <discoveryState>DISCOVERY_COMPLETED</discoveryState>
          <systemContact>xxxxxxxxxxxxxxxxxxx</systemContact>
          <systemDescription>xxxxxxxxxxxxxxxxxxxx</systemDescription>
          <systemLocation>xxxxxxxxxxxxxxxxxx</systemLocation>
          <systemName>xxxxxxxxxxxxxxxxxxx</systemName>
          <systemObjectId>.1.3.6.1.4.1.9.12.3.1.3.777</systemObjectId>
          <uuid>c8652440-caf2-490b-8892-cb914a39d19e</uuid>
        </item>
        <item>
          <created>2013-04-22T12:49:36.750Z</created>
          <deviceCategory>com.hp.ov.nms.devices.switchrouter</deviceCategory>
          <deviceDescription>Cisco Nexus C7018 DataCenter Switch</deviceDescription>
          <deviceFamily>com.hp.ov.nms.devices.cisconexus7000seriesswitches</deviceFamily>
          <deviceModel>ciscoNexusC7018</deviceModel>
          <deviceVendor>com.hp.ov.nms.devices.cisco</deviceVendor>
          <discoveryState>DISCOVERY_COMPLETED</discoveryState>
          <systemContact>xxxxxxxxxxxxxxxxx</systemContact>
          <systemDescription>xxxxxxxxxxxxxxxxxx</systemDescription>
          <systemLocation>xxxxxxxxxxxxxx</systemLocation>
          <systemName>xxxxxxxxxxxxxxxxxx</systemName>
          <systemObjectId>.1.3.6.1.4.1.9.12.3.1.3.777</systemObjectId>
          <uuid>6f5ef089-6a51-459f-bde1-9cf18e4f8ca7</uuid>
        </item>
        <item>
          <created>2013-04-22T12:51:56.872Z</created>
          <deviceCategory>com.hp.ov.nms.devices.switchrouter</deviceCategory>
          <deviceDescription>Cisco Nexus C7018 DataCenter Switch</deviceDescription>
          <deviceFamily>com.hp.ov.nms.devices.cisconexus7000seriesswitches</deviceFamily>
          <deviceModel>ciscoNexusC7018</deviceModel>
          <deviceVendor>com.hp.ov.nms.devices.cisco</deviceVendor>
          <discoveryState>DISCOVERY_COMPLETED</discoveryState>
          <systemContact>xxxxxxxxxxxxxxxxxx</systemContact>
          <systemDescription>xxxxxxxxxxxxxxxxxxxxxxxxx</systemDescription>
          <systemLocation>xxxxxxxxxxxxxxxxxxx</systemLocation>
          <systemName>xxxxxxxxxxxxxxxxxxx</systemName>
          <systemObjectId>.1.3.6.1.4.1.9.12.3.1.3.777</systemObjectId>
          <uuid>bae02b8c-25d4-4b53-bef0-2d5b14536e0b</uuid>
        </item>
      </return>
    </ns1:getNodesResponse>
  </env:Body>
</env:Envelope>

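How can I loop through each <item> and print different values from each one? I was thinking of simply matching <item> and then picking out the data between each <item> and </item>, but I'm not sure whether there is a better way to do this. I'll be using bash / Linux shell commands.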

Pseudocode:

for i in item
     print i.uuid,i.systemName

1 answer:

Answer 0 (score: 1)

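You would be better off using an XML parser or an XML query language than regular expressions and bash commands. If you are programming in a language, look at XML-parser-based APIs such as DOM, SAX, StAX, etc. You can also use XQuery, which gives you an SQL-like syntax over XML. Another language for extracting the data is XPath.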

http://www.w3schools.com/xsl/xpath_intro.asp
http://www.w3schools.com/xsl/xquery_intro.asp
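A minimal sketch of the XPath route from the shell follows. It assumes xmllint (from libxml2) is installed and mirrors the loop in the question's pseudocode. The function name print_items and the file name used below are hypothetical. Unlike the sed approach, a parser requires well-formed input and rejects unbalanced tags.

```shell
# Sketch: loop over every <item> with XPath, assuming xmllint (libxml2)
# is installed. Each xmllint call re-parses the file, which is fine for
# small responses like the one in the question.
print_items() (
  file=$1
  # <return> and <item> have no prefix and no default namespace is
  # declared, so //item matches them without registering namespaces.
  count=$(xmllint --xpath 'count(//item)' "$file") || exit 1
  i=1
  while [ "$i" -le "$count" ]; do
    # (//item)[i] selects the i-th item in document order
    uuid=$(xmllint --xpath "string((//item)[$i]/uuid)" "$file")
    name=$(xmllint --xpath "string((//item)[$i]/systemName)" "$file")
    printf '%s,%s\n' "$uuid" "$name"
    i=$((i + 1))
  done
)
```

For example, print_items nodes.xml prints one uuid,systemName line per item. The file name is only illustrative.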

But if you still insist on using bash tools, here is a sed one-liner:

$ sed -n -e '/<item>/,/<\/item>/p' xml | sed -r -e 's/^\s*<uuid>(.*)<\/uuid>/\1/g' -e 's/^\s*<systemName>(.*)<\/systemName>/\1/g' -e '/^\s*</d' | sed -n 'N;s/\n/,/g;p'
xxxxxxxxxxxxxxxxxxx,c8652440-caf2-490b-8892-cb914a39d19e
xxxxxxxxxxxxxxxxxx,6f5ef089-6a51-459f-bde1-9cf18e4f8ca7
xxxxxxxxxxxxxxxxxxx,bae02b8c-25d4-4b53-bef0-2d5b14536e0b
$ 

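Breakdown:

  1. sed -n -e '/<item>/,/<\/item>/p' xml

     Suppress default printing (-n) and print (p) only the lines inside an address range. The range starts at a line matching the regex <item> and ends at a line matching </item>. This gives you all the items.

  2. sed -r -e 's/^\s*<uuid>(.*)<\/uuid>/\1/g' -e 's/^\s*<systemName>(.*)<\/systemName>/\1/g' -e '/^\s*</d'

     Strip the <uuid></uuid> and <systemName></systemName> tags with the s (substitute) command, keeping only the captured inner text (\1). The last expression deletes (d) every line that still starts with a tag, so only the systemName and uuid values remain, in document order.

  3. sed -n 'N;s/\n/,/g;p'

     Suppress default printing (-n). N reads the next input line and appends it to the pattern space (which already holds the previous line), separated by a newline (\n), so consecutive lines are joined in pairs. Then replace the \n with a comma and print the pattern space (p).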
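Step 3 assumes that exactly two lines survive step 2 for every item. If an item lacked a <systemName> or a <uuid>, the pairing would shift and values from neighbouring items would end up on the same line. The \s class used in step 2 is also a GNU sed extension.

Below is an alternative sketch that is not the answer's method. A single POSIX awk pass collects the fields of each item and prints them when it reaches </item>, in the uuid,systemName order from the question's pseudocode. Like the sed version, it assumes one element per line and is not a real XML parser. The function name items_to_csv is hypothetical.

```shell
# Sketch: one awk pass that remembers uuid/systemName for the current
# <item> and prints them at </item>. Assumes one element per line.
items_to_csv() {
  awk '
    # strip the opening and closing tag around an element value
    function inner(line, tag) {
      sub(".*<" tag ">", "", line)
      sub("</" tag ">.*", "", line)
      return line
    }
    /<item>/       { uuid = ""; name = "" }    # a new item starts
    /<uuid>/       { uuid = inner($0, "uuid") }
    /<systemName>/ { name = inner($0, "systemName") }
    /<\/item>/     {
      # skip empty closes such as a stray extra </item>
      if (uuid != "" || name != "") print uuid "," name
      uuid = ""; name = ""
    }
  ' "$@"
}
```

For example, items_to_csv nodes.xml (a hypothetical file name) prints one uuid,systemName line per item. Because each item's fields are collected separately, a missing element yields an empty field on that item's line instead of shifting later items.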