Parsing XML using the command line

Time: 2015-07-17 22:32:22

Tags: xml

How do I parse XML that has the following content:

<?xml version="1.0"?>
<saw:ibot xmlns:saw="com.siebel.analytics.web/report/v1" version="1" priority="normal" jobID="36">
  <saw:schedule timeZoneId="(GMT-05:00) Eastern Time (US &amp; Canada)" disabled="false">
    <saw:start repeatMinuteInterval="60" endTime="23:59:00" startImmediately="true"/>
    <saw:recurrence runOnce="false">
      <saw:weekly weekInterval="1" mon="true" tue="true" wed="true" thu="true" fri="true"/>
    </saw:recurrence>
  </saw:schedule>
  <saw:dataVisibility type="recipient" runAs="cgm"/>
  <saw:choose>
    <saw:when condition="true">
      <saw:deliveryContent>
        <saw:headline>
          <saw:caption>
            <saw:text>Availability Parity Alert for Next 14 Days (@{NQ_SESSION.LBL_Next_14_Arrival_Days})</saw:text>
          </saw:caption>
        </saw:headline>
        <saw:conditionalReport/>
      </saw:deliveryContent>
      <saw:postActions/>
    </saw:when>
    <saw:otherwise/>
  </saw:choose>
  <saw:deliveryDestinations>
    <saw:destination category="dashboard"/>
    <saw:destination category="activeDeliveryProfile"/>
  </saw:deliveryDestinations>
  <saw:recipients subscribers="true" customize="false" specificRecipients="false">
    <saw:subscribers>
      <saw:user name="mbussey@xyz.com"/>
      <saw:user name="kimmy.chan@pqr.com"/>
      <saw:user name="chudgins@gmail.com"/>
    </saw:subscribers>
  </saw:recipients>
  <saw:conditionQuery>
    <saw:reportRefNode path="/shared/Quote/Product/Alerts/Daily Availability Parity Alert - Next 14 Days - Content"/>
  </saw:conditionQuery>
</saw:ibot>

and get the following output?

mbussey@xyz.com
kimmy.chan@pqr.com
chudgins@gmail.com

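In addition, I have five other .xml files containing different name values to extract. Is there any way to parse them all on the command line, merge the results, and write them to a single file?

I tried sed and awk, but neither gave me the output I need.

2 answers:

Answer 0 (score: 3)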

This command parses the XML document and uses XPath to extract the value of the name attribute of every element at /saw:ibot/saw:recipients/saw:subscribers/saw:user:
xmlstarlet sel -t -v '/saw:ibot/saw:recipients/saw:subscribers/saw:user/@name' </tmp/xml

Output:

mbussey@xyz.com
kimmy.chan@pqr.com
chudgins@gmail.com
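If Python is more readily available than xmlstarlet, the same lookup can be done with the standard-library xml.etree.ElementTree module. This is a minimal sketch, not part of the original answer: SAMPLE is a trimmed copy of the question's document, and the namespace URI is copied from its xmlns:saw declaration. ElementTree supports only a limited XPath subset, and a prefix such as saw: must be mapped to its namespace URI explicitly.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the xmlns:saw declaration in the question's XML.
NS = {"saw": "com.siebel.analytics.web/report/v1"}

# Trimmed copy of the question's document (only the parts the path touches).
SAMPLE = """<?xml version="1.0"?>
<saw:ibot xmlns:saw="com.siebel.analytics.web/report/v1" version="1">
  <saw:recipients subscribers="true">
    <saw:subscribers>
      <saw:user name="mbussey@xyz.com"/>
      <saw:user name="kimmy.chan@pqr.com"/>
      <saw:user name="chudgins@gmail.com"/>
    </saw:subscribers>
  </saw:recipients>
</saw:ibot>"""


def subscriber_names(xml_text):
    """Return @name of every user at /saw:ibot/saw:recipients/saw:subscribers/saw:user."""
    root = ET.fromstring(xml_text)  # root is the saw:ibot element itself
    # The path is relative to the root, so it starts one level below saw:ibot.
    path = "saw:recipients/saw:subscribers/saw:user"
    return [user.get("name") for user in root.findall(path, NS)]


if __name__ == "__main__":
    for name in subscriber_names(SAMPLE):
        print(name)
```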

Answer 1 (score: 1)

Use an XML parser. Personally, I like XML::Twig for Perl.

#!/usr/bin/env perl

use strict;
use warnings;
use XML::Twig;

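# Load the whole document into memory as a tree
my $twig = XML::Twig->new();
$twig->parsefile('your_file.xml');

# No namespace mapping is set up, so saw:user is matched by its literal prefixed name
foreach my $saw_user ( $twig->get_xpath('//saw:user') ) {
    print $saw_user->att('name'), "\n";
}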

Prints:

mbussey@xyz.com
kimmy.chan@pqr.com
chudgins@gmail.com
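XML::Twig is designed to process large documents piece by piece, although the script above simply loads the whole file with parsefile. A rough Python analogue of streaming extraction is sketched below, assuming the same namespace URI as the example; it uses ElementTree's iterparse and clears each element once it has been inspected. The script name in the usage comment is hypothetical.

```python
import io
import sys
import xml.etree.ElementTree as ET

# ElementTree names namespaced elements in Clark notation: "{namespace-uri}localname".
USER_TAG = "{com.siebel.analytics.web/report/v1}user"


def stream_user_names(source):
    """Yield the name attribute of each saw:user as soon as its end tag is parsed.

    `source` is a filename or a binary file object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == USER_TAG:
            yield elem.get("name")
        # Free the element's attributes and children once we are done with it.
        elem.clear()


if __name__ == "__main__":
    # Usage: python stream_users.py your_file.xml (with no argument, a tiny inline sample is used)
    if len(sys.argv) > 1:
        src = sys.argv[1]
    else:
        src = io.BytesIO(
            b'<saw:ibot xmlns:saw="com.siebel.analytics.web/report/v1">'
            b'<saw:user name="mbussey@xyz.com"/></saw:ibot>'
        )
    for name in stream_user_names(src):
        print(name)
```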

If you want a one-liner, use this instead:

perl -MXML::Twig -0777 -e 'print map { $_->att("name") . "\n" } ( XML::Twig->parse(<>)->get_xpath("//saw:user") )' your_xml_file
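For the second part of the question (merging the names from several .xml files into a single output file), a minimal Python sketch might look like this. It assumes every file uses the same saw namespace URI as the example; the function names and the merged.txt and reports/*.xml names are hypothetical. Names are written in input-file order and duplicates are kept.

```python
import glob
import sys
import xml.etree.ElementTree as ET

# Namespace URI taken from xmlns:saw in the question; adjust if your files differ.
NS = {"saw": "com.siebel.analytics.web/report/v1"}


def names_from(source):
    """Return saw:user/@name values from one XML file (path or binary file object)."""
    root = ET.parse(source).getroot()
    # ".//saw:user" matches saw:user at any depth below the root, like //saw:user above
    return [user.get("name") for user in root.iterfind(".//saw:user", NS)]


def merge_names(sources, out):
    """Write the names from every source to the text stream `out`, one per line."""
    total = 0
    for source in sources:
        for name in names_from(source):
            out.write(name + "\n")
            total += 1
    return total


if __name__ == "__main__":
    # Hypothetical usage: python merge_users.py merged.txt 'reports/*.xml'
    out_path, patterns = sys.argv[1], sys.argv[2:]
    inputs = sorted(p for pattern in patterns for p in glob.glob(pattern))
    with open(out_path, "w", encoding="utf-8") as out:
        count = merge_names(inputs, out)
    print(f"wrote {count} names from {len(inputs)} files to {out_path}", file=sys.stderr)
```

Because glob.glob is applied to each argument, a quoted pattern is expanded by Python even on shells that do not expand it, and an existing file name that the shell already expanded is passed through unchanged.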

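Please, for the sake of future maintenance programmers and sysadmins, do not use regular expressions to parse XML. Why, you ask? Because XML such as your example can be laid out in any of the following ways and still be semantically identical:

(Your example, reformatted with the attributes split across lines:)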

<?xml version="1.0" encoding="utf-8"?>
<saw:ibot
    jobID="36"
    priority="normal"
    version="1"
    xmlns:saw="com.siebel.analytics.web/report/v1">
  <saw:schedule
      disabled="false"
      timeZoneId="(GMT-05:00) Eastern Time (US &amp; Canada)">
    <saw:start
        endTime="23:59:00"
        repeatMinuteInterval="60"
        startImmediately="true"
    />
    <saw:recurrence runOnce="false">
      <saw:weekly
          fri="true"
          mon="true"
          thu="true"
          tue="true"
          wed="true"
          weekInterval="1"
      />
    </saw:recurrence>
  </saw:schedule>
  <saw:dataVisibility
      runAs="cgm"
      type="recipient"
  />
  <saw:choose>
    <saw:when condition="true">
      <saw:deliveryContent>
        <saw:headline>
          <saw:caption>
            <saw:text>Availability Parity Alert for Next 14 Days (@{NQ_SESSION.LBL_Next_14_Arrival_Days})</saw:text>
          </saw:caption>
        </saw:headline>
        <saw:conditionalReport/>
      </saw:deliveryContent>
      <saw:postActions/>
    </saw:when>
    <saw:otherwise/>
  </saw:choose>
  <saw:deliveryDestinations>
    <saw:destination category="dashboard" />
    <saw:destination category="activeDeliveryProfile" />
  </saw:deliveryDestinations>
  <saw:recipients
      customize="false"
      specificRecipients="false"
      subscribers="true">
    <saw:subscribers>
      <saw:user name="mbussey@xyz.com" />
      <saw:user name="kimmy.chan@pqr.com" />
      <saw:user name="chudgins@gmail.com" />
    </saw:subscribers>
  </saw:recipients>
  <saw:conditionQuery>
    <saw:reportRefNode path="/shared/Quote/Product/Alerts/Daily Availability Parity Alert - Next 14 Days - Content" />
  </saw:conditionQuery>
</saw:ibot>

Or like this (note the wrapping of the element tags):

<?xml version="1.0" encoding="utf-8"?>
<saw:ibot jobID="36" priority="normal" version="1" xmlns:saw="com.siebel.analytics.web/report/v1">
  <saw:schedule disabled="false" timeZoneId="(GMT-05:00) Eastern Time (US &amp; Canada)">
    <saw:start endTime="23:59:00" repeatMinuteInterval="60" startImmediately="true"/>
    <saw:recurrence runOnce="false">
      <saw:weekly fri="true" mon="true" thu="true" tue="true" wed="true" weekInterval="1"/>
    </saw:recurrence>
  </saw:schedule>
  <saw:dataVisibility runAs="cgm" type="recipient"/>
  <saw:choose>
    <saw:when condition="true">
      <saw:deliveryContent>
        <saw:headline>
          <saw:caption>
            <saw:text>Availability Parity Alert for Next 14 Days (@{NQ_SESSION.LBL_Next_14_Arrival_Days})</saw:text>
          </saw:caption>
        </saw:headline>
        <saw:conditionalReport/>
      </saw:deliveryContent>
      <saw:postActions/>
    </saw:when>
    <saw:otherwise/>
  </saw:choose>
  <saw:deliveryDestinations>
    <saw:destination category="dashboard"/>
    <saw:destination category="activeDeliveryProfile"/>
  </saw:deliveryDestinations>
  <saw:recipients customize="false" specificRecipients="false" subscribers="true">
    <saw:subscribers>
      <saw:user name="mbussey@xyz.com"/>
      <saw:user name="kimmy.chan@pqr.com"/>
      <saw:user name="chudgins@gmail.com"/>
    </saw:subscribers>
  </saw:recipients>
  <saw:conditionQuery>
    <saw:reportRefNode path="/shared/Quote/Product/Alerts/Daily Availability Parity Alert - Next 14 Days - Content"/>
  </saw:conditionQuery>
</saw:ibot>

Or even like this:

<?xml version="1.0" encoding="utf-8"?>
<saw:ibot
jobID="36"
priority="normal"
version="1"
xmlns:saw="com.siebel.analytics.web/report/v1"
><saw:schedule
disabled="false"
timeZoneId="(GMT-05:00) Eastern Time (US &amp; Canada)"
><saw:start
endTime="23:59:00"
repeatMinuteInterval="60"
startImmediately="true"
/><saw:recurrence
runOnce="false"
><saw:weekly
fri="true"
mon="true"
thu="true"
tue="true"
wed="true"
weekInterval="1"
/></saw:recurrence></saw:schedule><saw:dataVisibility
runAs="cgm"
type="recipient"
/><saw:choose
><saw:when
condition="true"
><saw:deliveryContent
><saw:headline
><saw:caption
><saw:text
>Availability Parity Alert for Next 14 Days (@{NQ_SESSION.LBL_Next_14_Arrival_Days})</saw:text></saw:caption></saw:headline><saw:conditionalReport
/></saw:deliveryContent><saw:postActions
/></saw:when><saw:otherwise
/></saw:choose><saw:deliveryDestinations
><saw:destination
category="dashboard"
/><saw:destination
category="activeDeliveryProfile"
/></saw:deliveryDestinations><saw:recipients
customize="false"
specificRecipients="false"
subscribers="true"
><saw:subscribers
><saw:user
name="mbussey@xyz.com"
/><saw:user
name="kimmy.chan@pqr.com"
/><saw:user
name="chudgins@gmail.com"
/></saw:subscribers></saw:recipients><saw:conditionQuery
><saw:reportRefNode
path="/shared/Quote/Product/Alerts/Daily Availability Parity Alert - Next 14 Days - Content"
/></saw:conditionQuery></saw:ibot>

Hopefully these examples show that if someone reformats your XML in a perfectly valid way, your regex could one day mysteriously break.
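The point can be checked directly. The sketch below uses two hypothetical fragments trimmed from the layouts above and compares a typical line-oriented regex with a real parser: the regex finds both users in the one-tag-per-line layout and nothing in the wrapped layout, while ElementTree returns the same names for both. LINE_REGEX is an illustrative assumption, not code from the question.

```python
import re
import xml.etree.ElementTree as ET

NS = {"saw": "com.siebel.analytics.web/report/v1"}

# Two semantically identical fragments (hypothetical, trimmed from the examples above).
ONE_PER_LINE = """<saw:ibot xmlns:saw="com.siebel.analytics.web/report/v1">
  <saw:subscribers>
    <saw:user name="mbussey@xyz.com"/>
    <saw:user name="kimmy.chan@pqr.com"/>
  </saw:subscribers>
</saw:ibot>"""

WRAPPED = """<saw:ibot
xmlns:saw="com.siebel.analytics.web/report/v1"
><saw:subscribers
><saw:user
name="mbussey@xyz.com"
/><saw:user
name="kimmy.chan@pqr.com"
/></saw:subscribers></saw:ibot>"""

# A typical line-oriented regex someone might write against the first layout.
LINE_REGEX = re.compile(r'^\s*<saw:user name="([^"]+)"', re.MULTILINE)


def regex_names(text):
    """Extract names with the regex; depends on how the XML happens to be laid out."""
    return LINE_REGEX.findall(text)


def parser_names(text):
    """Extract names with a real XML parser; independent of whitespace and line breaks."""
    root = ET.fromstring(text)
    return [user.get("name") for user in root.iterfind(".//saw:user", NS)]


if __name__ == "__main__":
    for label, text in (("one per line", ONE_PER_LINE), ("wrapped", WRAPPED)):
        print(label, "regex:", regex_names(text), "parser:", parser_names(text))
```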