Extract a specific block from an XML file

Time: 2017-05-13 06:59:02

Tags: bash xml-parsing

<ruleId>1412</ruleId>

<myCount>2</myCount>
<hisCount>0</hisCount>
<totalCount>2</totalCount>
<ruleId>109942</ruleId>

<myCount>2</myCount>
<hisCount>2785</hisCount>
<totalCount>0</totalCount>
<ruleId>109367</ruleId>

<myCount>1</myCount>
<hisCount>567</hisCount>
<totalCount>0</totalCount>
<ruleId>1412</ruleId>

<myCount>2</myCount>
<hisCount>4</hisCount>
<totalCount>6</totalCount>

I want to extract the values of myCount, hisCount, and totalCount for ruleId = 1412.

Rule #1412 appears twice here, so my expected output would be:

mycount-SUM = 2+2 = 4

hisCount-SUM = 0+4 = 4

totalCount-SUM = 2+6 = 8

RuleID  mycount-SUM  hisCount-SUM  totalCount-SUM

1412    4            4             8

1 answer:

Answer 0: (score: 3)

Complex bash + xmlstarlet solution:

A valid XML structure would look like this (for example, rules.xml):

<rules>
<ruleId>1412</ruleId>

<myCount>2</myCount>
<hisCount>0</hisCount>
<totalCount>2</totalCount>
<ruleId>109942</ruleId>

<myCount>2</myCount>
<hisCount>2785</hisCount>
<totalCount>0</totalCount>
<ruleId>109367</ruleId>

<myCount>1</myCount>
<hisCount>567</hisCount>
<totalCount>0</totalCount>
<ruleId>1412</ruleId>

<myCount>2</myCount>
<hisCount>4</hisCount>
<totalCount>6</totalCount>
</rules>

rule_counts.sh script:

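#!/bin/bash
# $1 = XML file, $2 = ruleId value to look up
xmlFile=$1
ruleId=$2

# Print the sum of the first <$3> element that follows each <ruleId> equal to $2 in file $1
getSum () {
    xmlstarlet sel -t -v "sum(//ruleId[text()=$2]/following-sibling::$3[1])" "$1"
}

mySum=$(getSum "$xmlFile" "$ruleId" "myCount")
hisSum=$(getSum "$xmlFile" "$ruleId" "hisCount")
totalSum=$(getSum "$xmlFile" "$ruleId" "totalCount")

# Header row, then the sums for the requested rule
printf "%-6s\t%-11s\t%-12s\t%-14s\n" "RuleID" "mycount-SUM" "hisCount-SUM" "totalCount-SUM"
printf "%-6s\t%-11s\t%-12s\t%-14s\n" "$ruleId" "$mySum" "$hisSum" "$totalSum"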
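
Because the script reads its two positional arguments without checking them, and the ruleId is pasted unquoted into the XPath expression, a small guard at the top may be worth having. The following is only a sketch: the function name check_args is hypothetical, and it assumes that rule IDs are always non-negative integers, as they are in the sample data.

```shell
#!/bin/bash
# Hypothetical guard, not part of the original script.
# Assumption: ruleId values are non-negative integers, as in the sample rules.xml.
check_args () {
    # Both the XML file and the ruleId must be supplied.
    if [ "$#" -ne 2 ]; then
        echo "Usage: bash rule_counts.sh <xml file> <ruleId node value>" >&2
        return 1
    fi
    # Reject empty or non-numeric IDs so they cannot alter the XPath expression.
    case $2 in
        ''|*[!0-9]*)
            echo "ruleId must be a non-negative integer: $2" >&2
            return 1
            ;;
    esac
    return 0
}
```

In rule_counts.sh it could be called as check_args "$@" || exit 1 before any sums are computed.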

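Usage:

Signature: <shell script> <xml file> <ruleId node value> (all are mandatory)

bash rule_counts.sh rules.xml 1412

Output:

RuleID  mycount-SUM  hisCount-SUM  totalCount-SUM
1412    4            4             8

Explanation:

ruleId=$2 - points to the second command-line argument passed to the shell script

"sum(//ruleId[text()=$2]/following-sibling::$3[1])" - XPath expression; sum() converts the string-value of each node in the argument node-set to a number and returns the total
https://www.w3.org/TR/xpath/#function-sum

For the myCount node it becomes "sum(//ruleId[text()=1412]/following-sibling::myCount[1])"

The following-sibling axis contains all the following siblings of the context node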
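
To see which values the expression actually picks up, the selection can be imitated line by line without xmlstarlet. This is an awk approximation, not XPath: the function name sum_first_following is hypothetical, and it assumes one element per line, as in the sample rules.xml.

```shell
#!/bin/bash
# Hypothetical helper imitating
#   sum(//ruleId[text()=ID]/following-sibling::TAG[1])
# Assumption: every element sits on its own line, as in the sample file.
sum_first_following () {
    local file=$1 id=$2 tag=$3
    awk -v id="$id" -v tag="$tag" '
        # Strip tags and whitespace from a line, leaving the element text.
        function value(line) {
            gsub(/<[^>]*>/, "", line)
            gsub(/[ \t\r]/, "", line)
            return line
        }
        # A matching <ruleId> arms the flag ...
        index($0, "<ruleId>") && value($0) == id { armed = 1 }
        # ... and the next <tag> element after it is added once, then the flag clears.
        armed && index($0, "<" tag ">") { sum += value($0); armed = 0 }
        END { print sum + 0 }
    ' "$file"
}
```

With the sample file, sum_first_following rules.xml 1412 myCount adds the myCount that follows the first 1412 entry (2) to the one that follows the second (2), which matches the 4 reported by the script.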