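I am trying to use xmllint on a Linux system to extract some specific data from a huge (> 1.5 million line) xml document, and I am struggling with the xmllint syntax. I have been doing this very inefficiently with grep and awk, but I discovered that the system has the xmllint utility (which I have never used), and I figure that since the xml is well structured there should be a way to access the data directly. I have included a snippet of the xml document, but in trimming it down, while it looks correct to me, I have caused xmllint to report a parser error. I assume that if you know xmllint well enough to answer my question, you can easily spot the cause of the parser error.
Based on web searches, I have tried the following syntax: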
echo 'cat //*/@index' | xmllint --shell stub.xml (which does return ALL of the "indexes")
and
test=$(xmllint --debug --xpath "//PTC/BPSETS/BPSET/BPS" stub.xml) (which does dump the entire BPS entry)
and
xmllint --xpath "string(//PTC/BPSETS/BPSET/@b95)" stub.xml (returns no values)
Here is the xml snippet as best as I can trim it down:
<?xml version="1.0" encoding="utf-8"?>
<PTC version="2.0" cls="2">
<BPSETS>
<BPSET define="b95">
<BPS define="88lmax">
<CRIT>
<MNBS lmt="88" />
<MXBS lmt="88" />
<MXBT red="Y" />
</CRIT>
<PNS>
<PN index="0" atv="1" bf="32203506">
<AWD cpbt="390">
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
<PN index="1" atv="1" bf="24237243">
<AWD cpbt="390">
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
<PN index="2" atv="1" bf="8136575">
<AWD cpbt="390">
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
<PN index="688" atv="1" bf="1183872">
<AWD cpbt="50" />
</PN>
</PNS>
</BPS>
<BPS define="88l6">
<CRIT>
<MNBS lmt="88" />
<MXBS lmt="88" />
<MXBT lmt="6" />
<MNBT lmt="6" />
</CRIT>
<PNS>
<PN index="0" atv="1" bf="28073582">
<AWD cpbt="150">
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
<PN index="1" atv="1" bf="16686973">
<AWD cpbt="150">
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
</PNS>
</BPS>
<BPS define="88l4">
<CRIT>
<MNBS lmt="88" />
<MXBS lmt="88" />
<MXBT lmt="4" />
<MNBT lmt="4" />
</CRIT>
<PNS>
<PN index="0" atv="1" bf="31342257">
<AWD cpbt="50">
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
<PN index="1" atv="1" bf="13761775">
<AWD cpbt="50">
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
</PNS>
</BPS>
<BPS define="88l2">
<CRIT>
<MNBS lmt="88" />
<MXBS lmt="88" />
<MXBT lmt="2" />
<MNBT lmt="2" />
</CRIT>
<PNS>
<PN index="0" atv="1" bf="16291759">
<AWD cpbt="10">
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
<PN index="1" atv="1" bf="15032283">
<AWD cpbt="10">
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
</PNS>
</BPS>
<BPS define="88l1">
<CRIT>
<MNBS lmt="88" />
<MXBS lmt="88" />
<MXBT lmt="1" />
<MNBT lmt="1" />
</CRIT>
<PNS>
<PN index="0" atv="1" bf="33278739">
<AWD>
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
<PN index="1" atv="1" bf="7261567">
<AWD>
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
<PN index="896" atv="1" bf="101540">
<AWD cpbt="10" />
</PN>
<PN index="897" atv="1" bf="3680792">
<AWD cpbt="10" />
</PN>
<PN index="898" atv="1" bf="25776896">
<AWD cpbt="10" />
</PN>
</PNS>
</BPS>
</BPSET>
<BPSET define="b94" use="b95">
<BPS define="88mx">
<PNS>
<PN index="422" atv="1" bf="11692089">
<AWD cpbt="9000" />
</PN>
<PN index="424" atv="1" bf="12200338">
<AWD cpbt="7200" />
</PN>
<PN index="427" atv="1" bf="24210225">
<AWD cpbt="6000" />
</PN>
</PNS>
<BPS>
</BPSET>
</BPSETS>
</PTC>
What I really need is a query that returns all the attributes contained in a specific element under a specific index, e.g.:
<PTC version="2.0" cls="2">
<PN index="0" atv="1" bf="32203506">
<AWD cpbt="390">
<BUNS ptgp="bn38" bdx="38" fawd="1" />
<BUNS ptgp="bn39" bdx="39" fawd="1" awby="38" />
</AWD>
</PN>
A query that, given a PN index value (e.g. 0), would return the values of bf and cpbt…
If it were an sql query the xmllint query I'm looking for would be something like:
```sql
select bf, cpbt from PTC.BPSETS.BPSET.BPS.PNS.PN
where BPSET = 'b95' AND BPS = '88lmax' AND PN.index = 0
```
If you follow my drift. Any guidance here is appreciated. Thanks.
Answer 0 (score: 0)
Further research and experimentation showed that this is the required syntax:
echo 'cat //PTC/BPSETS/BPSET[@define="b95"]/BPS[@define="88lmax"]/PNS/PN[@index="0"]/AWD/@cpbt' | xmllint --shell stub.xml
This produces the desired data.
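Two notes that may help anyone reusing this. First, the parser error in the trimmed snippet appears to come from the `<BPS>` just before the final `</BPSET>`, which should be a closing `</BPS>`. Second, the question asked for both bf and cpbt: in the sample, bf is an attribute of PN while cpbt is an attribute of its AWD child, so they need two paths. Below is a minimal sketch using the same `--xpath` option tried in the question; the function name `pn_values` and its argument order are made up for illustration, and it assumes an xmllint build that supports `--xpath`.

```shell
#!/bin/sh
# Hypothetical helper (not part of xmllint): print "bf cpbt" for one PN entry.
# Usage: pn_values FILE BPSET_DEFINE BPS_DEFINE PN_INDEX
pn_values() {
    file=$1
    bpset=$2
    bps=$3
    idx=$4
    # Path to the selected PN element, filtered on the three attributes.
    pn="/PTC/BPSETS/BPSET[@define='$bpset']/BPS[@define='$bps']/PNS/PN[@index='$idx']"
    # string() yields the value of the first matching node, or an empty
    # string when nothing matches.
    bf=$(xmllint --xpath "string($pn/@bf)" "$file")
    cpbt=$(xmllint --xpath "string($pn/AWD/@cpbt)" "$file")
    printf '%s %s\n' "$bf" "$cpbt"
}
```

With the closing tag corrected in stub.xml, `pn_values stub.xml b95 88lmax 0` should print `32203506 390`. Note that each call parses the file twice, which may be slow on a 1.5 million line document; the `--shell` form can evaluate several `cat` commands in one pass if that matters.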