Question

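I have an XML file like the one below, and I'm trying to extract nodes based on a keyword. I tried using XPath and xmllint, but apparently I'm not doing it right, so I'd appreciate some help.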

XML file:

<article>
 <body>
  <section>
    <h1>2 Introduction</h1>
    <region>Intro 1</region>
    <region>Background</region>
  </section>

  <section>
    <h1>2 Task objectives</h1>
    <region>2.1 Primary objectives </region>
    <region>2.</region>
  </section>

  <section>
    <h1>Requirements</h1>
    <region>System Requirements </region>
    <region>Technical Requirements</region>
  </section>

  <section>
    <h1>Design</h1>
    <region>Design methodology </region>
    <region>Design patterns</region>
  </section>
  </body>
</article>

Given this XML and the keyword "Task objectives" or "objectives" (case-insensitive), I need to extract the whole node and write it to another XML file:

<section>
    <h1>2 Task objectives</h1>
    <region>2.1 Primary objectives </region>
    <region>2.</region>
</section>

I tried extracting it with XPath and xmllint:

 $ xmllint --xpath //body//section//h1[.="Task objectives"] Prior.mod.xml
 XPath error : Invalid predicate
//body//section//h1[.=Task objectives]
                  ^
xmlXPathEval: evaluation failed
XPath evaluation failure

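Can anyone tell me what the problem above is and how to fix it? Also, I want to do this from the shell, in the directory containing the file. Is xmllint the best option?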

Answer 1

The shell strips the quote (") characters while parsing the command line; you need to quote the entire expression, like this:

xmllint --xpath '//body//section//h1[.="Task objectives"]' Prior.mod.xml

For example:

$ xmllint --xpath //body//section//h1[.="Task objectives"] -
<body>
<section>
<h1>Task objectives</h1>
<h1>abcd</h1>
</section>
</body>
^D

results in:

XPath error : Invalid predicate
//body//section//h1[.=Task objectives]
                           ^
xmlXPathEval: evaluation failed
XPath evaluation failure

Note the missing quotes. Then I tried

$ xmllint --xpath '//body//section//h1[.="Task objectives"]' -
<body>
<section>
<h1>Task objectives</h1>
<h1>abcd</h1>
</section>
</body>
^D

which produced the output

<h1>Task objectives</h1>
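A quick way to see what the shell actually passes to xmllint is to print the argument between brackets with printf. This sketch needs no XML file and no xmllint: with the expression unquoted, the double quotes are already gone by the time the program runs; inside single quotes, they survive.

```shell
#!/bin/sh
# Turn off pathname expansion so the unquoted [...] cannot be treated as a glob.
set -f

# Unquoted: quote removal drops the double quotes, so xmllint would receive
# //body//section//h1[.=Task objectives]  (the "Invalid predicate" above).
printf '[%s]\n' //body//section//h1[.="Task objectives"]

# Single-quoted: the argument arrives unchanged, double quotes included.
printf '[%s]\n' '//body//section//h1[.="Task objectives"]'
```

The first line printed is the same broken expression shown in the error message; the second is the expression xmllint needs.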

Answer 2

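This works with XPath 1.0, the version xmllint implements. XPath 1.0 has no lower-case() function, so translate() maps A-Z to a-z in the h1 text before contains() looks for the lowercase keyword: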

//section[contains(
  translate(h1, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'),
  'task objectives')
]
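To cover the rest of the question (a keyword typed in any case, and writing the match to another file), the expression above can be wrapped in a small shell function. This is a sketch assuming xmllint from libxml2 is installed; the function name extract_sections and the output file name are illustrative, not part of either answer. Keywords containing a single quote are not handled, because they would end the XPath string literal.

```shell
#!/bin/sh
# Sketch: print every <section> whose <h1> contains KEYWORD, ignoring case.
# extract_sections is a hypothetical helper name; requires xmllint (libxml2).
# Usage: extract_sections 'objectives' Prior.mod.xml > objectives.xml
extract_sections() {
  # Lowercase the keyword, because translate() below lowercases the heading.
  keyword=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  upper='ABCDEFGHIJKLMNOPQRSTUVWXYZ'
  lower='abcdefghijklmnopqrstuvwxyz'
  # Double quotes let the shell expand the variables; the XPath string
  # literals use single quotes, so a keyword containing ' would break this.
  xmllint --xpath \
    "//section[contains(translate(h1, '$upper', '$lower'), '$keyword')]" \
    "$2"
}
```

When more than one section matches, xmllint prints them one after another with no enclosing root element, so the output file is a well-formed XML document only when exactly one section matches (as with "objectives" and the file in the question).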

Extract nodes from XML based on a keyword
