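Capture an XML tag and the related tag that follows it (if present)

Time: 2017-10-23 21:40:48

Tags: xml linux bash sed xmllint

I want to extract the programme title and sub-title from the (trimmed) XML file below. I currently extract each of them separately with xmllint and sed and then combine the results into one file, but I have found that some entries have only a title and no sub-title. In that case I would like the sub-title to be left blank. Can anyone suggest a way to handle this discrepancy?

XML file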

<programme start="20171013170000 +0100" stop="20171013180000 +0100" channel="b492458d826d592ec7c528545a16c757">
  <title lang="eng">Accessories Gift Hall</title>
  <sub-title lang="eng">Find the perfect gift with fashion accessories by some of our most sought-after brands. From chic purses and wallets to cosy PJs and slippers, there&apos;s something for everyone.</sub-title>
</programme>
<programme start="20171013180000 +0100" stop="20171014130000 +0100" channel="b492458d826d592ec7c528545a16c757">
  <title lang="eng">..programmes start again at 1pm</title>
</programme>
<programme start="20171014130000 +0100" stop="20171014140000 +0100" channel="b492458d826d592ec7c528545a16c757">
  <title lang="eng">Ruth Langsford&apos;s Fashion Edit</title>
  <sub-title lang="eng">TV personality and QVC fashion ambassador, Ruth Langsford, shares her favourite looks and must-have pieces that will transform your wardrobe and have you looking fabulously stylish.</sub-title>
</programme>

Bash commands, v1

xmllint --xpath "//programme/title" xmltv | sed -r 's/\n//g' | sed 's/<\/title>/\n/g' | sed 's/<title lang="eng">//g' > 1.txt
xmllint --xpath "//programme/sub-title" xmltv | sed -r 's/\n//g' | sed 's/<\/sub-title>/\n/g' | sed 's/<sub-title lang="eng">//g' > 2.txt
paste <(cat 1.txt) <(cat 2.txt) > 3.txt

Thanks!

3 answers:

Answer 0 (score: 2)

Here is an example that uses xmlstarlet's sel command from the command line...

$ xmlstarlet sel -T -t -m '//programme' -v 'concat(normalize-space(title)," ",normalize-space(sub-title))' -n input.xml
Accessories Gift Hall Find the perfect gift with fashion accessories by some of our most sought-after brands. From chic purses and wallets to cosy PJs and slippers, there's something for everyone.
..programmes start again at 1pm
Ruth Langsford's Fashion Edit TV personality and QVC fashion ambassador, Ruth Langsford, shares her favourite looks and must-have pieces that will transform your wardrobe and have you looking fabulously stylish.

I separated the title and sub-title with a single space, but that can be changed.
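As a rough sketch of changing the separator, the same template can be run with `|` in place of the space, which makes a blank sub-title visible as an empty second field. The sample file below is hypothetical: it wraps shortened versions of the question's entries in an assumed `<tv>` root element so that the document is well-formed, and the output file name `pairs.txt` is only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample: shortened entries from the question inside an assumed <tv> root
cat > input.xml <<'EOF'
<tv>
  <programme start="1" stop="2" channel="x">
    <title lang="eng">Accessories Gift Hall</title>
    <sub-title lang="eng">Find the perfect gift.</sub-title>
  </programme>
  <programme start="2" stop="3" channel="x">
    <title lang="eng">..programmes start again at 1pm</title>
  </programme>
  <programme start="3" stop="4" channel="x">
    <title lang="eng">Fashion Edit</title>
    <sub-title lang="eng">Favourite looks.</sub-title>
  </programme>
</tv>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
  # Same template as in the answer, with "|" instead of " " as the separator.
  # A missing sub-title evaluates to the empty string, so the field stays blank.
  xmlstarlet sel -T -t -m '//programme' \
    -v 'concat(normalize-space(title),"|",normalize-space(sub-title))' -n \
    input.xml > pairs.txt
  cat pairs.txt
else
  echo "xmlstarlet not found; install it to run this sketch" >&2
fi
```

With this separator the entry that has no sub-title should come out as its title followed by a bare `|`, so every line keeps exactly two fields.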

Answer 1 (score: 0)

What I would do:

{{1}}

Answer 2 (score: 0)

In a single pass with sed

sed '/<title/!d;N;/<sub-title/!s/\n.*//' XMLFile
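The one-liner above keeps the tags and prints the title and sub-title on separate lines. The following is only a sketch of how the same idea might be extended to strip the tags and put each pair on one tab-separated line, leaving the second field empty when there is no sub-title. It assumes GNU sed, and it assumes that every element sits on its own line with the sub-title, when present, directly after the title, as in the question's file. The file names `sample.xml` and `titles.tsv` and the shortened sample entries are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sample file, laid out like the question's XML
cat > sample.xml <<'EOF'
<programme start="1" stop="2" channel="x">
  <title lang="eng">Accessories Gift Hall</title>
  <sub-title lang="eng">Find the perfect gift.</sub-title>
</programme>
<programme start="2" stop="3" channel="x">
  <title lang="eng">..programmes start again at 1pm</title>
</programme>
<programme start="3" stop="4" channel="x">
  <title lang="eng">Fashion Edit</title>
  <sub-title lang="eng">Favourite looks.</sub-title>
</programme>
EOF

sed -E '
  # drop every line that is not a <title> line
  /<title/!d
  # append the following line to the pattern space
  N
  # if that line is not a <sub-title>, discard its text but keep the newline
  /<sub-title/!s/\n.*/\n/
  # strip the tags
  s/<[^>]*>//g
  # trim the leading indentation of the title
  s/^[[:space:]]+//
  # join the title and the (possibly empty) sub-title with a tab
  s/\n[[:space:]]*/\t/
' sample.xml > titles.tsv

cat titles.tsv
```

Entities such as `&apos;` are left as they are, because sed treats the file as plain text. An XML-aware tool is the safer choice if the layout of the file can vary.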