How can I use AWK to print the contents of an XML element, from the start tag to the end tag?
For example, consider the following XML:
<flight>
<airline>Delta</airline>
<flightno>22</flightno>
<origin>Atlanta</origin>
<destination>Paris</destination>
<departure>5:40pm</departure>
<arrival>8:10am</arrival>
</flight>
<city id="AT">
<cityname>Athens</cityname>
<state>GA</state>
<description> Home of the University of Georgia</description>
<population>100,000</population>
<location>Located about 60 miles Northeast of Atlanta</location>
<latitude>33 57' 39" N</latitude>
<longitude>83 22' 42" W</longitude>
</city>
The desired output would be the contents of the city element, from <city...> to </city>.
Answer 0 (score: 5)
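Solutions that parse XML with tools such as awk and sed are never fully reliable. You cannot count on XML always having a human-readable layout. For example, some web services omit newlines, so the entire XML document arrives on a single line.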
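To make the point concrete, here is a minimal sketch under assumed data: a trimmed, hypothetical version of the question's XML collapsed onto one line, fed to a typical line-oriented awk filter. The function names one_line_doc and extract_city_lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample: a trimmed version of the question's data with the
# whole document on a single line, as a web service might return it.
one_line_doc() {
    printf '%s\n' '<data><flight><airline>Delta</airline></flight><city id="AT"><cityname>Athens</cityname></city></data>'
}

# Hypothetical line-oriented filter: start printing at a line that begins
# with <city, stop after a line that begins with </city>.
extract_city_lines() {
    awk '/^<city/ { inTag = 1 } inTag { print } /^<\/city>/ { inTag = 0 }'
}

# Prints nothing: the only line begins with <data>, so the start pattern
# never matches even though the document does contain a <city> element.
one_line_doc | extract_city_lines
```

The same filter works on a pretty-printed copy of the data, which is exactly why this kind of bug tends to surface only when the producer changes its output format.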
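I recommend using xmllint instead: it can select nodes with XPath, a query language designed for XML.
The following command selects the city elements. xmllint needs a well-formed file with a single root element, such as the <data> wrapper used in the example further down; the snippet in the question has two top-level elements and would be rejected as it stands.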
xmllint --xpath "//city" data.xml
XPath is very powerful: it makes every part of an XML document addressable. For example, this command
xmllint --xpath "string(//city[1]/@id)" data.xml
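returns the string "AT", the id attribute of the first city element.
The next command returns the first city element and pipes it through a second xmllint invocation to pretty-print it: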
$ xmllint --xpath "//city[1]" data.xml | xmllint -format -
<?xml version="1.0"?>
<city id="AT">
<cityname>Athens</cityname>
<state>GA</state>
<description> Home of the University of Georgia</description>
<population>100,000</population>
<location>Located about 60 miles Northeast of Atlanta</location>
<latitude>33 57' 39" N</latitude>
<longitude>83 22' 42" W</longitude>
</city>
In the following version of the same data, the first city element is written entirely on one line. This is still valid XML:
<data>
<flight>
<airline>Delta</airline>
<flightno>22</flightno>
<origin>Atlanta</origin>
<destination>Paris</destination>
<departure>5:40pm</departure>
<arrival>8:10am</arrival>
</flight>
<city id="AT"> <cityname>Athens</cityname> <state>GA</state> <description> Home of the University of Georgia</description> <population>100,000</population> <location>Located about 60 miles Northeast of Atlanta</location> <latitude>33 57' 39" N</latitude> <longitude>83 22' 42" W</longitude> </city>
<city id="DUB">
<cityname>Dublin</cityname>
<state>Dub</state>
<description> Dublin</description>
<population>1,500,000</population>
<location>Ireland</location>
<latitude>NA</latitude>
<longitude>NA</longitude>
</city>
</data>
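As a hedged illustration of why the layout does not matter to an XPath tool, the sketch below lists the id of every city element in a file like this one. It assumes xmllint from libxml2 is installed; city_ids is a hypothetical helper name, and (//city)[$i] is used so that the index counts city elements across the whole document rather than per parent.

```shell
#!/bin/sh
# city_ids FILE (hypothetical helper): print the id attribute of every
# <city> element, one per line, in document order.
# Assumes xmllint (libxml2) is on PATH and FILE is well-formed XML.
city_ids() {
    file=$1
    # count() returns how many <city> elements exist anywhere in the file.
    n=$(xmllint --xpath "count(//city)" "$file") || return 1
    i=1
    while [ "$i" -le "$n" ]; do
        # (//city)[i] is the i-th <city> overall; string(@id) is its id value.
        id=$(xmllint --xpath "string((//city)[$i]/@id)" "$file") || return 1
        printf '%s\n' "$id"
        i=$((i + 1))
    done
}
```

Saved as data.xml, the document above should give AT followed by DUB from city_ids data.xml, even though the Athens element sits on a single line.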
Answer 1 (score: 1)
$ awk -v tag='city' '$0~"^<"tag"\\>"{inTag=1} inTag; $0~"^</"tag">"{inTag=0}' file
<city id="AT">
<cityname>Athens</cityname>
<state>GA</state>
<description> Home of the University of Georgia</description>
<population>100,000</population>
<location>Located about 60 miles Northeast of Atlanta</location>
<latitude>33 57' 39" N</latitude>
<longitude>83 22' 42" W</longitude>
</city>
The command above uses GNU awk for its \> word-boundary operator. With other awks, use [^[:alnum:]_] or something similar instead.
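As a hedged sketch of a portable variant (print_elements is a hypothetical name), the function below replaces \> with an explicit check for what can follow a tag name in a start tag: a space, a tab, ">", or the end of the line. This is slightly stricter than [^[:alnum:]_], because XML names may also contain "-" and ".", so <city-list> is not mistaken for <city>. It also tolerates indentation and closes the range when </city> appears anywhere on the line, which covers an element written entirely on one line. It still assumes start tags begin their line, city elements are not nested, and TAG contains no regex metacharacters; self-closing <city/> tags are simply skipped.

```shell
#!/bin/sh
# print_elements TAG (hypothetical helper): copy every line from a <TAG ...>
# start tag through the matching </TAG> end tag, reading XML from stdin.
# POSIX awk; assumes start tags begin their line (indentation allowed),
# TAG elements are not nested, and TAG has no regex metacharacters.
print_elements() {
    awk -v tag="$1" '
        # Start tag: "<TAG" followed by a space, tab, ">" or end of line.
        # This replaces the GNU awk \> word boundary.
        $0 ~ "^[ \t]*<" tag "([ \t>]|$)" { inTag = 1 }
        inTag { print }
        # End tag anywhere on the line, so one-line elements also close.
        inTag && $0 ~ "</" tag "[ \t]*>" { inTag = 0 }
    '
}
```

Usage: print_elements city < data.xml prints both city elements from the <data> example, including the one-line Athens element.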
To print only the first occurrence:
$ awk -v tag='city' '$0~"^<"tag"\\>"{inTag=1} inTag{print; if ($0~"^</"tag">") exit}' file
<city id="AT">
<cityname>Athens</cityname>
<state>GA</state>
<description> Home of the University of Georgia</description>
<population>100,000</population>
<location>Located about 60 miles Northeast of Atlanta</location>
<latitude>33 57' 39" N</latitude>
<longitude>83 22' 42" W</longitude>
</city>
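The exit trick above generalizes to the N-th occurrence, roughly mirroring the XPath expression (//city)[N]. A hedged sketch (print_nth_element is a hypothetical name) that counts start tags and exits once the chosen element's end tag has been printed, under the same line-layout assumptions as the portable variant: start tags begin a line and elements of the tag are not nested.

```shell
#!/bin/sh
# print_nth_element TAG N (hypothetical helper): print only the N-th TAG
# element (1-based) from stdin, then stop reading. POSIX awk; assumes each
# start tag begins a line (indentation allowed) and TAG is not nested.
print_nth_element() {
    awk -v tag="$1" -v n="$2" '
        # Count every start tag; switch printing on only for the N-th one.
        $0 ~ "^[ \t]*<" tag "([ \t>]|$)" { if (++count == n + 0) inTag = 1 }
        inTag { print }
        # Once the selected element has closed, nothing else is needed.
        inTag && $0 ~ "</" tag "[ \t]*>" { exit }
    '
}
```

With N set to 1 this behaves like the command above, except that it also stops correctly when the first element is written on a single line.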