I want to write a small utility that does the following.
For example, given the following XML file:
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
    </country>
    <town name="London">
        <year>2014</year>
    </town>
</data>
Running ./myscript year should produce the following output:
data.country.1.year
data.country.2.year
data.town.year
I wrote the script below, but I don't know how to get the index of each element into the output. Is there a way to do this? Thanks.
#!/usr/bin/python
from lxml import etree
import sys

tree = etree.parse('file.xml')
tag = '//' + sys.argv[1]  # e.g. '//year'
find_text = etree.XPath(tag)
# getpath() returns paths such as /data/country[1]/year
for j in [tree.getpath(text) for text in find_text(tree)]:
    print(j.replace('/', '.')[1:])
Answer 0 (score: 1)
The simplest way is to use a regular expression.
#!/usr/bin/python
from lxml import etree
import sys
import re

tree = etree.parse('file.xml')
tag = '//' + sys.argv[1]
find_text = etree.XPath(tag)
for j in [tree.getpath(text) for text in find_text(tree)]:
    # Replace each run of '/', '[' and ']' with one dot, then trim the dots at both ends.
    print(re.sub(r'[/\[\]]+', '.', j).strip('.'))
Output: data.country.1.year
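To see what the substitution does without needing lxml or the XML file, here is a minimal sketch that applies the same idea to plain path strings. The helper name xpath_to_dotted is hypothetical, and the sample paths are typed by hand in the form getpath() produces for the file in the question. It trims dots at both ends rather than slicing off the first character, so a path that ends in an index such as year[2] does not leave a trailing dot.

```python
import re


def xpath_to_dotted(path):
    """Turn an XPath such as '/data/country[1]/year' into 'data.country.1.year'."""
    # Collapse every run of '/', '[' and ']' into a single dot ...
    dotted = re.sub(r'[/\[\]]+', '.', path)
    # ... then drop the dots left over at either end.
    return dotted.strip('.')


# Hand-written paths (an assumption for illustration), shaped like getpath() output.
sample_paths = [
    '/data/country[1]/year',
    '/data/country[2]/year',
    '/data/town/year',
]

for p in sample_paths:
    print(xpath_to_dotted(p))
```

In the real script, the same function could be called on each value returned by tree.getpath(...) in place of the inline re.sub.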