I have this data:
<data>
<foo>foo text</foo>
data text
<bar>
bar text
<baz>text</baz>
<baz>text</baz>
bar text
</bar>
data text
</data>
and I need to get all the text values in order, modify the text inside the "baz" tags, and print the result. My code is:
text = []
for element in etree.xpath("./*"):
    text.extend(element.xpath("./text()"))
    if element.tag == 'bar':
        text.extend(["baz " + s for s in element.xpath("./baz/text()")])
print('\n'.join([s.strip() for s in text if s.strip()]))
The output is:
foo text
bar text
bar text
baz text
baz text
but I need:
foo text
data text
bar text
baz text
baz text
bar text
data text
How can I get the text() of each node in order, without losing the "data text" strings?
Edit
I know about etree.xpath(".//text()"), which gives me all the text in order, but I need to modify the text inside the baz tags. That is the point: how can I get the tag of the element that each result of the .//text() XPath belongs to?
Answer 0 (score: 1)
Assuming you are using lxml, you can call the getparent() function to get the element that owns a text node, for example:
import lxml.etree
etree = lxml.etree.fromstring('''
<data>
<foo>foo text</foo>
data text
<bar>
bar text
<baz>text</baz>
<baz>text</baz>
bar text
</bar>
data text
</data>
''')
for text in etree.xpath("//text()[normalize-space()]"):
    parenttag = text.getparent().tag
    print(parenttag, text)
The XPath expression //text()[normalize-space()] simply returns every text node in the XML document that is not empty or whitespace-only.
Output:
('foo', 'foo text')
('foo', '\n data text\n ')
('bar', '\n bar text\n ')
('baz', 'text')
('baz', 'text')
('baz', '\n bar text\n ')
('bar', '\n data text\n')
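Note that in the output above the strings that follow a closing tag (tail text) are reported under the preceding element: the first "data text" appears under foo and the second "bar text" under baz, because lxml's getparent() returns the element a tail follows. With lxml, the is_tail attribute of each returned string can be used to tell the two cases apart. As an alternative, here is a minimal sketch that attributes every string to its enclosing element and applies the "baz " prefix from the question. It assumes the standard library's xml.etree.ElementTree instead of the XPath approach (lxml exposes the same .text, .tail and iteration API), and iter_text is a hypothetical helper name, not part of either library.

```python
import xml.etree.ElementTree as ET

XML = """<data>
<foo>foo text</foo>
data text
<bar>
bar text
<baz>text</baz>
<baz>text</baz>
bar text
</bar>
data text
</data>"""


def iter_text(element):
    """Yield (enclosing_tag, text) pairs in document order (hypothetical helper)."""
    if element.text:
        yield element.tag, element.text
    for child in element:
        yield from iter_text(child)
        # A child's tail sits after its closing tag, so it belongs to `element`.
        if child.tail:
            yield element.tag, child.tail


root = ET.fromstring(XML)
lines = []
for tag, text in iter_text(root):
    text = text.strip()
    if not text:
        continue  # skip whitespace-only strings between tags
    lines.append("baz " + text if tag == "baz" else text)

print("\n".join(lines))
```

For the sample document this should print the seven lines the question asks for, with only the strings directly inside baz prefixed.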