Question

I have this data:

<data>
  <foo>foo text</foo>
  data text
    <bar>
      bar text
      <baz>text</baz>
      <baz>text</baz>
      bar text
    </bar>
   data text
</data>

and I need to get all the text values in order, modify the text inside the "baz" tags, and print the result. My code is:

text = []
for element in etree.xpath("./*"):
    text.extend(element.xpath("./text()"))
    if element.tag == 'bar':
        text.extend(["baz " + s for s in element.xpath("./baz/text()")])
print('\n'.join([s.strip() for s in text if s.strip()]))

output is:

foo text
bar text
bar text
baz text
baz text

but I need:

foo text
data text
bar text
baz text
baz text
bar text
data text

How can I get the text() of each node in document order, without losing the "data text" lines?

Edit: I know about etree.xpath(".//text()"), which gives me all the text in order, but I need to modify the text inside the baz tags. That is the point. How can I get the tag of the owning element for each result of the .//text() XPath?

Answer 1

Assuming you are using lxml, you can call getparent() on a text node to get its owner element, for example:

import lxml.etree
etree = lxml.etree.fromstring('''
<data>
  <foo>foo text</foo>
  data text
    <bar>
      bar text
      <baz>text</baz>
      <baz>text</baz>
      bar text
    </bar>
   data text
</data>
''')

for text in etree.xpath("//text()[normalize-space()]"):
    parenttag = text.getparent().tag
    print((parenttag, text))  # print the pair as a tuple so the whitespace is visible

The XPath expression //text()[normalize-space()] simply returns all non-empty text nodes in the XML document.

Output

('foo', 'foo text')
('foo', '\n  data text\n    ')
('bar', '\n      bar text\n      ')
('baz', 'text')
('baz', 'text')
('baz', '\n      bar text\n    ')
('bar', '\n   data text\n')
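
Note that in the output above the "data text" nodes are reported under foo and bar, and the second "bar text" under baz. In lxml such text is stored as the tail of the preceding element, and getparent() returns that element. Below is a minimal sketch of one way to get the output requested in the question, assuming the is_tail attribute that lxml sets on XPath string results. The names root and containing_tag are made up for this illustration, they are not part of lxml.

```python
import lxml.etree

root = lxml.etree.fromstring('''
<data>
  <foo>foo text</foo>
  data text
    <bar>
      bar text
      <baz>text</baz>
      <baz>text</baz>
      bar text
    </bar>
   data text
</data>
''')

def containing_tag(text_node):
    """Hypothetical helper: tag of the element that encloses the text node."""
    owner = text_node.getparent()
    # Tail text belongs to the preceding sibling element in lxml's model,
    # so the enclosing element is one level further up.
    if text_node.is_tail:
        owner = owner.getparent()
    return owner.tag

lines = []
for text_node in root.xpath("//text()[normalize-space()]"):
    value = text_node.strip()
    # Only text directly inside <baz> gets the prefix.
    if containing_tag(text_node) == "baz":
        value = "baz " + value
    lines.append(value)

print("\n".join(lines))
```

With the sample document this should print the seven lines from the question in document order, with only the two baz values prefixed.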
