Question

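I'm having trouble extracting the content of the current node, including all of its child nodes.

As in the code below, I want to get the string abcdefg<b>b1b2b3</b> from inside the pre tag.

But I can't get it with "child::*". If I use "/text()" instead, I lose the b tag's formatting information. Please help.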

# -*- coding: utf-8 -*-
from lxml import html
import lxml.etree as le

input = "<pre>abcdefg<b>b1b2b3</b></pre>"
input_xpath = "//pre/child::*"
tree = html.fromstring(input)
result = tree.xpath(input_xpath)
# encoding="unicode" makes tostring return str rather than bytes (Python 3)
result1 = [le.tostring(item, encoding="unicode") for item in result]
result2 = ''.join(result1)
print(result2)

output: <b>b1b2b3</b>

Answer 1

To get the markup inside an XML node (sometimes called "innerXML"), first select the node itself, rather than its child nodes or its text content:

from lxml import html
import lxml.etree as le

input = "<pre>abcdefg<b>b1b2b3</b></pre>"
tree = html.fromstring(input)
node = tree.xpath("//pre")[0]

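Then combine its text content with the markup of all its child nodes:

# node.text is None when the node has no leading text, hence the "or ''"
result = (node.text or '') + ''.join(le.tostring(e, encoding="unicode") for e in node)
print(result)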

Output:

abcdefg<b>b1b2b3</b>

Answer 2

Try replacing your XPath with the following:

In [1]: from lxml import html

In [2]: input = "<pre>abcdefg<b>b1b2b3</b></pre>"

In [3]: input_xpath = "//pre//text()"

In [4]: tree = html.fromstring(input)

In [5]: result = tree.xpath(input_xpath)

In [6]: result
Out[6]: ['abcdefg', 'b1b2b3']

XPath: extracting the current node's content, including all child nodes
