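I'm having trouble extracting the content of the current node, including all of its child nodes.
As in the code below, I want to get the string
abcdefg<b>b1b2b3</b>
from inside the pre tag.
But I can't get it with "child::*", which returns only the child elements. If I use "/text()", I lose the b tag formatting information. Please help.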
# -*- coding: utf-8 -*-
from lxml import html
import lxml.etree as le
input = "<pre>abcdefg<b>b1b2b3</b></pre>"
input_xpath = "//pre/child::*"
tree = html.fromstring(input)
result = tree.xpath(input_xpath)
# encoding="unicode" makes tostring return str instead of bytes
result1 = [le.tostring(item, encoding="unicode") for item in result]
result2 = ''.join(result1)
print(result2)
Output: <b>b1b2b3</b>
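Answer 0 (score: 2)
To get the inner markup of an XML node (sometimes called "innerXML"), you can first select the node itself (rather than its child nodes or text content):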
from lxml import html
import lxml.etree as le
input = "<pre>abcdefg<b>b1b2b3</b></pre>"
tree = html.fromstring(input)
node = tree.xpath("//pre")[0]
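Then combine the node's text content with the markup of all its child nodes:
# node.text is None when the element starts with a child tag
# encoding="unicode" makes tostring return str instead of bytes
result = (node.text or '') + ''.join(le.tostring(e, encoding="unicode") for e in node)
print(result)
Output: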
abcdefg<b>b1b2b3</b>
Answer 1 (score: 0)
Try replacing your XPath with the following:
In [0]: input = "<pre>abcdefg<b>b1b2b3</b></pre>"
In [1]: input_xpath = "//pre//text()"
In [2]: tree = html.fromstring(input)
In [3]: result = tree.xpath(input_xpath)
In [4]: result
Out[4]: ['abcdefg', 'b1b2b3']
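Keep in mind that `text()` selects text nodes only, so joining the pieces gives the plain text without the `b` tag that the question wanted to preserve. A minimal sketch, assuming Python 3 (the variable names `pieces` and `plain` are illustrative):

```python
from lxml import html

tree = html.fromstring("<pre>abcdefg<b>b1b2b3</b></pre>")

# text() returns the text nodes in document order; element tags
# are not part of the result.
pieces = tree.xpath("//pre//text()")
plain = "".join(pieces)
print(plain)  # abcdefgb1b2b3
```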