Input HTML:
<section id="article">
<p>Hey This is XXX</p>
</section>
I am using lxml with XPath to extract the data:
xpath_paragraph = '//section[@id="article"]/p//text()'
items = mydoc.xpath(xpath_paragraph)
The result I get is:
Hey This is XXX
Expected result:
<p>Hey This is XXX</p>
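The result makes sense, since I am selecting text. I also tried node(), and that did not work either. I need to extract the data with the tags included.
Answer 0 (score: 0)
This should work for you: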
import xml.etree.ElementTree as ET
data='''
<section id="article">
<p>Hey This is XXX</p>
</section>'''
root = ET.fromstring(data)
# Find each <p> directly under every <section> and rebuild its markup by hand
for section in root.iter('section'):
    for child in section.findall('p'):
        print('<' + child.tag + '>' + (child.text or '') + '</' + child.tag + '>')
Output:
<p>Hey This is XXX</p>
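Rebuilding the tag by string concatenation drops any attributes on the element and any nested markup inside it. As a rough sketch of an alternative, assuming the standard-library ElementTree used above is acceptable, ET.tostring can serialize the whole element instead. The class attribute, the nested b tag, and the helper name serialize_paragraphs are made up here for illustration; they are not part of the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: same shape as the question's input, plus an
# attribute and a nested tag to show that both survive serialization.
data = '''
<section id="article">
<p class="lead">Hey This is <b>XXX</b></p>
</section>'''

root = ET.fromstring(data)


def serialize_paragraphs(section):
    """Return each direct <p> child of `section` as a markup string."""
    results = []
    for p in section.findall('p'):
        # encoding='unicode' returns str instead of bytes; strip() removes
        # the trailing whitespace (the element's tail) that tostring keeps.
        results.append(ET.tostring(p, encoding='unicode').strip())
    return results


paragraphs = serialize_paragraphs(root)
for markup in paragraphs:
    print(markup)
```

The strip() call matters because ElementTree stores the newline after the closing p tag as the element's tail and includes it when serializing.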
Answer 1 (score: 0)
You are explicitly selecting text nodes ('//section[@id="article"]/p//text()'). Try the following expression instead:
xpath_paragraph = '//section[@id="article"]/p'
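That should select the p element itself. xpath() then returns a list of element objects rather than strings, so serialize each one (for example with lxml.etree.tostring) to get the markup.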
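The difference between selecting the element and selecting its text can be sketched as follows. This is only an illustration and uses the standard-library ElementTree as a stand-in so it runs without extra packages; with lxml the corresponding calls would be mydoc.xpath(...) and lxml.etree.tostring(...). The surrounding page markup and the sidebar section are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document wrapping the question's snippet in a larger page.
page = '''
<html><body>
<section id="sidebar"><p>Ignore me</p></section>
<section id="article">
<p>Hey This is XXX</p>
</section>
</body></html>'''

doc = ET.fromstring(page)

# ElementTree paths are relative, hence './/' where lxml's xpath() takes '//'.
elements = doc.findall('.//section[@id="article"]/p')

# Selecting the element keeps the tags; asking for its text drops them.
as_markup = [ET.tostring(el, encoding='unicode').strip() for el in elements]
as_text = [''.join(el.itertext()) for el in elements]

print(as_markup)
print(as_text)
```

Note that ElementTree only supports a subset of XPath, so expressions ending in text() still need lxml.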