HTML XPath: return the result with its tags

Time: 2017-11-10 05:49:28

Tags: python html xml xpath lxml

Input HTML:

<section id="article">
  <p>Hey This is XXX</p>
</section>

I'm using lxml XPath to extract the data:

xpath_paragraph = '//section[@id="article"]/p//text()'
items = mydoc.xpath(xpath_paragraph)

The result I get is:

Hey This is XXX

Expected result:

<p>Hey This is XXX</p>

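The result makes sense, since I'm extracting the text. I also tried node(), but that didn't work either. I need to extract the data together with its tags.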
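A small sketch of why node() alone does not help, assuming it was used as p/node() (the question does not show the exact expression) and that mydoc is the input HTML parsed with lxml.html. node() applied to p selects the children of <p>, and here the only child is a text node, so lxml still hands back a plain string. Selecting p itself returns an Element object instead, which is what you need to get the tags back.

```python
from lxml import html

# Assumption: mydoc is the input HTML parsed with lxml.html
mydoc = html.fromstring('<section id="article"><p>Hey This is XXX</p></section>')

# node() on p selects p's *children*; the only child is a text node,
# so the result is a list containing a string, not markup
children = mydoc.xpath('//section[@id="article"]/p/node()')

# Selecting p itself returns Element objects rather than strings
elements = mydoc.xpath('//section[@id="article"]/p')
print(elements[0].tag)  # p
```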

2 answers:

Answer 0 (score: 0)

This should work for you:

import xml.etree.ElementTree as ET

data = '''
<section id="article">
 <p>Hey This is XXX</p>
 </section>'''

root = ET.fromstring(data)  # root is the <section> element itself

# root.iter() includes root, so this visits the <section> and finds its <p>
for section in root.iter('section'):
    p = section.find('p')
    rank = p.text

# Rebuild the tags around the text by hand (Python 3 print)
print('<' + p.tag + '>' + rank + '</' + p.tag + '>')

Output:

<p>Hey This is XXX</p>
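Building the tags by hand drops any attributes and nested child elements. A sketch of a more general variant that lets ElementTree serialize the element itself; outer_xml is a hypothetical helper name. One catch: ET.tostring() also writes the element's tail (here the newline and space after </p>), so the helper serializes a shallow copy with the tail cleared, leaving the original tree untouched.

```python
import copy
import xml.etree.ElementTree as ET

data = '''
<section id="article">
 <p>Hey This is XXX</p>
 </section>'''

root = ET.fromstring(data)  # root is the <section> element


def outer_xml(elem):
    """Serialize an element with its tags, leaving out its tail text (hypothetical helper)."""
    clone = copy.copy(elem)  # shallow copy, so the original tree is not modified
    clone.tail = None        # tostring() would otherwise append the tail text
    return ET.tostring(clone, encoding='unicode')


for p in root.findall('p'):
    print(outer_xml(p))
```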

Answer 1 (score: 0)

You are explicitly selecting the text nodes ('//section[@id="article"]/p//text()'). Try the following expression instead:

xpath_paragraph = '//section[@id="article"]/p'

This should select the p element itself rather than its text.