Question

Input HTML:

<section id="article">
  <p>Hey This is XXX</p>
</section>

I am using lxml with XPath to extract the data:

xpath_paragraph = '//section[@id="article"]/p//text()'
items = mydoc.xpath(xpath_paragraph)

The result I get is:

Hey This is XXX

Expected result:

<p>Hey This is XXX</p>

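The result is understandable, since my expression selects text. I also tried node(), but that did not work either. I need to extract the data together with its tags.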

Answer 1

This should work for you:

import xml.etree.ElementTree as ET

data = '''
<section id="article">
  <p>Hey This is XXX</p>
</section>'''

# The root element is the <section> itself.
root = ET.fromstring(data)

# Take the first <p> child and rebuild its markup from the tag name and text.
child = root.find('p')
print('<' + child.tag + '>' + child.text + '</' + child.tag + '>')

Output:

<p>Hey This is XXX</p>
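
Rebuilding the tag by hand loses attributes and nested markup. As a minimal sketch using only the standard library, the element can be serialized with ET.tostring instead. The helper name paragraphs_with_tags is hypothetical and not part of the original answer.

```python
import xml.etree.ElementTree as ET

data = '''
<section id="article">
  <p>Hey This is XXX</p>
</section>'''

root = ET.fromstring(data)


def paragraphs_with_tags(section):
    """Return each direct <p> child of `section` serialized with its tags.

    Hypothetical helper, for illustration only.
    """
    # encoding="unicode" makes tostring return str instead of bytes.
    # tostring also emits the element's tail (the whitespace after </p>),
    # so it is stripped here.
    return [
        ET.tostring(p, encoding="unicode").strip()
        for p in section.findall("p")
    ]


items = paragraphs_with_tags(root)
print(items)
```

Note that ElementTree supports only a limited subset of XPath, so this variant fits simple, well-formed input like the snippet in the question.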

Answer 2

You explicitly selected text nodes ('//section[@id="article"]/p//text()'). Try the following expression instead:

xpath_paragraph = '//section[@id="article"]/p'

That should select the p element itself.

Returning HTML with tags/markup from XPath
