lxml data from two tags

Time: 2014-08-18 16:20:54

Tags: python lxml

My code looks like this:

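import urllib2  # Python 2 standard library
from lxml import etree

response = urllib2.urlopen("file:///C:/data20140801.html")
page = response.read()
tree = etree.HTML(page)

# Text of every <span> that is a direct child of a <p>
data = tree.xpath("//p/span/text()")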

The HTML page can have this structure:

<span style="font-size:10.0pt">Something</span>

The HTML page can also have this structure:

<p class="Normal">
    <span style="font-size:10.0pt">Some</span>
    <span style="font-size:10.0pt">thing</span>
</p>

I want to get the same result, "Something", in both cases.

1 answer:

Answer 0 (score: 2)

The XPath expression returns a list of values, one string per matching text node:

>>> from lxml import etree
>>> tree = etree.HTML('''\
... <p class="Normal">
...     <span style="font-size:10.0pt">Some</span>
...     <span style="font-size:10.0pt">thing</span>
... </p>
... ''')
>>> tree.xpath("//p/span/text()")
['Some', 'thing']

Use ''.join() to merge the two strings into one:

>>> ''.join(tree.xpath("//p/span/text()"))
'Something'
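If the page holds several paragraphs, joining every match at once would run them all together. A minimal sketch of one way around that is to join per `<p>` instead. The helper name `paragraph_texts` and the two sample strings are made up for illustration, and the single-span sample assumes that span also sits inside a `<p>`, which the question does not show:

```python
from lxml import etree


def paragraph_texts(html):
    """Return one joined string per <p>, built from its direct <span> children.

    Hypothetical helper, not part of lxml.
    """
    tree = etree.HTML(html)
    # Run a relative XPath on each <p> so text from different
    # paragraphs is never merged into the same string.
    return ["".join(p.xpath("./span/text()")) for p in tree.xpath("//p")]


# Sample inputs mirroring the two structures from the question.
single = '<p class="Normal"><span style="font-size:10.0pt">Something</span></p>'
split = (
    '<p class="Normal">'
    '<span style="font-size:10.0pt">Some</span>'
    '<span style="font-size:10.0pt">thing</span>'
    '</p>'
)

print(paragraph_texts(single))
print(paragraph_texts(split))
```

Both calls should give a one-element list containing 'Something', so the same code covers either page layout.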