My code looks like this:
# Python 2 (urllib2); etree comes from lxml
import urllib2
from lxml import etree
response = urllib2.urlopen("file:///C:/data20140801.html")
page = response.read()
tree = etree.HTML(page)
data = tree.xpath("//p/span/text()")
The HTML page can have this structure:
<span style="font-size:10.0pt">Something</span>
The HTML page can also have this structure:
<p class="Normal">
<span style="font-size:10.0pt">Some</span>
<span style="font-size:10.0pt">thing</span>
</p>
In both cases I want to get the same result: "Something".
Answer 0 (score: 2)
An XPath expression returns a list of values:
>>> from lxml.html import etree
>>> tree = etree.HTML('''\
... <p class="Normal">
... <span style="font-size:10.0pt">Some</span>
... <span style="font-size:10.0pt">thing</span>
... </p>
... ''')
>>> tree.xpath("//p/span/text()")
['Some', 'thing']
Use ''.join() to combine the two strings into one:
>>> ''.join(tree.xpath("//p/span/text()"))
'Something'
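Joining the result of a document-wide query would also merge the text of unrelated paragraphs if the page has more than one <p>. A minimal sketch of joining per paragraph instead is below. It assumes the single-span variant from the question is also wrapped in a <p>, and the helper name joined_span_text is hypothetical, not part of lxml:

```python
from lxml import etree


def joined_span_text(html):
    """Return one string per <p>, joining the text of its direct <span> children.

    Hypothetical helper for illustration; only etree.HTML and xpath() are lxml calls.
    """
    tree = etree.HTML(html)
    # Run the relative query "./span/text()" once per paragraph so that
    # spans from different paragraphs are never concatenated together.
    return ["".join(p.xpath("./span/text()")) for p in tree.xpath("//p")]


# Assumption: the single-span structure also sits inside a <p>.
single = '<p class="Normal"><span style="font-size:10.0pt">Something</span></p>'
split = (
    '<p class="Normal">'
    '<span style="font-size:10.0pt">Some</span>'
    '<span style="font-size:10.0pt">thing</span>'
    '</p>'
)

print(joined_span_text(single))
print(joined_span_text(split))
```

Both structures should then yield the same single-element list, so the calling code does not need to know which layout the page used.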