XPath for HTML table values

Date: 2012-08-18 08:33:50

Tags: python html lxml

I have HTML like this:

<html>
<body>
<table>
   <tr>
       Text before Text1
       <td>Text1</td>
       Text after Text1
   </tr>
   <tr>
       Text before Text2
       <td>Text2</td>
       Text after Text2
   </tr>
</table>
</body>
</html>

I'm using lxml with Python. I want to use XPath to find "Text after Text1" and "Text after Text2".

I tried the XPath /html/body/table/tr and then reading the text relative to ./td, but I only get "Text before Text1" and "Text before Text2".
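In lxml's element model, a piece of text belongs either to the .text of the element it starts in or to the .tail of the element it follows. That is why reading the row's text only gives "Text before Text1": the text after the cell is stored as the <td>'s .tail. A minimal sketch, assuming one row from the question is parsed as XML (the variable names ROW, before, inside and after are just for illustration):

```python
from lxml import etree

# One row from the question; it is well-formed XML, so etree.fromstring works.
ROW = """<tr>
       Text before Text1
       <td>Text1</td>
       Text after Text1
   </tr>"""

tr = etree.fromstring(ROW)
td = tr.find('td')

# .text holds only the text *before* the first child element.
before = tr.text.strip()
# The cell's own content.
inside = td.text.strip()
# The text after </td> is attached to the <td> as its .tail.
after = td.tail.strip()

print(before)  # Text before Text1
print(inside)  # Text1
print(after)   # Text after Text1
```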

So how can I do this?
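One way to stay purely in XPath is to select the text nodes that follow each <td> inside its row, using the following-sibling axis. A sketch, assuming the question's HTML is held in a string (HTML is a hypothetical name) and parsed with lxml's XML parser, which works here because this markup happens to be well-formed:

```python
from lxml import etree

HTML = """<html>
<body>
<table>
   <tr>
       Text before Text1
       <td>Text1</td>
       Text after Text1
   </tr>
   <tr>
       Text before Text2
       <td>Text2</td>
       Text after Text2
   </tr>
</table>
</body>
</html>"""

root = etree.fromstring(HTML)

# Select the text nodes that come after each <td> within its <tr>.
nodes = root.xpath('/html/body/table/tr/td/following-sibling::text()')

# The raw nodes include the surrounding newlines and indentation, so strip them.
texts = [t.strip() for t in nodes]
print(texts)  # ['Text after Text1', 'Text after Text2']
```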

Another example:

<tr>
  <td width="16"><img alt="" src="http://source.qunar.com/site/images/airlines/small/HU.gif"></td>
  <td valign="top">海航<span class="dc">HU7605</span><br>首都T1-虹桥</td>
</tr>

I can get 海航 (Hainan Airlines) but not 首都T1-虹桥 (Capital T1 to Hongqiao).
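In this row, 首都T1-虹桥 is the text that follows the <br>, so it is a direct text child of the cell rather than the content of any element. Because <img> and <br> are not closed, the HTML parser (lxml.html) is the safer choice than the XML one. A sketch, assuming the row is wrapped in a hypothetical minimal page (PAGE) with a <table> around it and the closing tag written as </td>:

```python
import lxml.html

# The row from the question, wrapped in a small page so the parser
# sees a normal table structure.
PAGE = """<html><body><table>
<tr>
  <td width="16"><img alt="" src="http://source.qunar.com/site/images/airlines/small/HU.gif"></td>
  <td valign="top">海航<span class="dc">HU7605</span><br>首都T1-虹桥</td>
</tr>
</table></body></html>"""

# <img> and <br> are unclosed, so use the HTML parser instead of the XML one.
doc = lxml.html.fromstring(PAGE)

# Direct text children of the cell: the text before <span> and the text after <br>.
cell_texts = doc.xpath('//td[@valign="top"]/text()')
print(cell_texts)  # ['海航', '首都T1-虹桥']

# Or select only the text that follows the <br>.
after_br = doc.xpath('//td[@valign="top"]/br/following-sibling::text()')
print(after_br)  # ['首都T1-虹桥']
```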

2 answers:

Answer 0 (score: 1)

Assuming your file is saved as data.xml:

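from lxml import etree

# data.xml holds the (well-formed) HTML from the question.
data = etree.parse('data.xml')

for row in data.xpath('/html/body/table/tr'):
    # text() returns the row's direct text nodes: the text before the <td>
    # and the text after it, each with surrounding whitespace.
    before, after = row.xpath('text()')
    print(before.strip(), after.strip())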

Answer 1 (score: 0)

You can get the XPath values like this:

"//tr" or "//tr/td"
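These two expressions select different things: //tr/td returns the <td> elements, whose text is only the cell content (Text1, Text2), while appending /text() to //tr returns the loose text nodes on either side of the cell. A small comparison sketch, assuming the question's HTML is in a string (HTML is a hypothetical name) parsed as XML:

```python
from lxml import etree

HTML = """<html>
<body>
<table>
   <tr>
       Text before Text1
       <td>Text1</td>
       Text after Text1
   </tr>
   <tr>
       Text before Text2
       <td>Text2</td>
       Text after Text2
   </tr>
</table>
</body>
</html>"""

root = etree.fromstring(HTML)

# "//tr/td" selects the cell elements; their .text is the cell content only.
cells = [td.text for td in root.xpath('//tr/td')]
print(cells)  # ['Text1', 'Text2']

# "//tr/text()" selects the text nodes directly inside each row,
# i.e. the text before and after the cell (whitespace stripped here).
row_texts = [t.strip() for t in root.xpath('//tr/text()')]
print(row_texts)
# ['Text before Text1', 'Text after Text1', 'Text before Text2', 'Text after Text2']
```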