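How do I get the correct date when parsing with lxml?

Asked: 2012-05-17 09:13:32

Tags: python xpath lxml

I ran into a problem while parsing an HTML document. The document contains a span like this: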

<span class="time">Thu May 17, 2012 12:20 pm</span>

When I parse it (the span is inside a td) with:

row.xpath('string(./td/span/text())')

I get the following:

Wed May 16, 2012 11:20 pm

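What could the problem be?

1 answer:

Answer 0 (score: 1)

Most likely ./td/span matches more than one element. When string() is applied to a node-set in XPath, only the first node (in document order) is converted: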

>>> from lxml import etree
>>> html = """<html>
...             <td><span class="time">Wed May 16, 2012 11:20 pm</span></td>
...             <td><span class="time">Thu May 17, 2012 12:20 pm</span></td>
...           </html>"""
>>> t = etree.fromstring(html)
>>> t.xpath('string(./td/span)')
'Wed May 16, 2012 11:20 pm'

You should either write a more specific XPath that selects the span you want, or loop over the matches:

>>> for row in t.xpath("./td/span"):
...     print(row.xpath("string(.)"))
...     
Wed May 16, 2012 11:20 pm
Thu May 17, 2012 12:20 pm
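
For the "more specific XPath" option, here is a minimal sketch. It assumes the same two-cell sample document as above and that the span you want is in the second td (or is the last match); your real markup will probably need a different predicate.

```python
from lxml import etree

# Same sample document as above (an assumption about the real page's shape).
html = """<html>
  <td><span class="time">Wed May 16, 2012 11:20 pm</span></td>
  <td><span class="time">Thu May 17, 2012 12:20 pm</span></td>
</html>"""
t = etree.fromstring(html)

# Pick the span inside the second td explicitly.
second = t.xpath('string(./td[2]/span[@class="time"])')

# Or pick the last matching span; the parentheses make last() apply
# to the whole node-set rather than to each td separately.
last = t.xpath('string((./td/span[@class="time"])[last()])')

# Or collect every match and choose in Python.
all_times = [s.xpath('string(.)') for s in t.xpath('./td/span[@class="time"]')]

print(second)
print(last)
print(all_times)
```

Selecting by position is fragile if the table layout changes, so a predicate on an attribute or on a neighbouring cell is usually safer when one is available.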

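(Note: I removed text(), since it isn't needed here. text() might not do what you think it does: it selects only the text nodes that are direct children of the element, whereas string() on an element returns the concatenated text of all its descendants.)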
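
To illustrate the difference between text() and string(), here is a small sketch. The nested b element is hypothetical markup that I made up for the demonstration; it is not in the question's document.

```python
from lxml import etree

# Hypothetical markup: part of the date is wrapped in a nested <b> element.
html = ('<html><td><span class="time">Thu <b>May 17</b>, 2012 12:20 pm'
        '</span></td></html>')
t = etree.fromstring(html)

# text() returns only the span's direct text nodes, skipping the <b> content.
text_nodes = t.xpath('./td/span/text()')

# string() over that node-set converts just the first text node.
first_text_only = t.xpath('string(./td/span/text())')

# string() on the element itself concatenates all descendant text.
full = t.xpath('string(./td/span)')

print(text_nodes)
print(repr(first_text_only))
print(full)
```

With nested markup like this, string(./td/span/text()) silently drops part of the date, which is why string(.) on the element is usually the safer choice.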