Question

Does anyone know how to get the date using Scrapy?

['<a href="/realDonaldTrump/status/988856839893897222" class="tweet-timestamp js-permalink js-nav js-tooltip" title="12:06 PM - 24 Apr 2018" data-conversation-id="988856839893897222"><span class="_timestamp js-short-timestamp " data-aria-label-part="last" data-time="1524596817" data-time-ms="1524596817000" data-long-form="true">Apr 24</span></a>']

I got this text using:

 response.xpath('//*[contains(@class,"tweet-timestamp js-permalink js-nav js-tooltip")]').extract()

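I'm after the information that follows "title=". I'm fairly new to this, so if you could explain why your approach works better, I'd appreciate it. Thanks.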

Answer 1

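Try the following XPath to get the date you want to parse. The date is stored in the title attribute. To get the value stored in any attribute, you select it by its name (the key) with the @ prefix. Here the key is title and the value is 12:06 PM - 24 Apr 2018.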

response.xpath("//a[contains(@class,'tweet-timestamp')]/@title").extract_first()

Output:

12:06 PM - 24 Apr 2018
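
If you need the title value as a datetime object rather than a string, the standard library can parse it. This is a minimal sketch, assuming the title always follows the "12:06 PM - 24 Apr 2018" layout and that English month abbreviations apply (the default C locale). `parse_tweet_title` is a hypothetical helper name, not part of Scrapy.

```python
from datetime import datetime

def parse_tweet_title(title):
    # Hypothetical helper: turns e.g. "12:06 PM - 24 Apr 2018" into a naive datetime.
    # %I = 12-hour clock, %p = AM/PM, %b = abbreviated month name.
    return datetime.strptime(title.strip(), "%I:%M %p - %d %b %Y")

parsed = parse_tweet_title("12:06 PM - 24 Apr 2018")
print(parsed.isoformat())  # 2018-04-24T12:06:00
```

The title string carries no time zone information, so the result is a naive datetime.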

Answer 2

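Get the date from the @data-time attribute (a Unix timestamp in seconds; @data-time-ms holds the same value in milliseconds) and parse it.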

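import datetime
# data-time is a Unix timestamp in seconds, so it can go straight to fromtimestamp
d = float(response.xpath("string(//a[contains(@class,'tweet-timestamp')]/span/@data-time)").extract_first())
datetime.datetime.fromtimestamp(d).strftime('%Y-%m-%d %H:%M:%S')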

Output (local time):

'2018-04-24 16:06:57'
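
Without a `tz` argument, `fromtimestamp` converts to the time zone of the machine running the code, so the hour may differ on your system. Here is a small sketch that pins the result to UTC instead, using the `data-time` and `data-time-ms` values from the question's HTML. The variable names are illustrative only.

```python
from datetime import datetime, timezone

data_time = "1524596817"        # seconds since the Unix epoch (span/@data-time)
data_time_ms = "1524596817000"  # same instant in milliseconds (span/@data-time-ms)

# Passing tz makes the result timezone-aware and independent of the local machine.
utc_dt = datetime.fromtimestamp(int(data_time), tz=timezone.utc)
utc_from_ms = datetime.fromtimestamp(int(data_time_ms) / 1000, tz=timezone.utc)

print(utc_dt.strftime('%Y-%m-%d %H:%M:%S'))  # 2018-04-24 19:06:57
```

Both attributes describe the same instant; the millisecond value only needs dividing by 1000 first.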

Pulling information with XPath
