Question

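Hi, I'm just trying to scrape the "title" and the "publication date" from this Bloomberg article. I'm fairly sure my response.xpath expressions are correct, but I never get anything back.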

response.xpath("//h1[@class = 'lede-text-v2__hed']").extract_first()
response.xpath("//meta[@property = 'og:title']/@content").extract_first()

Neither of them returns the title.

The same goes for the publication date:

response.xpath("//time[@class = 'article-timestamp']/@datetime").extract_first()

This returns nothing as well. Any ideas?

Here is the URL:

https://www.bloomberg.com/news/articles/2019-05-30/tesla-dealt-another-blow-as-barclays-sees-it-as-niche-carmaker

Thanks!

Answer 1

You are being detected as a bot.

Run scrapy shell <url> and then view(response) to see the response you actually received.
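Besides looking at the page with view(response), you can check programmatically whether the elements your XPath expressions target are present at all. The sketch below is a stdlib-only approximation of the og:title and article-timestamp queries (it uses html.parser, not Scrapy's selectors); the helper names extract_fields and looks_blocked are hypothetical. If both fields come back empty, the page you received is most likely not the article.

```python
from html.parser import HTMLParser


class ArticleFieldParser(HTMLParser):
    """Collects the first og:title meta content and the first
    <time class="article-timestamp" datetime="..."> value (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.og_title = None
        self.published = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Mirrors //meta[@property = 'og:title']/@content
        if tag == "meta" and attrs.get("property") == "og:title" and self.og_title is None:
            self.og_title = attrs.get("content")
        # Mirrors //time[@class = 'article-timestamp']/@datetime (exact class match)
        elif (
            tag == "time"
            and attrs.get("class") == "article-timestamp"
            and self.published is None
        ):
            self.published = attrs.get("datetime")


def extract_fields(html):
    """Return (title, published); either may be None if not found."""
    parser = ArticleFieldParser()
    parser.feed(html)
    parser.close()
    return parser.og_title, parser.published


def looks_blocked(html):
    """Heuristic: a real article page should contain at least one of the fields."""
    title, published = extract_fields(html)
    return title is None and published is None
```

In scrapy shell, after pasting these helpers, looks_blocked(response.text) returning True points to the same diagnosis: the server sent you something other than the article.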

Measures to avoid detection include:

In the last two cases, be prepared to use multiple proxies, in case the site bans your IP address because of the unusually high activity.
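One way to spread requests over several proxies in Scrapy is a small downloader middleware that sets request.meta["proxy"], which Scrapy's built-in HttpProxyMiddleware then honors. This is a minimal sketch, not a guaranteed way around the block: the class name RotatingProxyMiddleware, the ROTATING_PROXIES setting and the proxy URLs are hypothetical placeholders, and it simply assigns proxies round-robin.

```python
import itertools


class RotatingProxyMiddleware:
    """Hypothetical downloader middleware assigning proxies round-robin.

    Scrapy's HttpProxyMiddleware routes a request through the URL stored
    in request.meta["proxy"], so this middleware only has to fill that key.
    """

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._proxies = itertools.cycle(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # ROTATING_PROXIES is a hypothetical custom setting (a list of proxy URLs).
        return cls(crawler.settings.getlist("ROTATING_PROXIES"))

    def process_request(self, request, spider):
        # Keep any proxy that was set explicitly on the request.
        if "proxy" not in request.meta:
            request.meta["proxy"] = next(self._proxies)
        return None  # let Scrapy continue processing the request


# Example settings.py entries (placeholders):
# DOWNLOADER_MIDDLEWARES = {
#     # Lower than 750 so it runs before the built-in HttpProxyMiddleware.
#     "myproject.middlewares.RotatingProxyMiddleware": 350,
# }
# ROTATING_PROXIES = [
#     "http://proxy1.example:8000",
#     "http://proxy2.example:8000",
# ]
```

Round-robin is the simplest policy; if one proxy gets banned, it keeps being used every N requests, so in practice you may want to drop proxies that start returning block pages.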