Question

I really need this community's help.

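My problem is that when I run the following code in Python

AIO

in the scrapy shell to extract the vendor names, the output is empty. I really don't know why this happens. It seems to me the cause may be that the site's information is updated dynamically?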

The URL of the page I am scraping is https://cruiseline.com/cruise/7-night-bahamas-florida-new-york-roundtrip-32860, and what I need is the vendor name and price for each vendor. The attached image is a screenshot of the browser's "Inspect" view.

However, similar code does extract the prices on the following page ('https://cruiseline.com/destination/caribbean/cruise/best?sort=rank,ship_status&&direction=desc&page=1&per_page=10&sailing_counts=0'):

response.xpath("//div[contains(@class,'check-prices-widget-not-sponsored')]/a/div[contains(@class,'check-prices-widget-not-sponsored-link')]").extract()

Any help is much appreciated!

Answer 1

I tried this URL in the scrapy shell: https://cruiseline.com/cruise/7-night-bahamas-florida-new-york-roundtrip-32860, and I also got nothing back from

response.xpath("//div[contains(@class,'check-prices-widget-not-sponsored')]/a/div[contains(@class,'check-prices-widget-not-sponsored-link')]").extract()

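Then I used the view(response) command to find out what the spider actually sees, and found that the site is dynamic. That means the information is rendered by JavaScript, so to scrape it you need to execute the JS code that displays it.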

Here is the screenshot:

As you can see, the information you need is not displayed. However, https://cruiseline.com/destination/caribbean/cruise/best?sort=rank,ship_status&&direction=desc&page=1&per_page=10&sailing_counts=0 is static, which is why you can extract what you need from it.
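The same check can be approximated without opening a browser: look at whether the class your XPath targets occurs anywhere in the raw, un-rendered HTML. Below is a minimal sketch using only the standard library. The two HTML strings are hypothetical stand-ins for response.text, not the real markup of either page, and the helper names are made up for illustration. Note that it matches whole class tokens, whereas the XPath contains() call matches substrings.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def class_in_raw_html(html, class_name):
    """Return True if class_name occurs on any tag in the un-rendered HTML."""
    collector = ClassCollector()
    collector.feed(html)
    return class_name in collector.classes


# Hypothetical stand-ins for response.text of a static and a dynamic page.
static_page = '<div class="check-prices-widget-not-sponsored"><a href="#">$499</a></div>'
dynamic_page = '<div id="app"></div><script src="bundle.js"></script>'

target = "check-prices-widget-not-sponsored"
print(class_in_raw_html(static_page, target))
print(class_in_raw_html(dynamic_page, target))
```

If the class is missing from the raw HTML, no XPath will find it until the page's JavaScript has been executed.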

Here are two ways to handle dynamic sites (there are more, of course):

1. Splash (Official Doc): in your Spider, use SplashRequest instead of scrapy.Request to yield your URLs.

2. Selenium + PhantomJS (Official Doc)
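To give a feel for the first option: besides SplashRequest, Splash also exposes an HTTP API whose render.html endpoint returns the page's HTML after its JavaScript has run. The sketch below only shows that idea with the standard library. It assumes a Splash instance is listening on http://localhost:8050 (the default port, but an assumption about your setup), the 2-second wait is a guess that may need tuning, and the function names are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def splash_render_url(page_url, splash_base="http://localhost:8050", wait=2.0):
    """Build the Splash render.html URL that renders page_url.

    splash_base and wait are assumptions; adjust them to your own setup.
    """
    query = urlencode({"url": page_url, "wait": wait})
    return f"{splash_base}/render.html?{query}"


def fetch_rendered_html(page_url, **kwargs):
    """Ask Splash to render page_url and return the resulting HTML as text.

    Requires a running Splash instance; nothing is fetched until this is called.
    """
    with urlopen(splash_render_url(page_url, **kwargs), timeout=60) as resp:
        return resp.read().decode("utf-8", errors="replace")


print(splash_render_url(
    "https://cruiseline.com/cruise/7-night-bahamas-florida-new-york-roundtrip-32860"
))
```

Inside a spider, SplashRequest plays the same role, and the rendered response can then be queried with the usual response.xpath(...) calls.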

Good luck! :)

Empty list from Scrapy when extracting values using XPath
