Scrapy - missing HTML body

Time: 2019-04-10 12:45:41

Tags: html scrapy

Some websites do not deliver their full HTML when fetched with Scrapy, e.g. https://www.amazon.de/Warner-Bros-5051891109537-GIOCO-MOVIE/dp/B00HR6RHBK

I am trying to get the link to the list of all offers (https://www.amazon.de/gp/offer-listing/B00HR6RHBK/ref=dp_olp_new?ie=UTF8&condition=new).

Scrapy can't find it.

What I tried:

  

$ scrapy shell "https://www.amazon.de/Warner-Bros-5051891109537-GIOCO-MOVIE/dp/B00HR6RHBK"

     

print(response.xpath("//a[contains(@href, 'new')]/@href"))

Result:

  

[]

1 answer:

Answer 0: (score: 1)

The link does not exist in the page source, so Scrapy cannot find it. Try looking for /gp/offer-listing/B00HR6RHBK/ref=dp_olp_all_mbc?ie=UTF8&condition=all instead and see whether you can find that link.
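
One way to check this kind of thing offline is to search the raw page source for anchors whose href contains a given substring, which is what the XPath contains(@href, ...) test does. The sketch below uses only Python's standard library rather than Scrapy. SAMPLE_HTML is a made-up stand-in for the downloaded source, not the real Amazon markup, and HrefCollector and find_links are hypothetical helper names.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href values of <a> tags that contain a given substring."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # Only anchors are of interest, mirroring the //a[...] XPath step.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and self.needle in value:
                self.matches.append(value)


def find_links(raw_html, needle):
    """Return every <a href> in raw_html whose value contains needle."""
    parser = HrefCollector(needle)
    parser.feed(raw_html)
    parser.close()
    return parser.matches


# Hypothetical stand-in for the page source as downloaded (assumption:
# the real page differs; this only illustrates the check).
SAMPLE_HTML = """
<html><body>
  <a href="/gp/offer-listing/B00HR6RHBK/ref=dp_olp_all_mbc?ie=UTF8&amp;condition=all">All offers</a>
  <a href="/gp/help/customer/display.html">Help</a>
</body></html>
"""

if __name__ == "__main__":
    # Searching for 'new' finds nothing in this sample source...
    print(find_links(SAMPLE_HTML, "new"))
    # ...while searching for the offer-listing path finds the link.
    print(find_links(SAMPLE_HTML, "offer-listing"))
```

In the Scrapy shell, the corresponding query would be response.xpath("//a[contains(@href, 'offer-listing')]/@href").getall(). Whether it returns anything depends on what the server actually sends back.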