Finding the right CSS selectors to scrape a web page with Scrapy

Posted: 2019-09-05 21:56:55

Tags: python css web-scraping scrapy web-crawler

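I'm trying to scrape this page, "https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas", to extract the product names, but I can't find the right selectors. Nothing matches, not even the price, the h1 or the title! I tried: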

response.css(".shelfProductTile-descriptionLink") # for the product name
response.css(".price-cents") # for the price
response.css(".tileList-title") # for the title

How should I proceed?

1 answer:

Answer 0 (score: 1)

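The content is loaded dynamically by a POST XHR request, and you can see the JSON it returns in the browser's Network tab. That is why your CSS selectors match nothing: the product tiles are not in the HTML that Scrapy downloads, because Scrapy does not execute the page's JavaScript.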

The request goes to:

https://www.woolworths.com.au/apis/ui/browse/category

Payload (written as a Python dict, so `False` is serialized as JSON `false`):

{"categoryId":"1_9573995","pageNumber":1,"pageSize":24,"sortType":"TraderRelevance","url":"/shop/browse/drinks/cordials-juices-iced-teas/iced-teas","location":"/shop/browse/drinks/cordials-juices-iced-teas/iced-teas","formatObject":"{\"name\":\"Iced Teas\"}","isSpecial":False,"isBundle":False,"isMobile":False,"filters":"null"}
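A minimal sketch of how the endpoint and payload above could be turned into a request, assuming the API accepts the same JSON body the browser sends. The helper name `build_category_request` and the `Content-Type` header are assumptions for illustration, not something confirmed by the site; only `pageNumber` and `pageSize` are varied.

```python
import json

# Endpoint and payload fields as captured from the browser's Network tab.
API_URL = "https://www.woolworths.com.au/apis/ui/browse/category"
CATEGORY_PATH = "/shop/browse/drinks/cordials-juices-iced-teas/iced-teas"


def build_category_request(page_number=1, page_size=24):
    """Hypothetical helper: return keyword arguments for scrapy.Request."""
    payload = {
        "categoryId": "1_9573995",
        "pageNumber": page_number,
        "pageSize": page_size,
        "sortType": "TraderRelevance",
        "url": CATEGORY_PATH,
        "location": CATEGORY_PATH,
        # The browser sends this field as a JSON string, not a nested object.
        "formatObject": '{"name":"Iced Teas"}',
        "isSpecial": False,
        "isBundle": False,
        "isMobile": False,
        "filters": "null",
    }
    return {
        "url": API_URL,
        "method": "POST",
        # json.dumps converts Python False into JSON false.
        "body": json.dumps(payload),
        # Assumption: the endpoint expects a JSON content type.
        "headers": {"Content-Type": "application/json"},
    }
```

Inside a spider's `start_requests`, this could be used as `yield scrapy.Request(**build_category_request(), callback=self.parse)`, incrementing `page_number` to page through results. On Scrapy 1.8+, `scrapy.http.JsonRequest(API_URL, data=payload)` is an alternative that serializes the dict and sets the JSON headers for you.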

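In Scrapy, parse the response with:

json.loads(response.text)

(`response.text` replaces the older, now-deprecated `response.body_as_unicode()`; on Scrapy 2.2+ `response.json()` does the same thing.)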