Consider this page:
https://www.michaelkors.com/anorak-rainbow-swimsuit-belt-bag-the-michael-tote-dylan-sneaker/_/L-MSTR101163
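A few days ago I asked a question on Stack Overflow and was advised to look into scrapy-splash in order to scrape the recommendations. With Splash I can scrape most of the JavaScript-rendered content, but I am still stuck on scraping the recommendations at the bottom of the page. This is what I have tried so far: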
recommendations = response.xpath("//div[@class ='you-may-also-like-section']/a/@href").getall()
This returns nothing.
Answer 0 (score: 0)
Have you tried this selector?
response.css('div.you-may-also-like-section div.product-tile-container a::attr(href)').extract()
You could also try setting a wait time in Splash so that the recommendations have time to load before the page is rendered.
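As a rough illustration of the wait-time idea, the sketch below builds a URL for Splash's `render.html` HTTP endpoint with a `wait` parameter, using only the standard library. The helper name `splash_render_url`, the 2-second default, and the `http://localhost:8050` address are assumptions made for this example; adjust them to your own Splash instance.

```python
from urllib.parse import urlencode


def splash_render_url(page_url, wait=2.0, splash_base="http://localhost:8050"):
    """Build a Splash render.html URL that waits `wait` seconds after page load.

    `splash_base` is an assumed local Splash instance, not something taken
    from the question.
    """
    # urlencode percent-encodes the target URL so it is safe as a query value.
    query = urlencode({"url": page_url, "wait": wait})
    return f"{splash_base}/render.html?{query}"


if __name__ == "__main__":
    target = (
        "https://www.michaelkors.com/anorak-rainbow-swimsuit-belt-bag-"
        "the-michael-tote-dylan-sneaker/_/L-MSTR101163"
    )
    print(splash_render_url(target, wait=3))
```

With scrapy-splash, the same value is normally passed as `args={'wait': ...}` on a `SplashRequest`. A longer wait only helps if the recommendations are actually injected into the DOM in that time.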
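However, if you open your browser's developer tools and look at Network -> XHR, you will find this request: https://api.rfksrv.com/search-rec/263221008/3. All you need to do is send the same request yourself, filling in the values (SKUs, page URI, token) taken from the source page. This is the approach I would recommend.
In curl it looks like this: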
curl 'https://api.rfksrv.com/search-rec/263221008/3' -H 'Accept: application/json, text/plain, */*' -H 'Referer: https://www.michaelkors.com/anorak-rainbow-swimsuit-belt-bag-the-michael-tote-dylan-sneaker/_/L-MSTR101163' -H 'Origin: https://www.michaelkors.com' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36' -H 'Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJyZWdpb24iOiJ1cy1lYXN0LTEiLCJzdWIiOiJhcGlLZXkvN255c1NhcnEiLCJzY29wZSI6eyIyNjMyMjEwMDgiOlsidzZndDQ0OHh1ZyJdfSwic3RhZ2UiOiJwcm9kIiwianRpIjoiNGI1M2MyOTItZDA4Ny00OGExLTkzYTctN2M5MTUzYjM2YWVmIiwiaWF0IjoxNTYzOTM2Nzk5LCJleHAiOjE1NjQwMjM3OTl9.UDqzF9cZHJ7KkCnrChvAV6vupP-gs6Bplv462rGII98' -H 'Content-Type: application/x-www-form-urlencoded' --data '{"data":{"batch":[{"widget":{"rfkid":"pdp1"}},{"widget":{"rfkid":"pdp2"}},{"widget":{"rfkid":"pdp_edt"}}],"context":{"page":{"uri":"/anorak-rainbow-swimsuit-belt-bag-the-michael-tote-dylan-sneaker/_/L-MSTR101163","sku":["126295789","314419197","287779605","287780826","321049671","512500966"],"locale_country":"us","locale_language":"en"},"user":{"uuid":"263221008-ox-ap-4u-1p-vws74v0y7idt0l5q27j4-1563955671571"}},"n_item":12,"content":{},"appearance":{}}}' --compressed
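For use from Python, the curl command above could be mirrored roughly as follows. This is a minimal sketch using only the standard library. The function name `build_recommendation_request` and its parameters are hypothetical. It also assumes that the bearer token, SKU list, and user UUID have already been extracted from the source page, which is not shown here, and that the API still accepts the same payload shape as in the curl command. The token in the curl command carries an expiry, so a fresh one would have to be obtained each time.

```python
import json
import urllib.parse
import urllib.request

# Endpoint observed in the browser's Network -> XHR tab.
API_URL = "https://api.rfksrv.com/search-rec/263221008/3"


def build_recommendation_request(page_url, skus, bearer_token, user_uuid):
    """Build (but do not send) a POST request mirroring the curl command.

    All four arguments are assumed to come from the product page itself.
    """
    # The API expects the path of the product page, not the full URL.
    page_uri = urllib.parse.urlsplit(page_url).path
    payload = {
        "data": {
            "batch": [
                {"widget": {"rfkid": "pdp1"}},
                {"widget": {"rfkid": "pdp2"}},
                {"widget": {"rfkid": "pdp_edt"}},
            ],
            "context": {
                "page": {
                    "uri": page_uri,
                    "sku": list(skus),
                    "locale_country": "us",
                    "locale_language": "en",
                },
                "user": {"uuid": user_uuid},
            },
            "n_item": 12,
            "content": {},
            "appearance": {},
        }
    }
    headers = {
        "Accept": "application/json, text/plain, */*",
        "Referer": page_url,
        "Origin": "https://www.michaelkors.com",
        "Authorization": f"Bearer {bearer_token}",
        # Same content type as the curl command, even though the body is JSON.
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    # Placeholder values; replace them with data scraped from the page.
    req = build_recommendation_request(
        "https://www.michaelkors.com/anorak-rainbow-swimsuit-belt-bag-"
        "the-michael-tote-dylan-sneaker/_/L-MSTR101163",
        ["126295789", "314419197"],
        "YOUR_TOKEN",
        "YOUR_USER_UUID",
    )
    print(req.get_method(), req.full_url)
    # To send it: json.load(urllib.request.urlopen(req))
```

Inside a Scrapy spider, the same URL, headers, and body could be passed to a `scrapy.Request` with `method="POST"` so that the response goes through the usual callback.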