XPath from a div class

Time: 2018-11-16 19:08:31

Tags: python-3.x xpath web-scraping scrapy

I'm having trouble extracting some image URLs from Amazon with XPath.

This is the page I'm trying to extract the URL from:

https://www.amazon.com/Touchscreen-Laptop-Tablet-Windows-Quad-Core/dp/B07FYX613Z/ref=sr_1_23/147-3050782-9544926?s=pc&ie=UTF8&qid=1542390985&sr=1-23&keywords=gaming+laptop&refinements=p_36%3A-100000

I have this:

<div id="ivLargeImage" style="height: 573px; display: block; opacity: 1; visibility: visible; cursor: zoom-in;">
    <img src="https://images-na.ssl-images-amazon.com/images/I/81zqMok22fL._SL1500_.jpg" class="fullscreen" style="margin-top: 10px; margin-left: 252px; height: 553px; width: 573px;">
</div>

My goal is to extract https://images-na.ssl-images-amazon.com/images/I/81zqMok22fL._SL1500_.jpg

I'm currently using this XPath:

//div[contains(@id, "ivLargeImage")]/img/@src

When I check it with XPath Helper, it does return https://images-na.ssl-images-amazon.com/images/I/81zqMok22fL._SL1500_.jpg

The problem is that when I extract it in my spider with the following line:

item['img0Product']= response.xpath('//div[contains(@id, "ivLargeImage")]/img/@src').extract()

the variable ends up with no data.

Edit: added the Amazon link

2 Answers:

Answer 0: (score: 2)

I can get the required image with the XPath below:

//div[@id="imgTagWrapperId"]/img/@data-old-hires

Try it and let me know if it doesn't work as expected.

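Answer 1: (score: 0)

Maybe try extract_first() instead of extract().

extract() returns a list of extracted strings rather than a single value; extract_first() returns the first match, or None when nothing matched.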