I want to scrape images from a dynamic website, but I don't know how

Time: 2020-06-29 16:13:55

Tags: python selenium web-scraping

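I'm looking for advice on how to approach this problem. Here's the deal. I work for Givenchy, and I want to scrape all the images from https://www.givenchy.com/us/en-US/women/bags/all-bags/?start=0&sz=21 so I can compile them into a photo share. The images I want are the ones shown initially, i.e. the ones displayed on the site before you hover the mouse over them. The distinction matters because when you hover over an image, it changes to a picture of a model wearing the bag. I only want the images of the bags themselves. When I inspect the page with Chrome's inspect tool, I can only see links to the images with the model.

Is there a way to do what I want, and if so, how?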

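3 answers:

Answer 0 (score: 1)

You don't need selenium. The images are inside <picture> <source ...> tags, so with the right CSS selector and some string manipulation you can get the image URLs.

For example: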

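import requests
from bs4 import BeautifulSoup


url = 'https://www.givenchy.com/us/en-US/women/bags/all-bags/?start=0&sz=21'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Select the <source> of each product thumbnail whose srcset points at a product image
for source in soup.select('picture.thumb-img source[media="(min-width: 1800px)"][srcset*="/images/"]'):
    # srcset is a comma-separated list of "url descriptor" entries; take the URL of the last entry
    img_url = source['srcset'].split(',')[-1].split()[0]
    print(img_url)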

Prints:

https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwe86ac579/images/BB500CB0WY001/BB500CB0WY001-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw8c8efbee/images/BB50F2B0WY001/BB50F2B0WY001-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw72d49df0/images/BB50F2B0WD001/BB50F2B0WD001-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw16bf6873/images/BB50F0B0WD001/BB50F0B0WD001-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwa89db782/images/BB50F0B0WD309/BB50F0B0WD309-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwb8bb418a/images/BB50F0B0WD051/BB50F0B0WD051-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dweacfc390/images/BB50F2B0WD292/BB50F2B0WD292-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw51675237/images/BB50F2B0WD051/BB50F2B0WD051-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw47ef9b42/images/BB50F3B0WD001/BB50F3B0WD001-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw32b9df63/images/BB50F3B0WD051/BB50F3B0WD051-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw102294c8/images/BB50F3B0WD496/BB50F3B0WD496-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw09d01050/images/BB50F3B0WD662/BB50F3B0WD662-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw442b46a4/images/BB50F2B0WD542/BB50F2B0WD542-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw1e454ef3/images/BB50F2B0WD309/BB50F2B0WD309-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw3aa399b9/images/BB05117012542/BB05117012542-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw9eb8ec2d/images/BB05114012542/BB05114012542-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw7e12db48/images/BBU017B00B001/BBU017B00B001-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw924ff9f6/images/BBU017B00B058/BBU017B00B058-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw1974540d/images/BBU017B00B662/BBU017B00B662-01-01.jpg?sw=800
https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw28c6592d/images/BBU017B00B140/BBU017B00B140-01-01.jpg?sw=800
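The string manipulation in the loop above can be isolated into a small helper, which makes it easier to see what each split does. This is only a sketch: the function name `last_srcset_url` and the sample srcset value (on example.com) are made up for illustration, and the approach assumes the URLs themselves contain no commas.

```python
def last_srcset_url(srcset: str) -> str:
    """Return the URL of the last candidate in a srcset string."""
    # Candidates are separated by commas; each one is "<url> <descriptor>".
    last_candidate = srcset.split(',')[-1]
    # The URL is the first whitespace-separated token of that candidate.
    return last_candidate.split()[0]


# Hypothetical srcset value, shaped like the ones on the page.
sample = (
    "https://example.com/images/bag-01-01.jpg?sw=400 1x, "
    "https://example.com/images/bag-01-01.jpg?sw=800 2x"
)
print(last_srcset_url(sample))
```

Taking the last candidate works here because the page seems to list the widest variant last; that ordering is not guaranteed by srcset in general.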

EDIT: To get higher-quality images, change the ?sw= parameter to a higher resolution.

For example:

import requests
from bs4 import BeautifulSoup

url = 'https://www.givenchy.com/us/en-US/women/bags/all-bags/?start=0&sz=21'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for source in soup.select('picture.thumb-img source[media="(min-width: 1800px)"][srcset*="/images/"]'):
    # Same extraction as before, then request a wider image by swapping the sw parameter
    img_url = source['srcset'].split(',')[-1].split()[0].replace('?sw=800', '?sw=1920')
    print(img_url)
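Replacing the literal text `?sw=800` only works when the URL already carries exactly that value; the Selenium output further down shows the same images with `?sw=466`, where the replace would silently do nothing. A slightly more defensive sketch sets the parameter through `urllib.parse` instead. The helper name `with_width` is hypothetical, and whether the server honours an arbitrary width is an assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_width(url: str, width: int) -> str:
    """Return url with its 'sw' query parameter set to the given width."""
    parts = urlsplit(url)
    # Parse the existing query into a dict, keeping the other parameters as they are.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["sw"] = str(width)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical URL, shaped like the image URLs on the page.
print(with_width("https://example.com/images/bag-01-01.jpg?sw=466", 1920))
```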

EDIT: To get the bag name along with the URL, you can use:

import requests
from bs4 import BeautifulSoup

url = 'https://www.givenchy.com/us/en-US/women/bags/all-bags/?start=0&sz=21'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for source in soup.select('picture.thumb-img source[media="(min-width: 1800px)"][srcset*="/images/"]'):
    pic_url = source['srcset'].split(',')[-1].split()[0].replace('?sw=800', '?sw=1920')
    # The product name is the next element after the image with class "product-name"
    name = source.find_next(class_='product-name').get_text(strip=True)
    print(name, pic_url)
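Since the goal in the question is to collect the images rather than just list their URLs, the printed URLs could be saved to disk along these lines. This is a sketch, not part of the answer: `filename_from_url`, `save_image`, and the `bags` folder are hypothetical names, and it uses the standard library's `urllib.request` rather than `requests`. The site may also expect extra request headers, which this does not handle.

```python
import os
import posixpath
from urllib.parse import urlsplit
from urllib.request import urlopen


def filename_from_url(url: str) -> str:
    """Derive a local file name from the last path segment of the URL."""
    # urlsplit drops the query string (?sw=...) from the path for us.
    return posixpath.basename(urlsplit(url).path)


def save_image(url: str, folder: str = "bags") -> str:
    """Download url into folder and return the path of the saved file."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename_from_url(url))
    with urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())
    return path
```

Inside the loop above, `save_image(pic_url)` would then write each image to the `bags` folder under its original file name.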

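Answer 1 (score: 0)

You were probably inspecting the element after hovering over the image, which is why it showed you the model image. On hover, the link is updated from the original bag image:

Givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwe86ac579/images/BB500CB0WY001/BB500CB0WY001-01-01.jpg?sw=466

to the model image:

givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwd050ac75/images/BB500CB0WY001/BB500CB0WY001-01-02.jpg?sw=466

Compare the two URLs: the segment after default/ (dwe86ac579 vs. dwd050ac75) and the filename suffix (-01-01 vs. -01-02) differ.

Try drilling down to the XPath below without hovering over the bag image:

/html/body/div[1]/main/div[5]/div[2]/div[3]/div/div/ul/li[1]/div/figure/a[1]/picture[1]/source[3]

As Andrej pointed out earlier, you can use BeautifulSoup to achieve this.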
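If the filename suffix is what separates the two kinds of image, a scraped list of URLs can be filtered on it. The sketch below assumes, based only on the two URLs in this answer, that bag-only shots end in `-01-01.jpg` and model shots in `-01-02.jpg`; the name `is_product_shot` is hypothetical, and other products may not follow this pattern.

```python
import re

# Assumption: "-01-01.jpg" marks the bag-only shot, as in the URLs quoted above.
PRODUCT_SHOT = re.compile(r"-01-01\.jpg(?:\?|$)")


def is_product_shot(url: str) -> bool:
    """Return True if the URL looks like a bag-only image."""
    return PRODUCT_SHOT.search(url) is not None


urls = [
    "givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwe86ac579/images/BB500CB0WY001/BB500CB0WY001-01-01.jpg?sw=466",
    "givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwd050ac75/images/BB500CB0WY001/BB500CB0WY001-01-02.jpg?sw=466",
]
bag_only = [u for u in urls if is_product_shot(u)]
print(bag_only)
```

The same filter would also drop images that are not product shots at all, such as a lookbook banner mixed into the grid.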

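Answer 2 (score: 0)

To print the value of the srcset attribute of the images before the mouse hovers over them, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following Locator Strategies:

  • Using CSS_SELECTOR: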

    driver.get('https://www.givenchy.com/us/en-US/women/bags/all-bags/?start=0&sz=21')
    print([my_elem.get_attribute("srcset") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "ul.search-result-items.tiles-container.js-slv-product-grid.row figure.product-image picture.thumb-img img")))])
    
  • Using XPATH:

    driver.get('https://www.givenchy.com/us/en-US/women/bags/all-bags/?start=0&sz=21')
    print([my_elem.get_attribute("srcset") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//ul[@class='search-result-items tiles-container js-slv-product-grid row']//figure[contains(@class, 'product-image ')]//picture[@class='thumb-img']//img")))])
    
  • Console output:

    ['https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwe86ac579/images/BB500CB0WY001/BB500CB0WY001-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw8c8efbee/images/BB50F2B0WY001/BB50F2B0WY001-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw2264f584/LOOKS%20FWxS20/ECOM2.jpg?sw=1000', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw72d49df0/images/BB50F2B0WD001/BB50F2B0WD001-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw16bf6873/images/BB50F0B0WD001/BB50F0B0WD001-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwa89db782/images/BB50F0B0WD309/BB50F0B0WD309-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dwb8bb418a/images/BB50F0B0WD051/BB50F0B0WD051-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dweacfc390/images/BB50F2B0WD292/BB50F2B0WD292-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw51675237/images/BB50F2B0WD051/BB50F2B0WD051-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw47ef9b42/images/BB50F3B0WD001/BB50F3B0WD001-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw32b9df63/images/BB50F3B0WD051/BB50F3B0WD051-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw102294c8/images/BB50F3B0WD496/BB50F3B0WD496-01-01.jpg?sw=466', 
'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw09d01050/images/BB50F3B0WD662/BB50F3B0WD662-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw442b46a4/images/BB50F2B0WD542/BB50F2B0WD542-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw1e454ef3/images/BB50F2B0WD309/BB50F2B0WD309-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw3aa399b9/images/BB05117012542/BB05117012542-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw9eb8ec2d/images/BB05114012542/BB05114012542-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw7e12db48/images/BBU017B00B001/BBU017B00B001-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw924ff9f6/images/BBU017B00B058/BBU017B00B058-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw1974540d/images/BBU017B00B662/BBU017B00B662-01-01.jpg?sw=466', 'https://www.givenchy.com/dw/image/v2/BBRT_PRD/on/demandware.static/-/Sites-Givenchy_master/default/dw28c6592d/images/BBU017B00B140/BBU017B00B140-01-01.jpg?sw=466']
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC