Unable to scrape some image links from a webpage using the requests module

Asked: 2021-01-21 15:10:15

Tags: python python-3.x web-scraping python-requests

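I've created a script using the requests and BeautifulSoup libraries to parse the links of some images from a webpage. When you inspect the page and use the selector [class^='cylindo-viewer-frame'] > img[src*='/frames/'] in the dev tools search bar (Ctrl + F), the image links are visible. This is how they look in the DOM.

I know I can get these image links using selenium, but I'd like to stick with the requests module. I've noticed many times that there is often a way to fetch such dynamic content using requests. I've tried looking for the links in the script tags and in the dev tools, but I couldn't find them.

Two of the 32 expected links are: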

https://content.cylindo.com/api/v2/4616/products/657285/frames/5/657285.JPG?background=FFFFFF&feature=FABRIC:Q1031&size=1268
https://content.cylindo.com/api/v2/4616/products/657285/frames/7/657285.JPG?background=FFFFFF&feature=FABRIC:Q1031&size=1268

This is what I've tried:

import requests
from bs4 import BeautifulSoup

link = 'https://www.ethanallen.com/on/demandware.store/Sites-ethanallen-us-Site/en_US/Product-Variation?pid=emersonQS&dwvar_emersonQS_Fabric=Q1031&dwvar_emersonQS_seatingSize=90sofa&step=2'

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'
    r = s.get(link)
    soup = BeautifulSoup(r.text,"lxml")
    for item in soup.select(".cylindo-viewer-container li[class^='cylindo-viewer-frame'] > img[src*='/frames/']"):
        print(item.get("src"))

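How can I get these image links using requests?

1 answer:

Answer 0: (score: 0)

Why not use selenium?

The website serves this content dynamically, which requests does not handle: requests only downloads the initial HTML and never runs the page's JavaScript, so the elements you are trying to match are not in the response.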
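One way to confirm this for yourself is to check whether the HTML you got back contains any matching img tags at all. The following is only a minimal sketch using the standard library: find_frame_images is a helper name made up for illustration, and the two HTML strings are hypothetical stand-ins for a static response and a browser-rendered DOM, not the real page source.

```python
from html.parser import HTMLParser


class FrameImageCollector(HTMLParser):
    """Collects the src of every <img> whose src contains '/frames/'."""

    def __init__(self):
        super().__init__()
        self.frame_srcs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        if "/frames/" in src:
            self.frame_srcs.append(src)


def find_frame_images(html):
    """Return the '/frames/' image URLs present in an HTML string."""
    parser = FrameImageCollector()
    parser.feed(html)
    parser.close()
    return parser.frame_srcs


# Hypothetical markup: what a static response might look like (viewer not yet built)...
static_html = '<div class="cylindo-viewer-container"></div>'
# ...versus what the browser might show after the viewer script has run.
rendered_html = (
    '<div class="cylindo-viewer-container"><ul>'
    '<li class="cylindo-viewer-frame-1">'
    '<img src="https://content.cylindo.com/api/v2/4616/products/657285/frames/5/657285.JPG">'
    '</li></ul></div>'
)

print(len(find_frame_images(static_html)))
print(len(find_frame_images(rendered_html)))
```

Passing r.text from the script in the question to find_frame_images would show whether the response from requests has any of these links; an empty list means the selector has nothing to match, whatever parser you use.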

Take a look, it's not that hard ;)

Example

from selenium import webdriver
from bs4 import BeautifulSoup
from time import sleep

# Raw string so the backslashes in the Windows path are not treated as escape sequences
driver = webdriver.Chrome(executable_path=r'C:\Program Files\ChromeDriver\chromedriver.exe')
url = "https://www.ethanallen.com/on/demandware.store/Sites-ethanallen-us-Site/en_US/Product-Variation?pid=emersonQS&dwvar_emersonQS_Fabric=Q1031&dwvar_emersonQS_seatingSize=90sofa&step=2"

driver.get(url)
sleep(2)

soup = BeautifulSoup(driver.page_source, 'lxml')

for item in soup.select(".cylindo-viewer-container li[class^='cylindo-viewer-frame'] > img[src*='/frames/']"):
    print(item.get("src"))

driver.close()

Output

https://content.cylindo.com/api/v2/4616/products/657285/frames/3/657285.JPG?background=FFFFFF&feature=FABRIC:Q1031&size=1268
https://content.cylindo.com/api/v2/4616/products/657285/frames/27/657285.JPG?background=FFFFFF&feature=FABRIC:Q1031&size=1268
https://content.cylindo.com/api/v2/4616/products/657285/frames/29/657285.JPG?background=FFFFFF&feature=FABRIC:Q1031&size=1268
https://content.cylindo.com/api/v2/4616/products/657285/frames/11/657285.JPG?background=FFFFFF&feature=FABRIC:Q1031&size=1268
https://content.cylindo.com/api/v2/4616/products/657285/frames/31/657285.JPG?background=FFFFFF&feature=FABRIC:Q1031&size=1268
https://content.cylindo.com/api/v2/4616/products/657285/frames/5/657285.JPG?background=FFFFFF&feature=FABRIC:Q1031&size=1268
...
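
The links shown here differ only in the frame number, so if you really want to avoid a browser, you could try building them yourself. This is just a sketch under that assumption: frame_url is a hypothetical helper, the account ID (4616), product ID (657285), and query values are copied from the links in this thread, and I have not checked which frame numbers the server actually accepts.

```python
from urllib.parse import urlencode

# URL template copied from the links seen in this thread; only the frame number varies.
BASE = "https://content.cylindo.com/api/v2/4616/products/657285/frames/{frame}/657285.JPG"


def frame_url(frame, background="FFFFFF", feature="FABRIC:Q1031", size=1268):
    """Build one frame image URL (assumes the frame number is the only moving part)."""
    # safe=":" keeps the colon in FABRIC:Q1031 unescaped, as in the observed links.
    query = urlencode(
        {"background": background, "feature": feature, "size": size},
        safe=":",
    )
    return BASE.format(frame=frame) + "?" + query


# Frame numbers that appear in this thread.
for frame in (3, 5, 7, 11, 27, 29, 31):
    print(frame_url(frame))
```

You would still need to find out where the page gets the product ID and the fabric code from, and verify each URL with a request before relying on it.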