I've recently been learning web scraping with Python. Here is part of the page source:
<div class='search__grid'>
<div class="photos">
<div class='photos__column'>
<div class='hide-featured-badge hide-favorite-badge'>
<article class='photo-item photo-item--overlay'>
<a class="js-photo-link photo-item__link" href="/photo/person-holding-black-ceramic-pig-coin-bank-3943723/">
<img srcset="https://images.pexels.com/photos/3943723/pexels-photo-3943723.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500 1x, https://images.pexels.com/photos/3943723/pexels-photo-3943723.jpeg?auto=compress&cs=tinysrgb&dpr=2&w=500 2x"
class="photo-item__img" alt="Person Holding Black Ceramic Pig Coin Bank" data-image-width="3811" data-image-height="5716"
data-big-src="https://images.pexels.com/photos/3943723/pexels-photo-3943723.jpeg?auto=compress&cs=tinysrgb&h=750&w=1260" />
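I want to collect the image links stored in the img element's srcset or data-big-src attribute. However, I can't locate the div element with any of the following: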
find_element_by_class_name('search__grid')
nor with by_tag_name(div.search_grid)
nor with by_css_selector('divsearch_grid').
For example, when I use by_class_name I get the following error:
no such element: Unable to locate element: {"method":"css selector","selector":".search__grid"}
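I'm not even using a CSS selector!
My other question: how can I extract just the links from the srcset or data-big-src attribute?
I look forward to your advice. Thanks in advance.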
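Answer 0 (score: 0)
First, get the img element (the find_element_by_* helpers were removed in Selenium 4.3, so current versions use find_element with By):
from selenium.webdriver.common.by import By
imgElement = driver.find_element(By.CLASS_NAME, 'photo-item__img')
or by XPath:
imgElement = driver.find_element(By.XPATH, "//img[contains(@class,'photo-item__img')]")
Second, read the data-big-src attribute:
textElement = imgElement.get_attribute('data-big-src')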
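To collect the link from every matching image rather than only the first one, the same get_attribute call can be applied to each element in a list. The following is a minimal sketch, assuming the elements come from something like driver.find_elements(...). The names collect_attribute and FakeElement are hypothetical and exist only for illustration; FakeElement merely stands in for a Selenium WebElement so the snippet runs without a browser.

```python
def collect_attribute(elements, name="data-big-src"):
    """Return the non-empty values of attribute `name` from each element."""
    links = []
    for element in elements:
        value = element.get_attribute(name)  # None when the attribute is absent
        if value:
            links.append(value)
    return links


class FakeElement:
    """Hypothetical stand-in for a WebElement (illustration only)."""

    def __init__(self, attrs):
        self._attrs = attrs

    def get_attribute(self, name):
        return self._attrs.get(name)


BIG_SRC = (
    "https://images.pexels.com/photos/3943723/pexels-photo-3943723.jpeg"
    "?auto=compress&cs=tinysrgb&h=750&w=1260"
)
elements = [
    FakeElement({"data-big-src": BIG_SRC}),
    FakeElement({}),  # an element without the attribute is skipped
]
big_links = collect_attribute(elements)
print(big_links)
```

With a real driver, the list passed in would be the result of a find_elements call using one of the locators shown in this answer.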
Answer 1 (score: 0)
Try something like the following to get the srcset value.
imgElement = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//img[@class='photo-item__img']")))
print(imgElement.get_attribute('srcset'))
Imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
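The srcset value comes back as one string holding several candidates, each a URL followed by a descriptor such as 1x or 2x. To get only the links, the string has to be split. Below is a minimal sketch, assuming the candidates are separated by a comma plus whitespace, as in the page source from the question. The function name parse_srcset is hypothetical, and this is a simplification rather than a full implementation of the HTML srcset parsing rules.

```python
import re


def parse_srcset(srcset):
    """Split a srcset string into (url, descriptor) pairs."""
    candidates = []
    # Assumption: candidates are separated by a comma followed by whitespace.
    for part in re.split(r",\s+", srcset.strip()):
        pieces = part.split()
        if not pieces:
            continue
        url = pieces[0]
        descriptor = pieces[1] if len(pieces) > 1 else ""
        candidates.append((url, descriptor))
    return candidates


# The srcset value from the question's page source.
srcset = (
    "https://images.pexels.com/photos/3943723/pexels-photo-3943723.jpeg"
    "?auto=compress&cs=tinysrgb&dpr=1&w=500 1x, "
    "https://images.pexels.com/photos/3943723/pexels-photo-3943723.jpeg"
    "?auto=compress&cs=tinysrgb&dpr=2&w=500 2x"
)
pairs = parse_srcset(srcset)
urls = [url for url, _ in pairs]
for url in urls:
    print(url)
```

In practice the srcset argument would be the string returned by imgElement.get_attribute('srcset').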
Answer 2 (score: 0)
The image links are available in both the srcset and the data-big-src attributes. To print them, you need to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:
Using CSS_SELECTOR and the srcset attribute:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.search__grid > div.photos > div.photos__column a.js-photo-link.photo-item__link[href^='/photo/person-holding-black-ceramic-pig-coin-bank'] > img"))).get_attribute("srcset"))
Using XPATH and the data-big-src attribute:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='search__grid']/div[@class='photos']/div[@class='photos__column']//a[@class='js-photo-link photo-item__link' and starts-with(@href, '/photo/person-holding-black-ceramic-pig-coin-bank')]/img"))).get_attribute("data-big-src"))
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
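The reason an explicit wait can help is that it keeps retrying the lookup for a while instead of failing on the first attempt, which matters if the element is added to the page after the initial load. The sketch below illustrates that polling idea in plain Python. It is not Selenium's implementation: wait_until and find_late_element are hypothetical names, and LookupError is used here only as a stand-in for NoSuchElementException so the snippet runs without a browser.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except LookupError:
            # Stand-in for "element not found yet": ignore and keep polling.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


attempts = []


def find_late_element():
    """Simulated lookup that only succeeds on the third attempt."""
    attempts.append(1)
    if len(attempts) < 3:
        raise LookupError("no such element")
    return "img element"


found = wait_until(find_late_element, timeout=2.0, poll_interval=0.01)
print(found, "after", len(attempts), "attempts")
```

A single find_element call corresponds to the first attempt only, so it would have failed in this simulation.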
You can find some related discussions on NoSuchElementException in: