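I need to extract the breadcrumbs from the following website: https://www.woolworths.com.au/Shop/Browse/drinks/cordials-juices-iced-teas/iced-teas
I tried inspecting the element and copying its XPath, but nothing gets extracted: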
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Firefox()
driver.get('https://www.woolworths.com.au/Shop/Browse/drinks/cordials-juices-iced-teas/iced-teas')
# XPath and CSS selector copied from the browser's developer tools
driver.find_elements(By.XPATH, '//*[@id="center-panel"]/div/wow-tile-list-with-content/ng-transclude/wow-browse-tile-list/wow-tile-list/div/div[1]/div[1]/wow-breadcrumbs/div/ul/li[4]/span/span')
driver.find_element(By.CSS_SELECTOR, '#center-panel > div > wow-tile-list-with-content > ng-transclude > wow-browse-tile-list > wow-tile-list > div > div.tileList > div.tileList-headerContainer > wow-breadcrumbs > div > ul > li:nth-child(4) > span > span')
How should I proceed?
Answer 0 (score: 1)
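The page you are scraping is written in Angular, which means most of its DOM elements are loaded dynamically by JavaScript (AJAX) code and do not exist yet at the moment the page load completes (that is, when the driver.get function returns).
You should use an explicit wait (WebDriverWait(...).until(...)) to locate such elements.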
Here is a working example using the XPath you provided:
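from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
driver = webdriver.Firefox()
driver.get('https://www.woolworths.com.au/Shop/Browse/drinks/cordials-juices-iced-teas/iced-teas')
try:
    # Poll for up to 10 seconds until the breadcrumb element is present in the DOM
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, '//*[@id="center-panel"]/div/wow-tile-list-with-content/ng-transclude/wow-browse-tile-list/wow-tile-list/div/div[1]/div[1]/wow-breadcrumbs/div/ul/li[4]/span/span'))
    )
    print(element.text)  # this outputs Iced Teas
except TimeoutException:
    print("Timeout")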
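WebDriverWait(driver, timeout).until(condition) works by calling the condition repeatedly until it returns a truthy value or the timeout expires (Selenium's default poll interval is 0.5 seconds). The stdlib-only sketch below imitates that polling loop so the idea can be run without a browser. wait_until, WaitTimeout and the fake breadcrumb_present condition are hypothetical names for illustration; this is not Selenium's own implementation.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    # Hypothetical helper: call `condition` until it returns a truthy value,
    # mirroring the idea of WebDriverWait(driver, timeout).until(condition).
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Fake "page": the breadcrumb only appears on the third check,
# imitating an element that Angular renders some time after the page loads.
calls = {"n": 0}


def breadcrumb_present():
    calls["n"] += 1
    return "Iced Teas" if calls["n"] >= 3 else None


print(wait_until(breadcrumb_present, timeout=2.0, poll_interval=0.01))  # Iced Teas
```

A plain find_element call corresponds to a single call of the condition, which is why the lookups in the question fail when the element has not been rendered yet.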
Answer 1 (score: 1)
To print the breadcrumbs of the website https://www.woolworths.com.au/Shop/Browse/drinks/cordials-juices-iced-teas/iced-teas, you have to induce WebDriverWait for the desired visibility_of_element_located(), and you can use either of the following Locator Strategies:
Using CSS_SELECTOR and the get_attribute() method:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "ul.breadcrumbs-linkList li:nth-child(4) span span"))).get_attribute("innerHTML"))
Using XPATH and the text attribute:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//ul[@class='breadcrumbs-linkList']/li[4]//span//span"))).text)
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
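Both locators above select the fourth breadcrumb by position: li:nth-child(4) in CSS and li[4] in XPath are both 1-based, so the same item sits at index 3 of a Python list. The stdlib sketch below illustrates this offline by collecting every breadcrumb label from hypothetical markup modelled on the ul.breadcrumbs-linkList > li > span > span structure these selectors assume; the live page's markup and labels may differ, and BreadcrumbParser is an illustrative name. With Selenium, the analogous approach would be to wait for the list and then read the text of each element matched by "ul.breadcrumbs-linkList li".

```python
from html.parser import HTMLParser

# Hypothetical markup modelled on the selectors above; the real page may differ.
SAMPLE = """
<ul class="breadcrumbs-linkList">
  <li><span><span>Home</span></span></li>
  <li><span><span>Drinks</span></span></li>
  <li><span><span>Cordials, Juices &amp; Iced Teas</span></span></li>
  <li><span><span>Iced Teas</span></span></li>
</ul>
"""


class BreadcrumbParser(HTMLParser):
    """Collects the text of each <li> inside <ul class="breadcrumbs-linkList">."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.in_list = False
        self.in_item = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "ul" and "breadcrumbs-linkList" in classes:
            self.in_list = True
        elif tag == "li" and self.in_list:
            self.in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False
        elif tag == "ul":
            self.in_list = False

    def handle_data(self, data):
        # Only text inside a breadcrumb <li> is kept
        if self.in_item:
            self.items[-1] += data.strip()


parser = BreadcrumbParser()
parser.feed(SAMPLE)
print(parser.items)     # ['Home', 'Drinks', 'Cordials, Juices & Iced Teas', 'Iced Teas']
print(parser.items[3])  # Iced Teas: the 4th breadcrumb, i.e. li:nth-child(4) / li[4]
```

Reading the whole list avoids hard-coding a position that changes with category depth.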
As per the documentation:
get_attribute() method: Gets the given attribute or property of the element.
text attribute: Returns the text of the element.
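For the breadcrumb span, which contains only plain text, get_attribute("innerHTML") and .text should give the same string. They diverge when the element contains child tags or characters such as &: innerHTML returns the serialized markup (with & written as &amp;), while .text returns the text as the browser renders it. The rough sketch below, using a hypothetical approximate_text helper, shows the kind of post-processing an innerHTML string would need; it only strips tags, decodes entities and collapses whitespace, and unlike .text it does not skip hidden elements.

```python
import html
import re


def approximate_text(inner_html):
    # Rough, hypothetical stand-in for WebElement.text: drop tags, decode
    # entities, collapse whitespace. It does NOT account for CSS visibility.
    without_tags = re.sub(r"<[^>]+>", "", inner_html)
    return " ".join(html.unescape(without_tags).split())


inner = 'Cordials, Juices &amp; <b>Iced</b>\n  Teas'
print(approximate_text(inner))  # Cordials, Juices & Iced Teas
```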
Answer 2 (score: 0)
The following XPath worked for me for verification:
//*[span='first text' and span='Search results for "second text"']
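The XPath above matches any element that has a span child whose full text equals 'first text' and a span child whose full text equals 'Search results for "second text"'. To try the predicate logic offline, Python's xml.etree.ElementTree supports the [tag='text'] predicate (it compares the child's complete text content) but not and, so in this sketch the two conditions are written as chained predicates. The markup and ids are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: only the first <div> has both matching <span> children.
doc = ET.fromstring(
    "<root>"
    "<div id='match'><span>first text</span><span>Search results for \"second text\"</span></div>"
    "<div id='partial'><span>first text</span><span>other</span></div>"
    "</root>"
)

# ElementTree's limited XPath has no `and`, so chain one predicate per condition.
query = ".//*[span='first text'][span='Search results for \"second text\"']"
matches = doc.findall(query)
print([el.get("id") for el in matches])  # ['match']
```

In Selenium the original expression with and can be passed unchanged to By.XPATH, since browsers implement full XPath 1.0.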