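As part of a web scraping task, I have to scrape all event details from the AXS.com website. I am trying to use the Chrome WebDriver with Python + Selenium.
I can get a value with driver.find_element_by_class_name(), for example driver.find_element_by_class_name("headliner").text.
But this returns only the first item. I run into trouble when I try to iterate over the result of driver.find_elements(By.XPATH,"//div[@class='results-table results-table--events']").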
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By  # needed for By.XPATH below
import time

driver = webdriver.Chrome('/home/.../chromedriver_linux64/chromedriver')
driver.get("https://www.axs.com/browse/music/alternative-punk")
driver.implicitly_wait(10)
# Collect every results-table container on the page
allevent_details = driver.find_elements(By.XPATH,"//div[@class='results-table results-table--events']")
for i in allevent_details:
    # This per-container lookup is the call that fails
    print(i.find_element_by_class_name("headliner").text)
Error:
NoSuchElementException: no such element: Unable to locate element: {"method":"class name","selector":"headliner"}
(Session info: chrome=74.0.3729.169)
(Driver info: chromedriver=74.0.3729.6 (255758eccf3d244491b8a1317aa76e1ce10d57e9-refs/branch-heads/3729@{#29}),platform=Linux 4.15.0-50-generic x86_64)
Expected: the headliner text of every event on the page, not just the first one.
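Answer 0 (score: 0)
Change the logic as follows: put the headliner lookup into the XPath itself, so that find_elements returns the headliner elements directly.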
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By  # needed for By.XPATH below
import time

driver = webdriver.Chrome('/home/.../chromedriver_linux64/chromedriver')
driver.get("https://www.axs.com/browse/music/alternative-punk")
driver.implicitly_wait(10)
# Match the headliner divs inside the results table in a single query
allevent_details = driver.find_elements(By.XPATH,"//div[@class='results-table results-table--events']//div[@class='headliner']")
for i in allevent_details:
    print(i.text)
Answer 1 (score: 0)
Try either of the following locators.
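# Option 1: iterate over the containers and search inside each one with a relative XPath
allevent_details = driver.find_elements(By.XPATH,"//div[@class='results-table results-table--events']")
for i in allevent_details:
    print(i.find_element_by_xpath(".//div[@class='headliner']").text)

# Option 2: select every headliner on the page with a CSS selector
for item in driver.find_elements_by_css_selector('.headliner'):
    print(item.text)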
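The difference between a first-match lookup and an all-matches lookup inside each container can be tried without a browser. The sketch below uses Python's standard xml.etree.ElementTree on a small invented snippet (the markup, the band names, and the helper headliners_per_container are assumptions for illustration, not the real AXS page). It mirrors Selenium's behavior only loosely: in Selenium, find_element raises NoSuchElementException when nothing matches, whereas find_elements returns an empty list. Whether a container without a headliner is what triggers the error on the real page is a guess.

```python
import xml.etree.ElementTree as ET

# Invented markup for illustration only; the real page is more complex.
SNIPPET = """
<body>
  <div class="results-table results-table--events">
    <div class="headliner">Band A</div>
    <div class="headliner">Band B</div>
  </div>
  <div class="results-table results-table--events">
    <div class="supporting">No headliner here</div>
  </div>
</body>
"""

CONTAINER = ".//div[@class='results-table results-table--events']"
HEADLINER = ".//div[@class='headliner']"  # leading "." scopes the search to the context node


def headliners_per_container(root):
    """Return one list of headliner texts per matching container."""
    return [[h.text for h in c.findall(HEADLINER)] for c in root.findall(CONTAINER)]


root = ET.fromstring(SNIPPET)

# All-matches lookup: a container without headliners yields an empty list.
per_container = headliners_per_container(root)
print(per_container)

# First-match lookup: find() returns None when nothing matches,
# the rough analogue of find_element raising NoSuchElementException.
has_first = [c.find(HEADLINER) is not None for c in root.findall(CONTAINER)]
print(has_first)
```

If the same pattern holds on the real page, using find_elements inside the loop (and skipping empty results) would avoid the exception.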
Answer 2 (score: 0)
To extract all the event titles from the webpage, you need to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following Locator Strategies:
Using CSS_SELECTOR:
print([my_elem.text for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div.headliner")))])
Using XPATH:
print([my_elem.text for my_elem in WebDriverWait(driver, 5).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='headliner']")))])
Console output:
['Inner Wave', 'BLOXX, Hembree and Warbly Jets', 'Frenship', 'LANY', 'together PANGEA & Vundabar', 'Night Beats', 'New Politics', 'The Technicolors', 'Davila 666', 'Vansire + BOYO', 'The Starting Line', 'Katzù Oso', 'The Raconteurs', 'Cayucas', 'ALT 98.7 Summer Camp']
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC