How to identify class names or IDs for Python scraping with BeautifulSoup and Selenium

Time: 2018-12-05 04:18:48

Tags: python selenium selenium-webdriver web-scraping beautifulsoup

I am building a scraper and can already read the table and the information I need. The problem is the next-page link: I have tried using the class name as well as the svg tag, but the code breaks whenever the value of the class name changes.

Here is the link to the page:

Page to scrape

The CSS selector the code clicks to go to the next page is:

driver.find_element_by_css_selector('#root > div > div > main > div.ez6st4XksKUGZfGdvIhjV > section > div:nth-child(1) > div._1c5cPqlj4FoguvpKXSY69p > div > span:nth-child(3) > svg').click()

It seems that when the value of the class name changes, the element to click changes and the code breaks. I have not found a way to make this work without modification, so that it can be repeated across multiple pages that share the same structure.

Thanks

2 answers:

Answer 0 (score: 2)

You can click the span instead, for example with:

from selenium import webdriver

d = webdriver.Chrome()
url = 'https://super.walmart.com.mx/despensa/enlatados-y-conservas/chiles-enlatados/_/N-10kldy7?%2Fdespensa%2Fenlatados-y-conservas%2Fchiles-enlatados%2F_%2FN-10kldy7%3F%2Fdespensa%2Fenlatados-y-conservas%2Fchiles-enlatados%2F_%2FN-10kldy7%3F%2Fdespensa%2Fenlatados-y-conservas%2Fchiles-enlatados%2F_%2FN-10kldy7%3FNs=product.displayText%7C0&offSet=0&storeId=0000009999&No=40'
d.get(url)
# example number of clicks below; the XPath locates the svg path whose 'd'
# attribute starts with 'M0' (the next-page arrow) and clicks its grandparent
# span, so no generated class name is needed
for i in range(2):
    d.find_element_by_xpath("//*[starts-with(@d,'M0')]/parent::*/parent::span").click()
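
If you also need to read each page's table with BeautifulSoup (per the question's tags), you can hand Selenium's rendered page source to it after every click. A minimal sketch, assuming BeautifulSoup 4 is installed; 'YOUR-ROW-SELECTOR' is a placeholder for whatever selector you already use to read the table:

from bs4 import BeautifulSoup

# 'd' is the webdriver from the snippet above; page_source holds the HTML
# rendered after the click
soup = BeautifulSoup(d.page_source, 'html.parser')
# placeholder selector -- replace it with the one you already use for the table rows
for row in soup.select('YOUR-ROW-SELECTOR'):
    print(row.get_text(strip=True))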

Answer 1 (score: 0)

You can click the "Next" button without referencing the dynamic class name by using the line below, which anchors on the stable structure instead (a span with a value attribute whose following sibling span contains the svg):

driver.find_element_by_xpath('//span[@value]/following-sibling::span/*[name()="svg"]').click()

The same with a CSS selector:

driver.find_element_by_css_selector('span[value] + span > svg')

Update

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait

# keep clicking the next-page arrow until the explicit wait times out,
# i.e. until there is no clickable arrow left
while True:
    try:
        wait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, 'span[value] + span > svg'))).click()
    except TimeoutException:
        break
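
The loop keeps clicking the arrow until the explicit wait times out, i.e. until there is no clickable next-page arrow left, so it handles any number of result pages without ever touching the generated class names. If you scrape each page inside the same loop, do the parsing inside the try block after the click, ideally after waiting for the new results to render.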