Trying to click the Amazon Best Sellers Rank links (Python)

Time: 2020-12-29 10:33:46

Tags: python selenium xpath css-selectors amazon

Hi, I'm trying to click these links. When I click with

driver.find_element_by_xpath('//*[@id="productDetails_detailBullets_sections1"]/tbody/tr[6]/td/span/span[2]/a').click()

it works, but the problem is that every item has a different path that changes from page to page, so it doesn't work for some items.

URL: https://www.amazon.com/MICHELANGELO-Piece-Rainbow-Kitchen-Knife/dp/B074T6C4YS/ref=zg_bs_289857_1?_encoding=UTF8&psc=1&refRID=K5GAX1GF2SDZMN3NS403


2 Answers:

Answer 0 (score: 0)

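This is fairly simple, even though you didn't specify which link you want: each of the links in that table cell just takes you to a Best Sellers list, so you can iterate over all of them.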

You need to build the XPath dynamically, for example:

f'//*[@id="productDetails_detailBullets_sections1"]/tbody/tr[6]/td/span/span[{i}]/a'

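where i is the iterator of your for loop (XPath positions start at 1). To get the upper bound for i, count the spans with something like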

len(driver.find_elements_by_xpath('//*[@id="productDetails_detailBullets_sections1"]/tbody/tr[6]/td/span/span'))
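Put together, this answer's indexed-XPath loop might look like the minimal sketch below. It assumes a WebDriver exposing `find_elements(by, value)`, where the string `"xpath"` is the value of Selenium's `By.XPATH`, and it keeps the item-specific `tr[6]` from the question. The helper names `rank_link_xpaths` and `collect_rank_hrefs` are hypothetical. It reads each link's `href` rather than clicking, since clicking navigates away and would likely leave the remaining element references stale.

```python
# Hypothetical helpers sketching the indexed-XPath loop from this answer.
# Assumes a Selenium WebDriver: find_elements(by, value), where "xpath" == By.XPATH.

SPAN_XPATH = (
    '//*[@id="productDetails_detailBullets_sections1"]'
    "/tbody/tr[6]/td/span/span"  # tr[6] is item-specific, as in the question
)


def rank_link_xpaths(count):
    """Build one XPath per rank span; XPath positions start at 1, not 0."""
    return [f"{SPAN_XPATH}[{i}]/a" for i in range(1, count + 1)]


def collect_rank_hrefs(driver):
    """Return the href of every rank link (len() replaces Java's .size())."""
    count = len(driver.find_elements("xpath", SPAN_XPATH))
    hrefs = []
    for xpath in rank_link_xpaths(count):
        links = driver.find_elements("xpath", xpath)
        if links:  # skip spans that have no <a> child
            hrefs.append(links[0].get_attribute("href"))
    return hrefs
```

Usage: `for href in collect_rank_hrefs(driver): driver.get(href)` visits each Best Sellers page in turn.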

Answer 1 (score: 0)

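The Amazon webpage has 3 Best Sellers Rank entries. An efficient approach is to collect the href of all three (3) Best Sellers links, store them in a list, and open each one in a separate tab for scraping. To build the list, you have to induce WebDriverWait for the links to become visible before reading their href attributes, and you can use either of the following Locator Strategies: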

  • Using CSS_SELECTOR with visibility_of_all_elements_located():

    driver.get('https://www.amazon.com/dp/B074T6C4YS')
    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "table#productDetails_detailBullets_sections1 td>span>span a")))])

  • Using XPATH with visibility_of_all_elements_located():

    driver.get('https://www.amazon.com/dp/B074T6C4YS')
    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@id='productDetails_detailBullets_sections1']//td/span/span//a")))])

  • Console output:

    ['https://www.amazon.com/gp/bestsellers/kitchen/ref=pd_zg_ts_kitchen', 'https://www.amazon.com/gp/bestsellers/kitchen/289857/ref=pd_zg_hrsr_kitchen', 'https://www.amazon.com/gp/bestsellers/kitchen/289862/ref=pd_zg_hrsr_kitchen']

  • Note: You have to add the following imports:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
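Once the list of hrefs exists, the "open each one in a separate tab" step could look like the minimal sketch below. It assumes Selenium 4, where `driver.switch_to.new_window("tab")` opens a new tab and switches to it. `scrape_in_tabs` and its `scrape` callback are hypothetical names, and reading `driver.title` is only a placeholder for whatever you actually want to scrape from each Best Sellers page.

```python
# Hypothetical helper sketching "open each href in a separate tab for scraping".
# Assumes Selenium 4: driver.switch_to.new_window("tab") opens and focuses a new tab.


def scrape_in_tabs(driver, hrefs, scrape=lambda d: d.title):
    """Visit every href in its own tab and return {href: scrape(driver)}."""
    original = driver.current_window_handle  # the product page's tab
    results = {}
    for href in hrefs:
        driver.switch_to.new_window("tab")  # new tab becomes the active window
        try:
            driver.get(href)
            results[href] = scrape(driver)  # placeholder scraping step
        finally:
            driver.close()  # close the tab opened above
            driver.switch_to.window(original)  # return to the product page
    return results
```

For example, assign one of the list comprehensions above to `hrefs` instead of printing it, then call `scrape_in_tabs(driver, hrefs)`. Closing each tab before opening the next keeps at most one extra tab open at a time.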