Selecting the next-page option on an amazon.in results page

Asked: 2017-09-20 07:56:13

Tags: python-3.x selenium-webdriver web-scraping beautifulsoup amazon

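I am trying to collect product ASINs from Amazon.in. My code opens a web driver, searches for a product name, and lands on the first page of results. It collects the data for that first page only. How do I move to the next page and collect the same data there? Here is my code: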

import time
import json
import re
import numpy as np
from bs4 import BeautifulSoup
from selenium import webdriver
import urllib.request
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.keys import Keys
import pandas as pd


temp = []


def init_driver():
    driver = webdriver.Chrome(executable_path = "C:\\Users\\Desktop\\chromedriver")
    driver.wait = WebDriverWait(driver, 10)
    return driver


def get_asin(driver):

    driver.get("https://www.amazon.in")
    print ('Getting the URL')
    HTML = driver.page_source
    search_button = driver.find_element_by_id("twotabsearchtextbox")
    search_button.send_keys("Mobiles")
    select_button = driver.find_element_by_class_name("nav-input")
    select_button.click()
    HTML1=driver.page_source
    soup = BeautifulSoup(HTML1, "html.parser")


    styles = soup.find_all('li')
    #print(styles)
    #print(type(styles))
    ASIN=[]
    for link in styles:
        if link.has_attr('data-asin'):
            ASIN.append(link['data-asin'])

    return(ASIN)
    #print(ASIN)


if __name__ == "__main__":
    driver = init_driver()
    ASIN_NO = get_asin(driver)
    #time.sleep(3)
    #print ('opening search page')
    #for i in range(0,len(ASIN_NO)):
        #scrape(driver,ASIN_NO[i])

    print (ASIN_NO)
    time.sleep(5)

I have tried the following two approaches, and each one raises an error:

select_button = driver.find_element_by_id('pagnNextString')
select_button.click()

Exception in the log:

  

WebDriverException: Message: unknown error: Element ... is not clickable at point (778, 606). Other element would receive the click:

select_button = driver.find_element_by_class_name('srSprite pagnNextArrow')
select_button.click()
  

InvalidSelectorException: Message: invalid selector: Compound class names not permitted

Please help me find the correct approach. Thanks in advance.

2 Answers:

Answer 0: (score: 0)

I think you need to maximize the window. The element is not in view, which is why the "element is not clickable" error appears.

driver.maximize_window()

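Locate buttons by XPath to avoid the InvalidSelectorException. In the Java code below, this XPath locates the search submit button, and the Next button is located by its id: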

.//*[@id='nav-search']/form/div[2]/div/input
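
For the compound class name error specifically, a CSS selector is another option. `find_element_by_class_name` accepts only a single class name, so a value containing a space, such as 'srSprite pagnNextArrow', is rejected. The sketch below uses a hypothetical helper, `classes_to_css`, which is not part of Selenium and only builds the selector string. The commented lines at the end assume a live `driver` object.

```python
def classes_to_css(class_attr, tag=""):
    """Turn a space-separated class attribute into a CSS selector.

    Hypothetical helper, not a Selenium API: "a b" becomes ".a.b".
    """
    names = class_attr.split()
    if not names:
        raise ValueError("at least one class name is required")
    # Chaining ".name" parts matches elements that carry all the classes.
    return tag + "".join("." + name for name in names)


selector = classes_to_css("srSprite pagnNextArrow")
print(selector)

# With a live driver (assumed), the selector could be used like this:
# from selenium.webdriver.common.by import By
# driver.find_element(By.CSS_SELECTOR, selector).click()
```

This only addresses the selector error. The matched element may still need to be scrolled into view before it can be clicked.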

I know very little Python. The Java code below works on my system; convert it to Python.

WebDriver driver=new FirefoxDriver();
driver.get("https://www.amazon.in");
driver.manage().timeouts().implicitlyWait(10, TimeUnit.SECONDS);
WebElement search_txt=driver.findElement(By.xpath("//*[@id='twotabsearchtextbox']"));
search_txt.sendKeys("Mobiles");
driver.manage().window().maximize();
driver.findElement(By.xpath(".//*[@id='nav-search']/form/div[2]/div/input")).click();
WebElement select_btn=driver.findElement(By.xpath("//*[@id='pagnNextString']"));
select_btn.click();
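
A rough Python equivalent of the Java steps above might look like the following. It is a sketch, not tested against the live site: the function name `search_and_click_next` is made up for illustration, and the locator strategy is passed as the string "xpath", which is the value of Selenium's `By.XPATH` constant, so the function itself needs no imports. The XPaths and the id `pagnNextString` are copied from the Java code and may no longer match Amazon's markup.

```python
XPATH = "xpath"  # same value as selenium.webdriver.common.by.By.XPATH


def search_and_click_next(driver, query="Mobiles"):
    """Port of the Java steps; `driver` is assumed to be a Selenium WebDriver."""
    driver.get("https://www.amazon.in")
    driver.implicitly_wait(10)  # seconds, like implicitlyWait(10, TimeUnit.SECONDS)

    # Type the search term into the search box.
    search_txt = driver.find_element(XPATH, "//*[@id='twotabsearchtextbox']")
    search_txt.send_keys(query)

    # Maximize so the buttons are less likely to be covered or off-screen.
    driver.maximize_window()

    # Submit the search, then click the Next link on the results page.
    driver.find_element(XPATH, ".//*[@id='nav-search']/form/div[2]/div/input").click()
    select_btn = driver.find_element(XPATH, "//*[@id='pagnNextString']")
    select_btn.click()


# Usage (requires the selenium package and a browser driver):
# from selenium import webdriver
# driver = webdriver.Firefox()
# search_and_click_next(driver)
# driver.quit()
```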

Answer 1: (score: 0)

To click the Next button, you can use the following code:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

next_button = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "pagnNextString")))
next_button.location_once_scrolled_into_view
next_button.click()

This waits until the button is visible on the page, scrolls down to it, and then clicks it.
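
To collect ASINs from several pages, the wait-scroll-click code above can be wrapped in a loop. The sketch below shows one possible shape. The names `extract_asins`, `collect_asins`, and `click_next` are hypothetical, and the standard library's `html.parser` stands in for BeautifulSoup only to keep the example dependency-free. `click_next` is any callable that takes the driver, moves to the next page, and returns False when there is no next page.

```python
from html.parser import HTMLParser


class _AsinParser(HTMLParser):
    """Collects the data-asin attribute of every <li> tag that has one."""

    def __init__(self):
        super().__init__()
        self.asins = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "li" and "data-asin" in attr_map:
            self.asins.append(attr_map["data-asin"])


def extract_asins(html):
    """Same idea as the question's loop over soup.find_all('li')."""
    parser = _AsinParser()
    parser.feed(html)
    return parser.asins


def collect_asins(driver, click_next, max_pages=3):
    """Read ASINs from up to max_pages result pages."""
    asins = []
    for page in range(max_pages):
        asins.extend(extract_asins(driver.page_source))
        # Stop on the last requested page, or when there is no next page.
        if page == max_pages - 1 or not click_next(driver):
            break
    return asins


# A possible click_next built from the code above (needs selenium):
# def click_next(driver):
#     try:
#         button = WebDriverWait(driver, 10).until(
#             EC.visibility_of_element_located((By.ID, "pagnNextString")))
#     except TimeoutException:
#         return False
#     button.location_once_scrolled_into_view
#     button.click()
#     return True
```

One caveat: after the click, the new results may take a moment to load, so a real `click_next` probably needs an additional wait before `page_source` is read again.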