Scraping with Selenium

Time: 2020-02-26 11:05:33

Tags: python selenium selenium-webdriver

I am trying to use Selenium WebDriver to extract the text of the element with class="a-size-based-plus a-color-base" from the following HTML.

Scraping the text inside the blue line

My code is structured as follows:

from selenium import webdriver
from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.expected_conditions import presence_of_element_located

import os
import re  # regular expressions, are imported from python directly
import time
import numpy as np
import pandas as pd
from difflib import SequenceMatcher
BASE_DIR = os.path.dirname(os.path.abspath(__file__))

------ some unrelated code here ------

    # Find Data
    i = 0
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")
    wait = WebDriverWait(driver, 20)
    wait.until(EC.element_to_be_clickable(
        (By.CLASS_NAME, 'xtaqv-root')))
    wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'extension-rank')))
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '[data-src="price"]')))
    time.sleep(5)

    for element in driver.find_elements_by_class_name('xtaqv-root'):
        # Ratio of similarity between the item name and the search key
        try:
            item_name = element.find_element_by_tag_name("h2").text
            ratio = SequenceMatcher(None, item_name, key).ratio()
        except Exception:
            item_name = np.nan
            ratio = 0
        # Link to the item
        try:
            link = element.find_element_by_css_selector('[data-src="price"]')
            href = link.get_attribute('href')
        except Exception:
            href = np.nan
        # Brand text (this is the lookup that always falls back to NaN)
        try:
            brand = element.find_element_by_css_selector('.a-size-based-plus.a-color-base')
            brand = brand.text
        except Exception:
            brand = np.nan

The last try-except in the code is the one that matters most.

1 Answer:

Answer 0 (score: 4):

Looking at the HTML, I see the following error in your locator:

brand = element.find_element_by_css_selector('.a-size-based-plus.a-color-base')

The class name should be size-base, not size-based. Try the following:

brand = element.find_element_by_css_selector('.a-size-base-plus.a-color-base')
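To see why a single extra letter makes the lookup fail: a compound selector such as .a.b only matches an element that carries every listed class exactly. Below is a minimal standard-library sketch of that rule, with no browser involved. The HTML fragment, the class CompoundClassFinder, and the function find_text are hypothetical names made up for illustration; the real page markup is only visible in the screenshot.

```python
from html.parser import HTMLParser

# Hypothetical fragment; the real markup is only shown in the screenshot.
SAMPLE_HTML = (
    '<div class="xtaqv-root">'
    '<span class="a-size-base-plus a-color-base">ExampleBrand</span>'
    '</div>'
)


class CompoundClassFinder(HTMLParser):
    """Collect the text of elements that carry every class in the selector.

    Simplified on purpose: it assumes a matching element contains only text.
    """

    def __init__(self, selector):
        super().__init__()
        # ".a.b" -> {"a", "b"}; an element must have all of them to match.
        self.required = {name for name in selector.split(".") if name}
        self.capturing = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        self.capturing = self.required <= classes
        if self.capturing:
            self.texts.append("")

    def handle_data(self, data):
        if self.capturing:
            self.texts[-1] += data

    def handle_endtag(self, tag):
        self.capturing = False


def find_text(html, selector):
    """Return the text of every element matching the compound class selector."""
    finder = CompoundClassFinder(selector)
    finder.feed(html)
    finder.close()
    return finder.texts


print(find_text(SAMPLE_HTML, ".a-size-base-plus.a-color-base"))
print(find_text(SAMPLE_HTML, ".a-size-based-plus.a-color-base"))
```

With the misspelled class the result is an empty list, which is the same situation in which Selenium finds no element and the except branch sets brand to NaN.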

Hope this helps.
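As a side note, the typo stayed hidden because the except branch quietly replaced the failed lookup with NaN. One possible way to make such misses visible is sketched below; text_or_nan is a hypothetical helper, not part of Selenium. It relies on find_elements returning an empty list when nothing matches, and on the string "css selector" being the value of By.CSS_SELECTOR. It uses math.nan instead of np.nan only to stay within the standard library. The generic find_elements(by, value) form should also work on newer Selenium 4 releases, where the find_element_by_* helpers are no longer available.

```python
import logging
import math

logger = logging.getLogger(__name__)

# String value of selenium.webdriver.common.by.By.CSS_SELECTOR
CSS_SELECTOR = "css selector"


def text_or_nan(parent, selector):
    """Return the text of the first match under `parent`, or NaN if none.

    `parent` is assumed to be a WebDriver or WebElement, i.e. any object
    with a find_elements(by, value) method.
    """
    # find_elements returns an empty list instead of raising, so no
    # try/except is needed and a bad selector cannot fail silently.
    matches = parent.find_elements(CSS_SELECTOR, selector)
    if not matches:
        logger.warning("no element matched selector %r", selector)
        return math.nan
    return matches[0].text
```

Inside the loop this would be called as brand = text_or_nan(element, '.a-size-base-plus.a-color-base'), and a misspelled selector would then show up as a warning in the log.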