My web scraper returns an empty list even though I use the correct CSS selector

Asked: 2019-05-29 19:36:55

Tags: python-3.x selenium web-scraping scrapy css-selectors

I am trying to use Selenium or Scrapy to scrape some data from a particular URL.

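I have scraped other pages without any problem, but for these particular URLs the information I try to scrape into a list comes back empty. I started with Scrapy and then moved on to Selenium, but the result is the same. I am also using PyCharm and chromedriver.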

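Specifically, I am looking for all the different phone models listed on "https://shop.freedommobile.ca/devices". When I print the list, I find that nothing was scraped from the site, or rather that the scrape ran without errors but returned nothing.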

The same thing happens when I try to scrape anything from this page:

https://shop.freedommobile.ca/devices/Apple/iPhone_XS_Max?sku=190198786074&planSku=Freedom%20Big%20Gig%20%2B%20Talk%2015GB

from selenium import webdriver

#open chrome browser and navigate to the webpage
driver = webdriver.Chrome()
driver.get("https://shop.freedommobile.ca/devices")

#extract the names of the phones
phones = driver.find_elements_by_css_selector('.jXeFbj')

#counts phone and its model
for element in range(len(phones)):
    numPhone = int(element) + 1
    print("phone "+ str(numPhone) +" : " + phones[element].text)


#number of phones in total
sizeOfList = len(phones)
print(sizeOfList)

What should happen is that all the phone model names end up in a list:

phones = ['iPhone XS Max','iPhone XS','iPhone XR'......

2 Answers:

Answer 0 (score: 1)

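Your code is fine. You probably get an empty list some of the time because the element lookup runs too soon, before the page has finished rendering the phone list.

You can solve this with WebDriverWait, which keeps retrying the lookup until the elements are present or a timeout expires.

You can use the following code, which also includes a few small improvements: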

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("https://shop.freedommobile.ca/devices")

# get the list of phones
wait = WebDriverWait(driver, 10)
phones = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.jXeFbj')))
numPhones = len(phones)

#prints the formatted output of each phone
for idx, phone in enumerate(phones):
    phone_name = phone.text
    print("phone " + str(idx) + " : " + phone_name)

print(numPhones)

Output 1:

phone 0 : iPhone XS Max
phone 1 : iPhone XS
phone 2 : iPhone XR
phone 3 : iPhone 8 Plus
phone 4 : iPhone 8
phone 5 : Galaxy S10+
...

Output 2:

27
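
To make the idea behind the explicit wait more concrete, here is a minimal sketch of a polling loop in plain Python. It is not Selenium's implementation: `wait_until`, `FakePage` and `find_phones` are hypothetical names made up for this illustration, and the fake page simply returns an empty list on its first two lookups to mimic content that has not rendered yet.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if the deadline passes without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


class FakePage:
    """Hypothetical stand-in for a page whose items appear on the third lookup."""

    def __init__(self):
        self.lookups = 0

    def find_phones(self):
        self.lookups += 1
        if self.lookups >= 3:
            return ["iPhone XS Max", "iPhone XS"]
        return []  # nothing rendered yet, like the empty list in the question


page = FakePage()
phones = wait_until(page.find_phones, timeout=2.0, poll_interval=0.01)
print(phones)
```

A single immediate lookup on `FakePage` would return `[]`, which is the symptom described in the question; retrying until the result is non-empty is what makes the difference.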

Answer 1 (score: -1)

To scrape all the phone model names into a list of the form ['iPhone XS Max','iPhone XS','iPhone XR',...] with Selenium, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following Locator Strategies:

  • Code block:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    options = webdriver.ChromeOptions() 
    options.add_argument("start-maximized")
    # options.add_argument('disable-infobars')
    driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get("https://shop.freedommobile.ca/devices")
    #using CSS_SELECTOR
    print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "h3[class^='deviceListItem__DeviceModel-']")))])
    #using XPATH
    #print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//h3[starts-with(@class, 'deviceListItem__DeviceModel-')]")))])
    
  • Console output:

    ['iPhone XS Max', 'iPhone XS', 'iPhone XR', 'iPhone 8 Plus', 'iPhone 8', 'Galaxy S10+', 'Galaxy S10', 'Galaxy S10e', 'Galaxy Tab A 8 LTE', 'Galaxy Note9', 'Galaxy S9', 'Galaxy A8', 'G7 Power', 'Moto E5 Play', 'Pixel 3a', 'Pixel 3', 'Pixel 3 XL', 'Z557', 'G7 ThinQ', 'P30 lite', 'Mate 20 Pro', 'X Power 3', 'G8 ThinQ', 'Q Stylo +', 'GoFLIP', 'Bring Your', 'Own Device']
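
The locators above match on the start of the class attribute instead of a short class such as `.jXeFbj`, which looks auto-generated and may not be stable. As a rough illustration of what `h3[class^='deviceListItem__DeviceModel-']` selects, here is a sketch that uses only Python's standard `html.parser`. The markup in `SAMPLE_HTML` is made up for this example and is not copied from the real page; `PrefixClassCollector` and `texts_with_class_prefix` are hypothetical helper names.

```python
from html.parser import HTMLParser

# Made-up markup for illustration only; the real page's class names may differ.
SAMPLE_HTML = """
<div>
  <h3 class="deviceListItem__DeviceModel-sc-0 jXeFbj">iPhone XS Max</h3>
  <h3 class="deviceListItem__DeviceModel-sc-0 kQwErT">iPhone XS</h3>
  <h3 class="other deviceListItem__DeviceModel-sc-0">Not matched</h3>
  <p class="deviceListItem__DeviceModel-sc-0">Not an h3</p>
</div>
"""


class PrefixClassCollector(HTMLParser):
    """Collect text of elements with a given tag whose class value starts with a prefix."""

    def __init__(self, tag, prefix):
        super().__init__()
        self.tag = tag
        self.prefix = prefix
        self.texts = []
        self._capturing = False

    def handle_starttag(self, tag, attrs):
        class_value = dict(attrs).get("class") or ""
        # Like [class^='...'], compare against the whole attribute value,
        # not against each space-separated class name.
        if tag == self.tag and class_value.startswith(self.prefix):
            self._capturing = True
            self.texts.append("")

    def handle_data(self, data):
        if self._capturing:
            self.texts[-1] += data

    def handle_endtag(self, tag):
        if tag == self.tag:
            self._capturing = False


def texts_with_class_prefix(html, tag, prefix):
    parser = PrefixClassCollector(tag, prefix)
    parser.feed(html)
    return [text.strip() for text in parser.texts]


names = texts_with_class_prefix(SAMPLE_HTML, "h3", "deviceListItem__DeviceModel-")
print(names)
```

Note that the third `h3` is skipped because its class attribute begins with `other`: a `^=` attribute selector tests the beginning of the full attribute string, so it only works when the stable name comes first.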