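I am trying to scrape some data from this particular URL using Selenium or Scrapy.
I have scraped other pages without problems, but for these specific URLs the list I try to scrape into comes back empty. I started with Scrapy and then moved on to Selenium, but the result is the same. I am also using PyCharm and chromedriver.
Specifically, I am looking for all the different phone models on "https://shop.freedommobile.ca/devices". When I print the list, I find that nothing was scraped from the site; or rather, the scrape runs without errors but returns nothing.
The same thing happens when I try to scrape anything from here: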
from selenium import webdriver
# open Chrome browser and navigate to the webpage
driver = webdriver.Chrome()
driver.get("https://shop.freedommobile.ca/devices")
# extract the names of the phones
phones = driver.find_elements_by_css_selector('.jXeFbj')
# count each phone and print its model
for element in range(len(phones)):
    numPhone = int(element) + 1
    print("phone " + str(numPhone) + " : " + phones[element].text)
# number of phones in total
sizeOfList = len(phones)
print(sizeOfList)
What should happen is that all the phone model names end up in a list:
phones = ['iPhone XS Max','iPhone XS','iPhone XR'......
Answer 0 (score: 1)
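Your code is fine. You probably get an empty list sometimes because the request is sent too fast, so the elements are looked up before the page has finished rendering them.
You can solve this with WebDriverWait.
Here is your code with a few small improvements: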
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("https://shop.freedommobile.ca/devices")
# wait up to 10 seconds for the phone elements, then get the list of phones
wait = WebDriverWait(driver, 10)
phones = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, '.jXeFbj')))
numPhones = len(phones)
# print the formatted output for each phone
for idx, phone in enumerate(phones):
    phone_name = phone.text
    print("phone " + str(idx) + " : " + phone_name)
print(numPhones)
Output 1:
phone 0 : iPhone XS Max
phone 1 : iPhone XS
phone 2 : iPhone XR
phone 3 : iPhone 8 Plus
phone 4 : iPhone 8
phone 5 : Galaxy S10+
...
Output 2:
27
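The explicit wait in the code above amounts to polling: the lookup is retried until it returns something or the timeout expires. Below is a minimal sketch of that idea using only the standard library, not Selenium's actual implementation. `wait_until`, `WaitTimeout` and `FakePage` are hypothetical names introduced for illustration, and `FakePage` merely stands in for a page whose elements appear only after a few polls.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


class FakePage:
    """Stand-in for a page that renders its phone list a little after loading."""

    def __init__(self, ready_after_calls, names):
        self._calls = 0
        self._ready_after_calls = ready_after_calls
        self._names = names

    def find_phone_names(self):
        self._calls += 1
        if self._calls < self._ready_after_calls:
            return []  # not rendered yet: the empty list from the question
        return list(self._names)


page = FakePage(ready_after_calls=3, names=["iPhone XS Max", "iPhone XS", "iPhone XR"])
phones = wait_until(page.find_phone_names, timeout=2.0, poll_interval=0.01)
print(phones)  # ['iPhone XS Max', 'iPhone XS', 'iPhone XR']
```

A single immediate call to `find_phone_names()` on a fresh `FakePage` returns `[]`, which mirrors the symptom in the question; the polling loop is what turns that into a populated list.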
Answer 1 (score: -1)
To scrape all the phone model names into a list of the form ['iPhone XS Max','iPhone XS','iPhone XR',...] using Selenium, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use either of the following Locator Strategies:
Code block:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# options.add_argument('disable-infobars')
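# Selenium 3 keyword arguments; newer Selenium releases take options=... and a Service object instead
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')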
driver.get("https://shop.freedommobile.ca/devices")
#using CSS_SELECTOR
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "h3[class^='deviceListItem__DeviceModel-']")))])
#using XPATH
#print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//h3[starts-with(@class, 'deviceListItem__DeviceModel-')]")))])
Console output:
['iPhone XS Max', 'iPhone XS', 'iPhone XR', 'iPhone 8 Plus', 'iPhone 8', 'Galaxy S10+', 'Galaxy S10', 'Galaxy S10e', 'Galaxy Tab A 8 LTE', 'Galaxy Note9', 'Galaxy S9', 'Galaxy A8', 'G7 Power', 'Moto E5 Play', 'Pixel 3a', 'Pixel 3', 'Pixel 3 XL', 'Z557', 'G7 ThinQ', 'P30 lite', 'Mate 20 Pro', 'X Power 3', 'G8 ThinQ', 'Q Stylo +', 'GoFLIP', 'Bring Your', 'Own Device']
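The `h3[class^='deviceListItem__DeviceModel-']` selector used above matches `h3` elements whose `class` attribute value begins with that prefix. One plausible reason to prefer it over `.jXeFbj` is that the short class name looks auto-generated and might change between site builds, though that is an assumption. The sketch below imitates the prefix match with the standard library's `html.parser`. `PrefixClassTextCollector` is a hypothetical helper, and `SAMPLE_HTML` is invented markup (the hashed suffixes are made up), not the real page source.

```python
from html.parser import HTMLParser


class PrefixClassTextCollector(HTMLParser):
    """Collect the text of tags whose class attribute starts with a given prefix."""

    def __init__(self, tag, class_prefix):
        super().__init__()
        self._tag = tag
        self._class_prefix = class_prefix
        self._capturing = False
        self._buffer = []
        self.texts = []

    def handle_starttag(self, tag, attrs):
        # Like [class^='...'], compare against the whole attribute value.
        class_value = dict(attrs).get("class") or ""
        if tag == self._tag and class_value.startswith(self._class_prefix):
            self._capturing = True
            self._buffer = []

    def handle_data(self, data):
        if self._capturing:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._capturing and tag == self._tag:
            self.texts.append("".join(self._buffer).strip())
            self._capturing = False


# Invented sample markup for illustration only.
SAMPLE_HTML = """
<div>
  <h3 class="deviceListItem__DeviceModel-abc123 jXeFbj">iPhone XS Max</h3>
  <h3 class="deviceListItem__DeviceModel-abc123 kLmNoP">iPhone XS</h3>
  <h3 class="deviceListItem__DeviceBrand-xyz789 qRsTuV">Apple</h3>
</div>
"""

collector = PrefixClassTextCollector("h3", "deviceListItem__DeviceModel-")
collector.feed(SAMPLE_HTML)
print(collector.texts)  # ['iPhone XS Max', 'iPhone XS']
```

Note that the prefix is tested against the full attribute string, so an element whose matching class is not listed first in `class` would not be selected.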