我是Bart,是Python的新手,这是我的第一篇文章。 作为威士忌迷,我想刮擦一些商店以给我最近的威士忌优惠,但是,我坚持使用Asda的页面。我在这里浏览了很长时间,但是没有运气。
谢谢。
浏览器正在打开,然后按预期关闭。
下面是我的创作:
Import libraries
# import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver
import time
# import pandas as pd
# import requests
from selenium.webdriver.firefox.options import Options as FirefoxOptions
# specify url
#url = "https://groceries.asda.com/product/whisky/glenmorangie-the-original-single-malt-scotch-whisky/68303869"
url = "https://groceries.asda.com/search/whisky/1/relevance-desc/so-false/Type%3A3612046177%3AMalt%20Whisky"
# run webdriver with headless option
options = FirefoxOptions()
driver = webdriver.Firefox(options=options)
options.add_argument('--headless')
# get page
driver.get(url)
# execute script to scroll down the page
driver.execute_script('window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;')
# sleep for 30s
time.sleep(30)
# close driver
driver.close()
# find element by xpath
results = driver.find_elements_by_xpath("//*[@id='componentsContainer']//*[@id='listingsContainer']//*[@class='product active']//*[@class='title productTitle']")
"""soup = BeautifulSoup(browser.page_source, 'html.parser')"""
print('Number of results', len(results))
这是输出。
Traceback (most recent call last):
File "D:/PycharmProjects/Giraffe/asda.py", line 29, in <module>
results = driver.find_elements_by_xpath("//*[@id='componentsContainer']//*[@id='listingsContainer']//*[@class='product active']//*[@class='title productTitle']")
File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 410, in find_elements_by_xpath
return self.find_elements(by=By.XPATH, value=xpath)
File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 1007, in find_elements
'value': value})['value'] or []
File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSessionIdException: Message: Tried to run command without establishing a connection
Process finished with exit code 1
答案 0 :(得分:0)
这可能不是理想的解决方案。我只是想坚持你已经写过的方式。我知道硬编码延迟也不是一个好方法。就是说,这就是获得结果的方法:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
url = "https://groceries.asda.com/search/whisky"
driver = webdriver.Chrome()
wait = WebDriverWait(driver,10)
driver.get(url)
actions = ActionChains(driver)
for _ in range(3):
actions.send_keys(Keys.END).perform()
time.sleep(3)
results = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//*[@id='searchContainer']//*[contains(@class,'productListing')]//*[contains(@class,'productTitle')]/a")))
print('Number of results', len(results))
driver.quit()
输出:
Number of results 56