Find_by_xpath results in an error

Time: 2019-07-04 21:12:49

Tags: web-scraping python-3.7

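I'm Bart, I'm new to Python, and this is my first post. As a whisky fan, I want to scrape a few shops to get the latest whisky deals, but I'm stuck on Asda's page. I have searched here for a long time without any luck.

Thanks.

The browser opens and then closes as expected.

Below is my code: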

# Import libraries
# import urllib.request
from bs4 import BeautifulSoup
from selenium import webdriver
import time
# import pandas as pd
# import requests
from selenium.webdriver.firefox.options import Options as FirefoxOptions

# specify url
#url = "https://groceries.asda.com/product/whisky/glenmorangie-the-original-single-malt-scotch-whisky/68303869"
url = "https://groceries.asda.com/search/whisky/1/relevance-desc/so-false/Type%3A3612046177%3AMalt%20Whisky"

# run webdriver with headless option
options = FirefoxOptions()
driver = webdriver.Firefox(options=options)
options.add_argument('--headless')
# get page
driver.get(url)
# execute script to scroll down the page
driver.execute_script('window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;')
# sleep for 30s
time.sleep(30)
# close driver
driver.close()

# find element by xpath
results = driver.find_elements_by_xpath("//*[@id='componentsContainer']//*[@id='listingsContainer']//*[@class='product active']//*[@class='title productTitle']")
"""soup = BeautifulSoup(browser.page_source, 'html.parser')"""

print('Number of results', len(results))

This is the output.

Traceback (most recent call last):
  File "D:/PycharmProjects/Giraffe/asda.py", line 29, in <module>
    results = driver.find_elements_by_xpath("//*[@id='componentsContainer']//*[@id='listingsContainer']//*[@class='product active']//*[@class='title productTitle']")
  File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 410, in find_elements_by_xpath
    return self.find_elements(by=By.XPATH, value=xpath)
  File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 1007, in find_elements
    'value': value})['value'] or []
  File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\ProgramData\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSessionIdException: Message: Tried to run command without establishing a connection


Process finished with exit code 1

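1 Answer:

Answer 0 (score: 0)

This may not be the ideal solution. I just tried to stick to the way you had already written it. I know that hard-coded delays are not a good approach either. That said, this is how you can get the results: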

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains

url = "https://groceries.asda.com/search/whisky"

driver = webdriver.Chrome()
wait = WebDriverWait(driver,10)

driver.get(url)
actions = ActionChains(driver)
for _ in range(3):
    actions.send_keys(Keys.END).perform()
    time.sleep(3)

results = wait.until(EC.presence_of_all_elements_located((By.XPATH, "//*[@id='searchContainer']//*[contains(@class,'productListing')]//*[contains(@class,'productTitle')]/a")))
print('Number of results', len(results))
driver.quit()

Output:

Number of results 56
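
For what it's worth, the traceback in the question is consistent with `driver.close()` being called before `find_elements_by_xpath`: once the only window is closed, the session is gone, so any later command fails with `InvalidSessionIdException`. The `--headless` argument is also added after the driver has been created, so it most likely has no effect. Below is a minimal sketch of one way to keep the order right and to avoid a fixed 30-second sleep. The names `browser_session`, `scroll_until_stable`, `count_results` and `make_driver` are made up for illustration and are not part of Selenium; the only driver calls used are `get`, `execute_script`, `find_elements` and `quit`.

```python
import time
from contextlib import contextmanager


@contextmanager
def browser_session(make_driver):
    """Yield a driver and quit it only after the caller's block has finished."""
    driver = make_driver()
    try:
        yield driver
    finally:
        # Runs even if the block raises, so the browser is not left open.
        driver.quit()


def scroll_until_stable(driver, pause=1.0, max_rounds=10):
    """Scroll to the bottom repeatedly until the page height stops growing."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # short pause so lazily loaded items can appear
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so stop scrolling
        last_height = new_height
    return last_height


def count_results(make_driver, url, xpath, pause=1.0):
    """Open the page, scroll, and count matches while the session is alive."""
    with browser_session(make_driver) as driver:
        driver.get(url)
        scroll_until_stable(driver, pause=pause)
        # "xpath" is the string value of selenium's By.XPATH
        return len(driver.find_elements("xpath", xpath))
```

With Selenium installed, `make_driver` could be a small function that builds `FirefoxOptions`, calls `add_argument('--headless')` on it, and only then returns `webdriver.Firefox(options=options)`. The bounded polling loop still sleeps, but only for as long as the page keeps growing, which is usually much shorter than one long fixed delay.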