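I am connected to a server over SSH, and I get an "Unable to locate window" error with Selenium 3.4 and Firefox 56. I could not find a solution; what I did find suggests this is mostly an IE bug in Selenium.
Code:
import bs4 as bs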
from bs4 import BeautifulSoup
import urllib.request
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import re
from random import randint
import pandas as pd
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from pyvirtualdisplay import Display
def get_soup(url):
    sauce = urllib.request.urlopen(url)
    return BeautifulSoup(sauce, 'lxml')

def get_driver_soup(url):
    # driver = webdriver.Firefox(executable_path='/usr/bin/geckodriver')
    display = Display(visible=0, size=(800, 600))
    display.start()
    driver = webdriver.Firefox('/var/gecodriver19-64')
    driver.get(url)
    try:
        element = WebDriverWait(driver, 20).until(
            EC.presence_of_element_located((By.CLASS_NAME, "product-image-wrapper"))
        )
    finally:
        soup = BeautifulSoup(driver.page_source, 'lxml')
    time.sleep(randint(30, 70))
    driver.quit()
    return soup
Full traceback:
Traceback (most recent call last):
  File "jomashop.py", line 86, in <module>
    soup = get_driver_soup(companies_list[x] + page_suffix)
  File "jomashop.py", line 32, in get_driver_soup
    soup = BeautifulSoup(driver.page_source, 'lxml')
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/webdriver.py", line 587, in page_source
    return self.execute(Command.GET_PAGE_SOURCE)['value']
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/webdriver.py", line 311, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/errorhandler.py", line 237, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchWindowException: Message: Unable to locate window
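Answer 0: (score: 0)
The Selenium driver may be timing out when you ask it to return the page source.
Instead of:
finally:
    soup = BeautifulSoup(driver.page_source, 'lxml')
try fetching the page with requests:
import requests  # at the top of the file
finally:
    r = requests.get(url)
    html = r.text  # decoded text (str), not bytes
    soup = BeautifulSoup(html, 'lxml')
This should fetch just the HTML, without running any JavaScript on the page.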
Answer 1: (score: 0)
Check your geckodriver.log file (by default it is written to the directory the script is run from, usually next to your Python file).
If it says
Error: GDK_BACKEND does not match available displays
then install pyvirtualdisplay:
pip install pyvirtualdisplay selenium
You may also need Xvfb:
sudo apt-get install xvfb
Then try adding this code:
from pyvirtualdisplay import Display
display = Display(visible=0, size=(800, 600))
display.start()
Full example:
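from pyvirtualdisplay import Display
from selenium import webdriver

# Start a virtual X display (Xvfb) so Firefox has a screen to draw on
display = Display(visible=0, size=(800, 600))
display.start()

driver = webdriver.Firefox()
driver.get('http://www.python.org')
driver.quit()   # end the whole browser session, not just the current window
display.stop()  # shut down the virtual display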
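The log check described at the top of this answer can also be scripted. The sketch below is only an illustration: `has_display_error` is a hypothetical helper (not part of Selenium or geckodriver), and it assumes the log is a plain-text file named `geckodriver.log` in the working directory.

```python
import os

# Message Firefox writes when it cannot find an X display to attach to.
GDK_ERROR = "GDK_BACKEND does not match available displays"


def has_display_error(log_path="geckodriver.log"):
    """Return True if the geckodriver log contains the GDK_BACKEND error."""
    if not os.path.isfile(log_path):
        return False  # no log file yet, so nothing to report
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(log_path, errors="replace") as log:
        return any(GDK_ERROR in line for line in log)
```

Calling `has_display_error()` after a failed run returns True when Firefox could not find a display, which is the case where starting pyvirtualdisplay (or Xvfb) before creating the driver should help.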
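Answer 2: (score: 0)
The error says it all:
NoSuchWindowException: Message: Unable to locate window
In Selenium 3.x the first positional parameter of webdriver.Firefox is firefox_profile, not the driver path, so pass the geckodriver location with the executable_path keyword. Change the line:
driver = webdriver.Firefox('/var/gecodriver19-64')
to:
driver = webdriver.Firefox(executable_path='/var/gecodriver19-64/geckodriver')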
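For context, here is a minimal sketch of the question's `get_driver_soup` with the keyword argument applied. It assumes Selenium 3.x (where `webdriver.Firefox` accepts `executable_path`) and that the binary is named `geckodriver` inside `/var/gecodriver19-64`, as this answer supposes. `resolve_geckodriver` is a hypothetical helper, not part of Selenium. The third-party imports sit inside the function so the path helper can be used on its own.

```python
import os


def resolve_geckodriver(path, binary_name="geckodriver"):
    """Return the path to an executable geckodriver.

    Hypothetical helper: accepts either the binary itself or the
    directory that contains it, and fails early with a clear error.
    """
    candidate = os.path.join(path, binary_name) if os.path.isdir(path) else path
    if not (os.path.isfile(candidate) and os.access(candidate, os.X_OK)):
        raise FileNotFoundError("no executable geckodriver at %r" % candidate)
    return candidate


def get_driver_soup(url, driver_path="/var/gecodriver19-64"):
    # Imported here so the helper above works without these packages.
    from bs4 import BeautifulSoup
    from pyvirtualdisplay import Display
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    display = Display(visible=0, size=(800, 600))
    display.start()
    try:
        # Keyword argument: the first positional parameter is the profile.
        driver = webdriver.Firefox(
            executable_path=resolve_geckodriver(driver_path)
        )
        try:
            driver.get(url)
            WebDriverWait(driver, 20).until(
                EC.presence_of_element_located(
                    (By.CLASS_NAME, "product-image-wrapper")
                )
            )
            return BeautifulSoup(driver.page_source, "lxml")
        finally:
            driver.quit()  # always close the browser
    finally:
        display.stop()  # always shut down the virtual display
```

Unlike the original, this version reads `page_source` only after the wait succeeds, and it releases the browser and the virtual display even when the wait times out.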