Question

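I have been trying to scrape www.zomato.com for over a week. I have searched the web for my problem but could not find a suitable solution, so I am posting my question here.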

Here is my web scraper code.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import NoAlertPresentException
import sys
import lxml
import unittest, time, re

class Sel(unittest.TestCase):
    def setUp(self):
        # Raw string so the backslash in the Windows path is taken literally.
        self.driver = webdriver.PhantomJS(executable_path=r'\phantomjs.exe')  # PhantomJS
        self.driver.implicitly_wait(30)
        self.base_url = "https://www.zomato.com"
        self.verificationErrors = []
        self.accept_next_alert = True
    def test_sel(self):
        driver = self.driver
        delay = 3
        # base_url has no trailing slash, so the path needs a leading one.
        driver.get(self.base_url + "/hyderabad")
        driver.find_element_by_link_text("All").click()
        for i in range(1,100):
            self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(4)
        html_source = driver.page_source
        data = html_source.encode('utf-8')


if __name__ == "__main__":
    unittest.main()

When I run this under Python 3.4, that is, with py -3.4 selenium.py from the script's directory, I get this error: selenium-python-phantomJS-SSL.
Can anyone help me solve this problem? Best regards.

Answer 1

You need to add the appropriate Accept-Encoding header to your request.

'accept-encoding': 'gzip, deflate, sdch, br'
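
The answer does not say how to set that header from Selenium. One possible route, sketched here under assumptions, is GhostDriver's custom-header capabilities, whose keys take the form phantomjs.page.customHeaders.<Header-Name>. The helper name phantomjs_header_capabilities is hypothetical and only builds the capability dictionary; it does not start a browser.

```python
def phantomjs_header_capabilities(headers):
    """Map HTTP header names to GhostDriver custom-header capability keys.

    Assumption: the driver honours keys of the form
    "phantomjs.page.customHeaders.<Header-Name>".
    """
    prefix = "phantomjs.page.customHeaders."
    return {prefix + name: value for name, value in headers.items()}


# Header value taken from the answer above.
caps = phantomjs_header_capabilities(
    {"Accept-Encoding": "gzip, deflate, sdch, br"}
)
print(caps)
```

With Selenium 3 these entries would typically be merged into a copy of DesiredCapabilities.PHANTOMJS and passed to webdriver.PhantomJS through its desired_capabilities argument. Whether this resolves the SSL error from the question is not confirmed by the thread.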

Answer 2

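First, the screenshot of the error you posted is NOT from the code you posted. Your code sample shows you calling webdriver.PhantomJS, but the screenshot clearly shows that you got the error while calling webdriver.Firefox.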

Also, the error message in the screenshot tells you exactly what the problem is and how to fix it: "geckodriver executable needs to be in PATH".

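To use Firefox with Selenium, you need to install geckodriver and make it available on your PATH. geckodriver (like chromedriver) is an external component that does not ship with Firefox or Selenium; it must be installed separately.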

You can download geckodriver here: https://github.com/mozilla/geckodriver/releases
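
A quick way to check the PATH requirement before starting Firefox is to look the executable up with the standard library. This is a minimal sketch; the helper name find_driver is an illustrative assumption, not part of Selenium.

```python
import shutil


def find_driver(name="geckodriver"):
    """Return the full path of the driver executable found on PATH, or None."""
    return shutil.which(name)


path = find_driver()
if path is None:
    print("geckodriver is not on PATH; install it and add its folder to PATH")
else:
    print("geckodriver found at", path)
```

If the lookup returns None, webdriver.Firefox() is expected to fail with the "needs to be in PATH" message quoted above.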
