Selenium driver.page_source fails with an error in a virtualenv

Time: 2018-07-09 23:30:05

Tags: selenium selenium-webdriver

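This Selenium code, which calls driver.page_source, used to work in an environment that has since crashed; only a backup of the program file survives. The code now runs with selenium 3.13.0 and geckodriver linux64 v0.21.0. I do not have the exact Selenium version number from the environment where the error did not occur. The code fails when it tries to execute driver.page_source.

The sec.gov site that this accesses is not behind a proxy. I do not know whether the problem is in the code or in the Selenium version. If you do not see a problem with the code below, could you suggest an earlier version of Selenium or geckodriver that is known to work without this error? Thanks in advance for your help.

The code is here: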

    import time
    import sys
    from selenium import webdriver

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.keys import Keys
    from bs4 import BeautifulSoup


    def lookup_Type(driver, this_url):
        driver.get(this_url)
        box = driver.find_element_by_id("type")
        box.send_keys('10-K')
        box.send_keys(Keys.ENTER)
        return

    def init_driver():
        driver = webdriver.Firefox()
        driver.wait = WebDriverWait(driver, 5)
        return driver

    these_many_stocks = (sys.argv[1]).split(',')

    for this_symbol in these_many_stocks :
        browser = webdriver.Firefox()
        driver = init_driver()
        search_url = "https://www.sec.gov/cgi-bin/browse-edgar?CIK=" + this_symbol + "&owner=exclude&action=getcompany&Find=Search"
        lookup_Type(driver, search_url)
        time.sleep(10)
        this_page = driver.page_source
        print this_page
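
One thing that stands out in the loop above is that it opens two Firefox instances per symbol (`browser` and `driver`) and never closes either of them. Whether that is related to the error is not established here. The following is only a minimal sketch of a tidier lifecycle, assuming one driver per symbol that is always shut down with `quit()`. The names `build_search_url`, `fetch_page_source`, `make_firefox_driver`, and the `prepare` hook are hypothetical and do not come from the original script.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7, as used in the question

EDGAR_BROWSE_URL = "https://www.sec.gov/cgi-bin/browse-edgar"


def build_search_url(symbol):
    """Build the same EDGAR lookup URL as the question, with encoded parameters."""
    query = urlencode([
        ("CIK", symbol.strip()),
        ("owner", "exclude"),
        ("action", "getcompany"),
        ("Find", "Search"),
    ])
    return EDGAR_BROWSE_URL + "?" + query


def fetch_page_source(make_driver, url, prepare=None):
    """Open url in a fresh driver, return its page source, and always quit the driver.

    make_driver is any zero-argument callable returning a WebDriver-like object.
    prepare is an optional callable that receives the driver after the page loads,
    for example to fill in a form (hypothetical hook, not part of Selenium).
    """
    driver = make_driver()
    try:
        driver.get(url)
        if prepare is not None:
            prepare(driver)
        return driver.page_source
    finally:
        # Runs even if page_source raises, so no browser is left behind.
        driver.quit()


def make_firefox_driver():
    """Create the real browser; imported lazily so the helpers above need no Selenium."""
    from selenium import webdriver
    return webdriver.Firefox()
```

With Selenium and geckodriver installed, a call might look like `fetch_page_source(make_firefox_driver, build_search_url("CSCO"))`. The form-filling step from `lookup_Type` (without its own `driver.get` call) could be passed as `prepare`.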

Packages installed in the virtual environment:

    Package         Version
    beautifulsoup4  4.6.0
    bs4             0.0.1
    lxml            4.2.2
    pip             10.0.1
    pkg-resources   0.0.0
    prettytable     0.7.2
    selenium        3.13.0
    setuptools      39.2.0
    wheel           0.31.1

GECKODRIVER installed: geckodriver-v0.21.0-linux64. This webdriver is placed at etlibs/selenium/webdriver/firefox/amd64/geckodriver; before that it was at etlibs/bin/geckodriver. In both configurations the webdriver opens the page but then fails at driver.page_source.
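
Because the binary has lived in two locations, it may be worth confirming which geckodriver executables are actually visible on `PATH` and what version each reports. The sketch below is one way to check, written for Python 3. It assumes the binary is looked up by name on `PATH` and that `geckodriver --version` prints a first line such as `geckodriver 0.21.0`. The helper names are hypothetical.

```python
import os
import re
import subprocess


def find_executables(name, search_path=None):
    """Return every executable file called name on the search path, in lookup order."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    matches = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        is_executable = os.path.isfile(candidate) and os.access(candidate, os.X_OK)
        if is_executable and candidate not in matches:
            matches.append(candidate)
    return matches


def parse_geckodriver_version(output):
    """Extract the version number from `geckodriver --version` output, or None."""
    match = re.search(r"geckodriver\s+(\d+(?:\.\d+)+)", output)
    return match.group(1) if match else None


def geckodriver_version(executable):
    """Run one geckodriver binary and return the version it reports."""
    output = subprocess.check_output([executable, "--version"])
    return parse_geckodriver_version(output.decode("utf-8", "replace"))
```

Calling `find_executables("geckodriver")` inside the activated virtualenv lists the candidates; the first entry is the one a name-based lookup would pick. Passing each path to `geckodriver_version` shows whether an older copy is shadowing the intended one.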

RUN AND ERROR MESSAGES:
    (etlibs) james@james-Noir-et:~/Documents/et-alt$ python get_xbrl_files.py CSCO
    Traceback (most recent call last):
      File "get_xbrl_files.py", line 34, in <module>
        this_page = driver.page_source
      File "/home/james/Documents/et-proj/etlibs/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 678, in page_source
        return self.execute(Command.GET_PAGE_SOURCE)['value']
      File "/home/james/Documents/et-proj/etlibs/local/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 318, in execute
        response = self.command_executor.execute(driver_command, params)
      File "/home/james/Documents/et-proj/etlibs/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 472, in execute
        return self._request(command_info[0], url, body=data)
      File "/home/james/Documents/et-proj/etlibs/local/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 496, in _request
        resp = self._conn.getresponse()
      File "/usr/lib/python2.7/httplib.py", line 1136, in getresponse
        response.begin()
      File "/usr/lib/python2.7/httplib.py", line 453, in begin
        version, status, reason = self._read_status()
      File "/usr/lib/python2.7/httplib.py", line 417, in _read_status
        raise BadStatusLine(line)
    httplib.BadStatusLine: ''
    (etlibs) james@james-Noir-et:~/Documents/et-alt$

0 answers:

No answers