Selenium TimeoutException after running a loop

Time: 2016-11-14 15:46:42

Tags: python selenium web-scraping

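I've run into a problem and I don't understand why my code behaves this way. Essentially, I am trying to run a for loop x times, but my code keeps raising a TimeoutException: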

Traceback (most recent call last):
  File "/Users/Ryan/Desktop/selftest1.py", line 33, in <module>
    EC.presence_of_element_located((By.ID, "ctl00_lblStockname"))
  File "/Library/Python/2.7/site-packages/selenium/webdriver/support/wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
TimeoutException: Message: 

The body of my code is:

for x in range(1,10):

    baseurl = 'http://www.hkexnews.hk'
    url = 'http://www.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx'

    driver = webdriver.Firefox()
    driver.get(url)
    driver.find_element_by_id("ctl00_txt_stock_code").clear()
    driver.find_element_by_id("ctl00_txt_stock_code").send_keys(x)
    driver.find_element_by_id("ctl00_rbAfter2006").click()
    Select(driver.find_element_by_id("ctl00_sel_DateOfReleaseFrom_y")).select_by_visible_text("1999")
    Select(driver.find_element_by_id("ctl00_sel_tier_1")).select_by_visible_text("Financial Statements/ESG Information")
    Select(driver.find_element_by_id("ctl00_sel_tier_2")).select_by_visible_text("Annual Report")
    driver.find_element_by_css_selector("label > a > img").click()

    match = re.compile('\.(html|pdf)')
    try:
        element = WebDriverWait(driver, 1).until(
            EC.presence_of_element_located((By.ID, "ctl00_lblStockname"))
        )
    finally:
        f = driver.page_source
        soup = BeautifulSoup(f,'html.parser')
        for link in soup.findAll('a'):
            try:
                href = link['href']
                if re.search(match, href):
                    file = open("newfile.txt", "a")
                    file.write(baseurl+href+'\n')
                    file.close
                    print ('finished write')
                    print baseurl+href
            except KeyError:
                pass
    driver.quit()

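As far as I understand, the timeout exception is raised by the first try. But shouldn't it stop the loop when it hits finally? Also, I tried adding an except for the timeout error after the 'try' and before the 'finally', and it gave me this error: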

error: [Errno 61] Connection refused

I am honestly lost as to how to fix this, or what is causing the problem in the first place.

Edit:

After resetting everything I added an exception block, and it now seems to work fine. Like so:

try: 
   ...
except TimeoutException:
        driver.quit
finally:
   ...

Just for future reference, in case anyone wants to know the solution.

2 Answers:

Answer 0: (score: 0)

You have to set implicitly_wait(time) for the browser; see http://selenium-python.readthedocs.io/waits.html

driver.implicitly_wait(10)

This is the maximum time the browser waits while looking for a web element.
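
To illustrate what a maximum wait time means, here is a simplified, browser-free model of a polling wait. It is only a sketch and not Selenium's own code: `WaitTimeout`, `wait_until` and `appears_after` are hypothetical names made up for this example. The idea is that the condition is retried until it succeeds or the time budget runs out, and only then is an exception raised.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for selenium's TimeoutException."""


def wait_until(condition, timeout, poll_interval=0.05):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.time() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.time() >= deadline:
            raise WaitTimeout("condition not met within %.2fs" % timeout)
        time.sleep(poll_interval)


def appears_after(delay):
    """Build a condition that becomes true `delay` seconds from now."""
    ready_at = time.time() + delay
    return lambda: time.time() >= ready_at
```

With a generous budget, `wait_until(appears_after(0.1), timeout=1.0)` returns as soon as the condition holds. With a budget shorter than the delay, or a condition that never becomes true, it raises instead of waiting forever.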

PS: your try-finally does not catch the exception; use try-except-finally instead.
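
A small browser-free sketch of that difference. `FakeTimeout` and `wait_for_element` are hypothetical stand-ins for Selenium's TimeoutException and for a WebDriverWait(...).until(...) call that times out; only the control flow matters here.

```python
class FakeTimeout(Exception):
    """Hypothetical stand-in for selenium's TimeoutException."""


def wait_for_element():
    # Stand-in for a WebDriverWait(...).until(...) call that times out.
    raise FakeTimeout("element not found in time")


def try_finally(log):
    # finally runs, but the exception is NOT caught: it keeps propagating.
    try:
        wait_for_element()
    finally:
        log.append("finally ran")


def try_except_finally(log):
    # except handles the exception, finally still runs, execution continues.
    try:
        wait_for_element()
    except FakeTimeout:
        log.append("timeout handled")
    finally:
        log.append("finally ran")


def run_loop(body, iterations=3):
    """Run body in a loop; return (log, number of completed iterations)."""
    log, completed = [], 0
    try:
        for _ in range(iterations):
            body(log)
            completed += 1
    except FakeTimeout:
        log.append("loop aborted")
    return log, completed
```

Calling `run_loop(try_finally)` aborts during the first iteration, after the finally body has run. Calling `run_loop(try_except_finally)` completes all three iterations, because the exception never leaves the loop body.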

Answer 1: (score: 0)

I have updated the code; please try this:

from selenium import webdriver
from selenium.webdriver.support.ui import Select
import re
from bs4 import BeautifulSoup
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By


for x in range(1,10):

    baseurl = 'http://www.hkexnews.hk'
    url = 'http://www.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main.aspx'

    driver = webdriver.Chrome()
    driver.maximize_window()
    driver.get(url)
    driver.find_element_by_id("ctl00_txt_stock_code").clear()
    driver.find_element_by_id("ctl00_txt_stock_code").send_keys(x)
    driver.find_element_by_id("ctl00_rbAfter2006").click()
    Select(driver.find_element_by_id("ctl00_sel_DateOfReleaseFrom_y")).select_by_visible_text("1999")
    Select(driver.find_element_by_id("ctl00_sel_tier_1")).select_by_visible_text("Financial Statements/ESG Information")
    Select(driver.find_element_by_id("ctl00_sel_tier_2")).select_by_visible_text("Annual Report")
    driver.find_element_by_css_selector("label > a > img").click()

    match = re.compile(r'\.(html|pdf)')

    wait = WebDriverWait(driver, 10)
    wait.until(EC.presence_of_element_located(
            (By.XPATH, '//*[@id="ctl00_lblStockName"]')))

    f = driver.page_source
    soup = BeautifulSoup(f,'html.parser')
    for link in soup.findAll('a'):
        try:
            href = link['href']
            if re.search(match, href):
                # "with" closes the file even if the write fails
                with open("newfile.txt", "a") as out:
                    out.write(baseurl+href+'\n')
                print('finished write')
                print(baseurl+href)
        except KeyError:
            pass
    driver.quit()

It gives this output:

C:\Python27\python.exe 
C:/XXXX/kimpster.py
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2016/0412/LTN20160412398.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2015/0429/LTN201504291354.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2014/0407/LTN20140407336.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2013/0408/LTN20130408921.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2012/0410/LTN20120410623.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2011/0411/LTN20110411707.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2010/0422/LTN20100422489.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2008/0423/LTN20080423279.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2009/0319/LTN20090319103.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2009/0319/LTN20090319097.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2016/0421/LTN20160421233.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2015/0422/LTN20150422417.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2014/0423/LTN20140423340.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2013/0422/LTN20130422293.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2012/0425/LTN20120425287.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2011/0421/LTN20110421583.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2010/0423/LTN20100423265.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2009/0420/LTN20090420355.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2008/0423/LTN20080423322.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2016/0407/LTN20160407581.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2015/0413/LTN20150413273.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2014/0428/LTN20140428711.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2013/0429/LTN20130429395.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2012/0426/LTN20120426622.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2011/0426/LTN20110426450.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2010/0423/LTN20100423393.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2009/0423/LTN20090423238.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2008/0425/LTN20080425250.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2015/0319/LTN20150319329.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2014/0324/LTN20140324959.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2013/0402/LTN201304021122.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2012/0326/LTN20120326263.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2012/0326/LTN20120326253.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2009/0330/LTN20090330188.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2009/0320/LTN20090320083.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2016/0428/LTN201604281016.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2015/0429/LTN20150429233.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2014/0429/LTN20140429945.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2013/0429/LTN201304291031.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2012/0426/LTN20120426229.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2011/0421/LTN20110421266.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2010/0429/LTN20100429830.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2009/0428/LTN200904281430.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2008/0429/LTN20080429728.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2016/0323/LTN20160323343.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2015/0313/LTN20150313356.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2014/0327/LTN20140327637.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2013/0326/LTN20130326368.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2012/0326/LTN20120326620.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2011/0426/LTN20110426261.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2010/0408/LTN20100408709.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2009/0429/LTN20090429932.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2008/0416/LTN20080416269.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2016/0425/LTN20160425745.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2015/0423/LTN20150423635.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2014/0422/LTN20140422239.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2013/0417/LTN20130417330.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2012/0423/LTN20120423313.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2011/0406/LTN20110406041.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2010/0426/LTN20100426737.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2009/0428/LTN20090428495.pdf
finished write
http://www.hkexnews.hk/listedco/listconews/SEHK/2008/0429/LTN20080429825.pdf

Process finished with exit code 0

Isn't this what you expected? The same links are written to the file as well.
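
The link-filtering and file-writing step can also be checked on its own, without a browser. The following is only a sketch under a few assumptions: `document_links` and `append_links` are hypothetical helper names, and the pattern is anchored with `$` so that the extension must end the href. That anchoring is a change from the thread's unanchored pattern, which matches `.html` or `.pdf` anywhere in the href, and it would skip links that carry a query string.

```python
import re

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2

BASE_URL = 'http://www.hkexnews.hk'
# Anchored variant (an assumption of this sketch): extension must end the href.
DOC_LINK = re.compile(r'\.(html|pdf)$')


def document_links(hrefs, base_url=BASE_URL):
    """Return absolute URLs for the hrefs that end in .html or .pdf."""
    return [urljoin(base_url, href) for href in hrefs if DOC_LINK.search(href)]


def append_links(path, links):
    """Append one link per line; "with" guarantees the file is closed."""
    with open(path, 'a') as handle:
        for link in links:
            handle.write(link + '\n')
```

Opening the file once per batch, instead of once per link, also avoids reopening newfile.txt for every match.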