How can I avoid connection errors when opening websites automatically?

Posted: 2017-10-05 10:28:53

Tags: python selenium web-scraping debian python-requests

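I scrape several websites with Python 2.7 on Debian, but sometimes my script stops on its own, either when a page fails to load in time (it freezes) or when there is no Internet connection.

Is there a way to handle this, perhaps by simply skipping the failing page and continuing with the next URL? As it stands, the script just stops whenever it runs into such a problem.

Here is my code: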

#!/usr/bin/python
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
from selenium import webdriver
import urllib2
import subprocess
import unicodecsv as csv
import os
import sys
import io
import time
import datetime
import pandas as pd
import MySQLdb
import re
import contextlib
import selenium.webdriver.support.ui as ui
import numpy as np
from datetime import datetime, timedelta
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By 
import pyautogui 
from pykeyboard import PyKeyboard

reload(sys)
sys.setdefaultencoding('utf-8')


cols = ['MYCOLS..'] 

browser = webdriver.Firefox()
datatable=[]

browser.get('LINK1')
time.sleep(5)

browser.find_element_by_xpath('//button[contains(text(), "CLICK EVENT")]').click()
time.sleep(5)
browser.find_element_by_xpath('//button[contains(text(), "CLICK EVENT")]').click()
html = browser.page_source
soup=BeautifulSoup(html,"html.parser")
table = soup.find('table', { "class" : "table table-condensed table-hover data-table m-n-t-15" })    

for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"):
    temp_data = []  # start a fresh list for each table row
    for data in record.find_all("td"):
        temp_data.append(data.text.encode('utf-8'))
    newlist = filter(None, temp_data)
    datatable.append(newlist)

time.sleep(10) 
browser.close()

# Here I insert my data into MySQL (not important for the question). My second link starts here.

browser = webdriver.Firefox()
datatable=[]

browser.get('LINK2')
browser.find_element_by_xpath('//button[contains(text(), "LCLICK EVENT")]').click()
time.sleep(5)
html = browser.page_source
soup=BeautifulSoup(html,"html.parser")
table = soup.find('table', { "class" : "table table-condensed table-hover data-table m-n-t-15" })

for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"):
    temp_data = []  # start a fresh list for each table row
    for data in record.find_all("td"):
        temp_data.append(data.text.encode('utf-8'))
    newlist = filter(None, temp_data)
    datatable.append(newlist)

time.sleep(10) 
browser.close()

# MySQLdb part again, then the next link follows.

+1 Edit:

The script also stops when it cannot find this CLICK EVENT. Why? How can I avoid that?

1 answer:

Answer 0 (score: 0)

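With Selenium, you can configure the driver (the browser object) to wait for a specific element or condition. You can then use an ordinary try/except to handle errors such as TimeoutException, among many others.

Selenium explains its wait mechanisms well in their documentation.

Here is a snippet showing exception handling with Selenium: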

from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

try:
    # Wait for any element / condition; you can even add a lambda if you wish
    WebDriverWait(browser, 10).until(
        EC.visibility_of_all_elements_located((By.ID, 'my-item'))
    )
except TimeoutException:
    # Here I raise an error but you can do whatever you want like exiting properly or logging something
    raise RuntimeError('No Internet connection')
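
To skip a failing page and move on to the next URL instead of stopping, the same try/except idea can be wrapped around each link. The following is only a minimal sketch: `scrape_all`, `scrape_one` and `skip_on` are hypothetical names that do not come from the question, and `scrape_one` stands for whatever per-link work (the `browser.get(...)`, the clicks, the table parsing) the script does.

```python
import logging


def scrape_all(urls, scrape_one, skip_on=(Exception,)):
    """Call scrape_one(url) for every URL; log and skip the ones that raise.

    scrape_one and skip_on are illustrative assumptions: scrape_one is a
    caller-supplied function, skip_on is a tuple of exception classes that
    should be treated as "skip this URL and continue".
    """
    results = {}
    failed = []
    for url in urls:
        try:
            results[url] = scrape_one(url)
        except skip_on as exc:
            # Record the failure and continue with the next URL.
            logging.warning("Skipping %s: %r", url, exc)
            failed.append(url)
    return results, failed


# With Selenium, the tuple would typically be narrowed to its own exceptions:
#   from selenium.common.exceptions import (
#       TimeoutException, NoSuchElementException, WebDriverException)
#   skip_on = (TimeoutException, NoSuchElementException, WebDriverException)
```

A missing button makes `find_element_by_xpath` raise `NoSuchElementException`, which is why the script stops at the CLICK EVENT; catching it as above lets the loop continue. Calling `browser.set_page_load_timeout(seconds)` before `browser.get(...)` should also turn a frozen page load into a `TimeoutException` rather than an indefinite hang.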