How do I properly replace time.sleep() with a wait-for-element approach in Selenium with Python?

Asked: 2017-08-08 10:09:10

Tags: python selenium web-scraping

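I am scraping a lot of websites with selenium and time.sleep(), but that is a risky approach: sometimes my computer slows down, the fixed sleep is no longer long enough, and I lose data.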

How can I change my code to use a wait-for-element approach so that I don't lose information?

Here is my code:

from bs4 import BeautifulSoup
from selenium import webdriver
import time
import urllib2
import unicodecsv as csv
import os
import sys
import io
import datetime
import pandas as pd
import re
import contextlib
import selenium.webdriver.support.ui as ui
import numpy as np
from datetime import datetime, timedelta

reload(sys)
sys.setdefaultencoding('utf-8')

def scrape(urls):
    browser = webdriver.Firefox()
    datatable=[]
    for url in urls:
        browser.get(url)
        html = browser.page_source
        soup=BeautifulSoup(html,"html.parser")
        table = soup.find('table', { "class" : "table table-condensed table-hover data-table m-n-t-15" })
        soup2=BeautifulSoup(html,"html.parser")
        name = soup2.h2.string
        soup3=BeautifulSoup(html,"html.parser")
        name2 = soup3.h1.string
        soup4=BeautifulSoup(html,"html.parser")
        name3 = soup4.h3.string
        soup5=BeautifulSoup(html,"html.parser")
        name4 = soup5.find('span' , attrs={'class' : 'clock-time ng-binding'}).text.strip()

        for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"):
            temp_data = []
            temp_data.append(name4)
            temp_data.append(name)
            temp_data.append(name2)    
            temp_data.append(name3)    
            for data in record.find_all("td"):
                temp_data.append(data.text.encode('latin-1'))
            newlist = filter(None, temp_data)
            datatable.append(newlist)

    time.sleep(10) 
    browser.close()
    return datatable

2 answers:

Answer 0 (score: 0)

As mentioned in the comments, you can use an explicit wait (WebDriverWait) to get a dynamically loaded element, like this:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

table = WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "table.table.table-condensed.table-hover.data-table.m-n-t-15")))
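
To make the difference from a fixed time.sleep() concrete, here is a minimal, stdlib-only sketch of the polling idea behind an explicit wait. It is not Selenium's implementation: wait_until, table_present, and the 0.2-second delay are names and values made up for illustration, and the code targets Python 3 rather than the Python 2 used in the question.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll condition() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # ready early: no need to sit out the full timeout
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Simulated page whose table "appears" 0.2 s after loading starts.
ready_at = time.monotonic() + 0.2


def table_present():
    return "table" if time.monotonic() >= ready_at else None


start = time.monotonic()
found = wait_until(table_present, timeout=2.0, poll_interval=0.05)
elapsed = time.monotonic() - start
print(found, "found after about %.2f s" % elapsed)
```

The call returns as soon as the condition holds instead of always sleeping for the full time, and it raises on timeout instead of silently reading a half-loaded page. In the question's loop, the WebDriverWait line above would go right after browser.get(url) and before browser.page_source is read.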

Answer 1 (score: 0)

You can create a small library of reusable methods built on Selenium's ExpectedConditions. The example below is in Java.

public void clickWebElementVisible(String element, By by) throws ObjectMissing {
    try {
        // Utilities is the answer author's own helper class; its code is not shown.
        Utilities.waitExplicit(1);
        // Wait up to 30 seconds for the element to become visible.
        WebDriverWait wait = new WebDriverWait(this.driver, 30);
        WebElement target = wait.until(ExpectedConditions.visibilityOfElementLocated(by));
        if (target.isDisplayed()) {
            target.click();
        } else {
            throw new ObjectMissing(" Error in " + getClass() + "." + element + ". Object Missing");
        }
    } catch (WebDriverException e) {
        // Covers the timeout raised by wait.until as well as other driver errors.
        throw new ObjectMissing(" Error in " + e.getMessage());
    }
}

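The element argument is there for debugging, so the error message tells you which element was being looked up; by is the locator value.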

Usage:

clickWebElementVisible("lnkLoginUsername", By.id("Locator"));  // "Locator" stands in for your real locator value

You can throw a plain Exception or a custom exception such as ObjectMissing.