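I am scraping many websites with Selenium and time.sleep(), but this is a risky approach: sometimes my computer gets slow, the page has not finished loading when the sleep ends, and I lose data.
How can I change my code to use a wait-for-element method so that I don't lose information?
Here is my code: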
from bs4 import BeautifulSoup
from selenium import webdriver
import time
import urllib2
import unicodecsv as csv
import os
import sys
import io
import time
import datetime
import pandas as pd
from bs4 import BeautifulSoup
import re
import contextlib
import selenium.webdriver.support.ui as ui
import numpy as np
from datetime import datetime, timedelta
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
def scrape(urls):
    browser = webdriver.Firefox()
    datatable=[]
    for url in urls:
        browser.get(url)
        html = browser.page_source
        soup=BeautifulSoup(html,"html.parser")
        table = soup.find('table', { "class" : "table table-condensed table-hover data-table m-n-t-15" })
        soup2=BeautifulSoup(html,"html.parser")
        name = soup2.h2.string
        soup3=BeautifulSoup(html,"html.parser")
        name2 = soup3.h1.string
        soup4=BeautifulSoup(html,"html.parser")
        name3 = soup4.h3.string
        soup5=BeautifulSoup(html,"html.parser")
        name4 = soup5.find('span' , attrs={'class' : 'clock-time ng-binding'}).text.strip()
        for record in table.find_all('tr', class_="hidden-xs hidden-sm ng-scope"):
            temp_data = []
            temp_data.append(name4)
            temp_data.append(name)
            temp_data.append(name2)
            temp_data.append(name3)
            for data in record.find_all("td"):
                temp_data.append(data.text.encode('latin-1'))
            newlist = filter(None, temp_data)
            datatable.append(newlist)
        time.sleep(10)
    browser.close()
    return datatable
Answer 0 (score: 0)
As mentioned in the comments, you can use an explicit wait to get the dynamically loaded element, like this:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
table = WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "table.table.table-condensed.table-hover.data-table.m-n-t-15")))
Answer 1 (score: 0)
You can build a small library of reusable methods that rely on Selenium's ExpectedConditions.
public void clickWebElementVisible(String element, By by) throws ObjectMissing {
    try {
        Utilities.waitExplicit(1);
        WebDriverWait wait = new WebDriverWait(this.driver, 30);
        WebElement x = wait.until(ExpectedConditions.visibilityOfElementLocated(by));
        if (x.isDisplayed()) {
            x.click();
        } else {
            throw new ObjectMissing(" Error in " + getClass() + "." + element + ". Object Missing");
        }
    } catch (WebDriverException x) {
        throw new ObjectMissing(" Error in " + x.getMessage());
    }
}
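The element argument is there for debugging: it names the element you are looking for, while by holds its locator.
Usage looks like this: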
clickWebElementVisible("lnkLoginUsername", By.id("Locator")); // "Locator" is a placeholder; any By strategy works
You can throw a plain Exception or the custom exception "ObjectMissing".