I wrote a web scraping script using the Python Selenium WebDriver. It works fine, but the data set is very large, so when the internet connection drops for a while my script stops executing and I have to start again from the beginning. Please help me, or point me toward another solution such as the Scrapy framework.
Working code:
import time

import MySQLdb
from bs4 import BeautifulSoup
from selenium import webdriver

# Open the database connection.
db = MySQLdb.connect(host="localhost",  # your host, usually localhost
                     user="root",       # your username
                     passwd="",         # your password
                     db="test")         # name of the database

url = "http://xlnindia.gov.in/frm_G_Cold_S_Query.aspx"
browser = webdriver.Chrome()
browser.get(url)
time.sleep(5)

# Collect the list of states from the first dropdown.
html = browser.page_source
soup = BeautifulSoup(html, "lxml")
states = [x["value"] for x in soup.find("td", bgcolor="#ffe0c0").find_all("option")]

for state in states:
    # Select the state (element id as posted in the question).
    cur_state = browser.find_element_by_id("ddldistrict")
    cur_state.send_keys(state)

    # Re-parse the page to pick up the districts for this state.
    html = browser.page_source
    soup = BeautifulSoup(html, "lxml")
    districts = [x["value"] for x in soup.find("td", bgcolor="#ffe0c0").find_next_sibling().find_all("option")]
    districts = districts[1:]  # drop the placeholder option

    for dist in districts:
        district = browser.find_element_by_id("ddldistrict")
        district.send_keys(dist)

        # Re-parse the page to pick up the talukas for this district.
        html = browser.page_source
        soup = BeautifulSoup(html, "lxml")
        talukas = [x["value"] for x in soup.find("td", bgcolor="#ffe0c0").find_next_sibling().find_all("option")]
        talukas = talukas[1:]  # drop the placeholder option

        for taluka in talukas:
            category = browser.find_element_by_id("ddltaluka")
            category.send_keys(taluka)

            button = browser.find_element_by_id("btnSearch")
            button.click()
            time.sleep(10)

            browser.save_screenshot(browser.title + ".JPEG")

            # Scrape the result table, if one was returned.
            html = browser.page_source
            soup = BeautifulSoup(html, "html.parser")
            table = soup.find(id="dgDisplay")
            if table:
                cursor = db.cursor()
                for row in table.findAll("tr")[1:]:  # skip the header row
                    cells = row.findAll("td")
                    name = cells[0].find(text=True)
                    city = cells[1].find(text=True)
                    licence = cells[2].find(text=True)
                    owner = cells[3].find(text=True)
                    print("Name: " + name + " city: " + city + " licence: " + licence + " owner: " + owner)
                    try:
                        cursor.execute("INSERT INTO distributors (name, city, licence, owner) VALUES (%s, %s, %s, %s);",
                                       (name, city, licence, owner))
                    except (MySQLdb.Error, MySQLdb.Warning) as e:
                        print(e)
                db.commit()

db.close()
browser.close()
browser.quit()
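One common way to address the restart-from-scratch problem described above is to checkpoint progress to disk and skip already-scraped combinations on the next run. Below is a minimal sketch of that idea (not from the original post; the file name and helper names are hypothetical):

import os

PROGRESS_FILE = "scrape_progress.tsv"  # hypothetical checkpoint file

def load_done():
    """Load already-scraped (state, district, taluka) combinations."""
    if not os.path.exists(PROGRESS_FILE):
        return set()
    with open(PROGRESS_FILE) as f:
        return set(tuple(line.rstrip("\n").split("\t")) for line in f if line.strip())

def mark_done(state, dist, taluka):
    """Record a finished combination so a restart can skip it."""
    with open(PROGRESS_FILE, "a") as f:
        f.write("%s\t%s\t%s\n" % (state, dist, taluka))

done = load_done()

# Inside the innermost loop of the script above, before clicking Search:
#     if (state, dist, taluka) in done:
#         continue
# and after the rows for that combination are committed:
#     mark_done(state, dist, taluka)

The script still dies when the connection drops, but rerunning it then continues roughly where it left off instead of starting over.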
Answer 0 (score: -2)
Generally, if no implicit wait is set, the driver throws an exception as soon as an element is not immediately found. Setting one also covers slow page loads, because each find element call keeps polling until the timeout expires instead of failing right away. Add the following right after launching the browser and try again; it may help you.

driver.implicitly_wait(30)
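For reference, a minimal sketch of where that call would go in the script from the question (variable names follow the question's code); the 30-second implicit wait then applies to every subsequent find element call:

from selenium import webdriver

url = "http://xlnindia.gov.in/frm_G_Cold_S_Query.aspx"

browser = webdriver.Chrome()
browser.implicitly_wait(30)  # poll up to 30 s for each element instead of failing immediately
browser.get(url)

Note that an implicit wait only tolerates slow loads within a live session; it does not resume a run that has already crashed, so something like the checkpointing sketch above is still needed for the disconnect problem.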