I have a huge database of links, and I have to ping each of them to see whether it is still valid. I am using a ProcessPoolExecutor to check as many links per second as possible.
I am trying to check each web page with a Python script and update a MySQL database after each check. My problem comes from the MySQL update: when I run the script without the MySQL update and just look at its output, everything works fine, but when I try to write the result of the last check back to the database, the script blocks during the first update. I work on 100 links at a time, because if I try to fetch them all at once I run out of memory. Here is my script:
import concurrent.futures
import time
import mysql.connector
import urllib.error
import urllib.request
from urllib.request import urlopen
from bs4 import BeautifulSoup

MainDB = mysql.connector.connect(
    host="**************",
    user="********",
    passwd="***********",
    database="**********"
)
Maincursor = MainDB.cursor(buffered=True)

def Dead(Link):
    # Mark the link as dead and record when it was last checked
    Date = time.strftime('%Y-%m-%d %H:%M:%S')
    print(Date, ':', Link, 'is Dead')
    try:
        SQLInsert = "UPDATE ******** SET Alive=%s, LastTimeSeen=%s WHERE Link = %s"
        DataInsert = (0, Date, Link)
        Maincursor.execute(SQLInsert, DataInsert)
        MainDB.commit()
    except mysql.connector.Error as err:
        print("Error Updating Dead: {}".format(err))
        MainDB.rollback()

def Alive(Link):
    # Mark the link as alive and record when it was last checked
    Date = time.strftime('%Y-%m-%d %H:%M:%S')
    try:
        SQLInsert = "UPDATE ******** SET Alive=%s, LastTimeSeen=%s WHERE Link = %s"
        DataInsert = (1, Date, Link)
        Maincursor.execute(SQLInsert, DataInsert)
        MainDB.commit()
    except mysql.connector.Error as err:
        print("Error Updating Alive: {}".format(err))
        MainDB.rollback()

def load_link(Link):
    # Runs in the worker processes: returns 1 if the link opens, 0 otherwise
    try:
        html_offer = urlopen(Link)
    except urllib.error.HTTPError:
        return 0
    except urllib.error.ContentTooShortError:
        return 0
    except urllib.error.URLError:
        return 0
    else:
        return 1

while True:
    SQL = "SELECT COUNT(ID) FROM *****"
    try:
        Maincursor.execute(SQL)
        MainDB.commit()
    except mysql.connector.Error as err:
        print("Error Getting Count: {}".format(err))
    Found = Maincursor.fetchone()
    if Found[0] > 0:
        # Walk the table in chunks of 100 rows to keep memory usage low
        for offset in range(0, Found[0], 100):
            SQL = "SELECT Link FROM ***** LIMIT %s,100"
            try:
                Maincursor.execute(SQL, (offset,))
                MainDB.commit()
            except mysql.connector.Error as err:
                print("Error Selecting 100 Rows: {}".format(err))
            if Maincursor.rowcount > 0:
                Identified = Maincursor.fetchall()
                Links = [item[0] for item in Identified]
                # Check the whole batch in parallel, then update each row
                # from the main process
                with concurrent.futures.ProcessPoolExecutor() as executor:
                    for Link, alive in zip(Links, executor.map(load_link, Links)):
                        if alive == 1:
                            Alive(Link)
                        else:
                            Dead(Link)
I have tried using a MySQL connection pool together with different multiprocessing approaches, but each time the MySQL server could not keep up with the volume of queries. I also tried implementing the Dead/Alive functions with a new MySQL connection for every update, but I hit the same problem. Why does my script stop after the first update?
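For reference, the new-connection-per-update variant of Dead I mentioned looked roughly like this; it is only a sketch with the same masked placeholders as above, and Alive differs only in the value written:

def Dead(Link):
    # Variant that opens its own connection for each update
    # instead of sharing the global MainDB connection
    Date = time.strftime('%Y-%m-%d %H:%M:%S')
    db = mysql.connector.connect(
        host="**************",
        user="********",
        passwd="***********",
        database="**********"
    )
    cursor = db.cursor()
    try:
        cursor.execute(
            "UPDATE ******** SET Alive=%s, LastTimeSeen=%s WHERE Link = %s",
            (0, Date, Link))
        db.commit()
    except mysql.connector.Error as err:
        print("Error Updating Dead: {}".format(err))
        db.rollback()
    finally:
        cursor.close()
        db.close()

Even with this version, where no connection or cursor is shared between updates, the script still blocks at the same point.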