I have a huge database of links, and I have to ping each of them to see whether it is still valid. I am using a ProcessPoolExecutor to check as many links per second as possible.
I am trying to check each web page with a Python script and update a MySQL database after each check. My problem comes from the MySQL update: when I run the script without the MySQL update and just look at its output, everything works fine, but when I try to write the result of the last check back to the database, the script blocks during the first update. I work on 100 links at a time, because if I try to fetch them all at once I run out of memory. Here is my script:
import concurrent.futures
import time
import mysql.connector
import urllib.error
import urllib.request
from urllib.request import urlopen
from bs4 import BeautifulSoup

MainDB = mysql.connector.connect(
    host="**************",
    user="********",
    passwd="***********",
    database="**********"
)
Maincursor = MainDB.cursor(buffered=True)

def Dead(Link):
    # Mark the link as dead and record when it was last checked
    Date = time.strftime('%Y-%m-%d %H:%M:%S')
    print(Date, ':', Link, 'is Dead')
    try:
        SQLInsert = "UPDATE ******** SET Alive=%s, LastTimeSeen=%s WHERE Link = %s"
        DataInsert = (0, Date, Link)
        Maincursor.execute(SQLInsert, DataInsert)
        MainDB.commit()
    except mysql.connector.Error as err:
        print("Error Updating Dead: {}".format(err))
        MainDB.rollback()

def Alive(Link):
    # Mark the link as alive and record when it was last checked
    Date = time.strftime('%Y-%m-%d %H:%M:%S')
    try:
        SQLInsert = "UPDATE ******** SET Alive=%s, LastTimeSeen=%s WHERE Link = %s"
        DataInsert = (1, Date, Link)
        Maincursor.execute(SQLInsert, DataInsert)
        MainDB.commit()
    except mysql.connector.Error as err:
        print("Error Updating Alive: {}".format(err))
        MainDB.rollback()

def load_link(Link):
    # Runs in the worker processes: returns 1 if the link opens, 0 otherwise
    try:
        html_offer = urlopen(Link)
    except urllib.error.HTTPError:
        return 0
    except urllib.error.ContentTooShortError:
        return 0
    except urllib.error.URLError:
        return 0
    else:
        return 1

while True:
    SQL = "SELECT COUNT(ID) FROM *****"
    try:
        Maincursor.execute(SQL)
        MainDB.commit()
    except mysql.connector.Error as err:
        print("Error Getting Count: {}".format(err))
    Found = Maincursor.fetchone()
    if Found[0] > 0:
        # Walk the table in chunks of 100 rows to keep memory usage low
        for offset in range(0, Found[0], 100):
            SQL = "SELECT Link FROM ***** LIMIT %s,100"
            try:
                Maincursor.execute(SQL, (offset,))
                MainDB.commit()
            except mysql.connector.Error as err:
                print("Error Selecting 100 Rows: {}".format(err))
            if Maincursor.rowcount > 0:
                Identified = Maincursor.fetchall()
                Links = [item[0] for item in Identified]
                # Check the whole batch in parallel, then update each row
                # from the main process
                with concurrent.futures.ProcessPoolExecutor() as executor:
                    for Link, alive in zip(Links, executor.map(load_link, Links)):
                        if alive == 1:
                            Alive(Link)
                        else:
                            Dead(Link)
I have tried using a MySQL connection pool together with different multiprocessing approaches, but each time the MySQL server could not keep up with the volume of queries. I also tried implementing the Dead/Alive functions with a new MySQL connection for every update, but I hit the same problem. Why does my script stop after the first update?
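For reference, the new-connection-per-update variant of Dead I mentioned looked roughly like this; it is only a sketch with the same masked placeholders as above, and Alive differs only in the value written:

def Dead(Link):
    # Variant that opens its own connection for each update
    # instead of sharing the global MainDB connection
    Date = time.strftime('%Y-%m-%d %H:%M:%S')
    db = mysql.connector.connect(
        host="**************",
        user="********",
        passwd="***********",
        database="**********"
    )
    cursor = db.cursor()
    try:
        cursor.execute(
            "UPDATE ******** SET Alive=%s, LastTimeSeen=%s WHERE Link = %s",
            (0, Date, Link))
        db.commit()
    except mysql.connector.Error as err:
        print("Error Updating Dead: {}".format(err))
        db.rollback()
    finally:
        cursor.close()
        db.close()

Even with this version, where no connection or cursor is shared between updates, the script still blocks at the same point.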