Running asynchronous processes to update MySQL using multiprocessing and Pool

Time: 2019-02-16 19:14:08

Tags: python mysql multiprocessing

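I have a MySQL database table with about 50 entries. I want to use multiprocessing to run an asynchronous process for each entry. The number of processes running at once will depend on the number of cores (the default). The code below is a simplified version of what I am trying to do.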

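In dothis() there is more code that runs using the information from that database entry. The processes cannot be synchronous, because the processing time differs a lot from entry to entry (web scraping), so whenever one finishes I need a new one to start.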

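My questions are: 1. How do I make sure Pool processes the entries asynchronously? 2. What is the best way to pass the database cursor and the table ID to each process?

Thanks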

from multiprocessing import Pool
import pymysql

def dothis(x, cursor):
    # Fetch the row for this id; the parameter goes in a one-element tuple.
    cursor.execute("SELECT * FROM historic WHERE his_id = %s;", (x,))
    entry = cursor.fetchone()
    # Here is a bunch of code based on table entry
    print("{}: finished".format(x))

if __name__ == "__main__":

    db = pymysql.connect(host="localhost", user="root",
                         password="password", database="DBname")
    cursor = db.cursor()
    cursor.execute("SELECT his_id FROM historic;")
    ids = cursor.fetchall()

    # len(ids) ~ 50

    # fetchall() returns one tuple per row; keep only the id column.
    sendargs = [row[0] for row in ids]

    with Pool(4) as p:
        # Need to send one of the items in sendargs + cursor
        # to each process.
        # Similar to p.map(dothis, range(10)), which would start
        # 10 processes.  Just need len(sendargs) processes, async.

        # Placeholder for the intended call: Pool.map() accepts no
        # `args` keyword, so this line raises TypeError as written.
        p.map(dothis, args=(sendargs, cursor))

    db.close()
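One possible way to approach both questions is sketched here, under some assumptions. Rather than sending a cursor to the workers (a connection wraps a live socket, so it generally cannot be pickled), each worker process opens its own connection once through the Pool `initializer`, and only the id is sent per task. `imap_unordered` hands out ids as workers become free and yields results in completion order. To keep the sketch runnable without a server, the standard-library `sqlite3` module stands in for pymysql (so the placeholder is `?` instead of `%s`), and `init_worker`, `make_demo_db`, and the `url` column are hypothetical names, not part of the original code.

```python
import os
import sqlite3
import tempfile
from multiprocessing import Pool

_conn = None  # one connection per worker process, set by init_worker()

def init_worker(db_path):
    """Runs once in each worker: open that worker's own connection."""
    global _conn
    # Assumption: with MySQL this would be a pymysql.connect(...) call.
    _conn = sqlite3.connect(db_path)

def dothis(his_id):
    """Process one row, looked up by id on the worker's own connection."""
    cur = _conn.cursor()
    cur.execute("SELECT * FROM historic WHERE his_id = ?", (his_id,))
    entry = cur.fetchone()
    # ... per-entry work (e.g. scraping) would go here ...
    return his_id, entry

def make_demo_db(path, n=10):
    """Hypothetical helper: build a small stand-in `historic` table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE historic (his_id INTEGER PRIMARY KEY, url TEXT)")
    conn.executemany("INSERT INTO historic VALUES (?, ?)",
                     [(i, "item-{}".format(i)) for i in range(1, n + 1)])
    conn.commit()
    conn.close()

def main():
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    make_demo_db(path)
    conn = sqlite3.connect(path)
    ids = [row[0] for row in conn.execute("SELECT his_id FROM historic")]
    conn.close()
    with Pool(4, initializer=init_worker, initargs=(path,)) as pool:
        # Each result is yielded as soon as its task finishes, and an
        # idle worker picks up the next id without waiting for the rest.
        for his_id, entry in pool.imap_unordered(dothis, ids):
            print("{}: finished".format(his_id))

if __name__ == "__main__":
    main()
```

With only four workers and about 50 ids, the pool never starts 50 processes; the same four workers keep taking the next id until the list is exhausted.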

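Edit: I have revised the code above (new version below) to use map_async and a while loop. The next task is to handle a keyboard interrupt so that the loop shuts down cleanly (db.close and so on). A try/except KeyboardInterrupt with a finally clause might work, though "might" is giving myself more credit than I deserve.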

from multiprocessing import Pool, TimeoutError
import pymysql

def dothis(x):
    # Each task receives one (id, cursor) tuple.
    his_id, cursor = x
    cursor.execute("SELECT * FROM historic WHERE his_id = %s;", (his_id,))
    entry = cursor.fetchone()
    # Here is a bunch of code based on table entry
    print("{}: finished".format(his_id))

if __name__ == "__main__":

    db = pymysql.connect(host="localhost", user="root",
                         password="password", database="DBname")
    cursor = db.cursor()
    cursor.execute("SELECT his_id FROM historic;")
    ids = cursor.fetchall()

    # len(ids) ~ 50

    # Pair every id with the shared cursor; the cursor must be
    # picklable for these tuples to reach the workers.
    sendargs = [(row[0], cursor) for row in ids]

    pool = Pool(processes=4)  # or however many wanted
    # This loop has no exit condition: each pass resubmits every id,
    # and db.close() below is not reached while it keeps running.
    while True:
        res = pool.map_async(dothis, sendargs)
        try:
            print(res.get(timeout=25))
        except TimeoutError:
            print("Timeout Error")

    db.close()
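For the keyboard-interrupt part, a try/except/finally arrangement along these lines may be enough. This is only a sketch under assumptions: it runs the batch once instead of looping, `dothis` is a stand-in with no database access, and `ignore_sigint`, `run_once`, and `on_close` are hypothetical names. The workers ignore SIGINT so that Ctrl-C is handled in the parent alone, and the `finally` clause shuts down the pool and calls the cleanup hook (for example `db.close`) on every path.

```python
import signal
from multiprocessing import Pool, TimeoutError

def ignore_sigint():
    """Worker initializer: leave Ctrl-C handling to the parent process."""
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def dothis(his_id):
    # Stand-in for the real per-entry work.
    return his_id

def run_once(pool, ids, on_close, timeout=25):
    """Run dothis over ids once; always shut the pool and call on_close()."""
    results = []
    try:
        results = pool.map_async(dothis, ids).get(timeout=timeout)
    except TimeoutError:
        print("Timeout Error")
    except KeyboardInterrupt:
        print("Interrupted, shutting down")
    finally:
        pool.terminate()   # stop any workers that are still busy
        pool.join()        # wait for the worker processes to exit
        on_close()         # e.g. db.close()
    return results

if __name__ == "__main__":
    pool = Pool(processes=4, initializer=ignore_sigint)
    print(run_once(pool, [1, 2, 3], on_close=lambda: print("closed")))
```

Because `terminate()` sits in `finally`, it also runs after a normal finish, which is harmless once all results have been collected. To keep the original while loop, the loop itself could go inside the `try` block so that a single Ctrl-C breaks out and reaches the same cleanup.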

0 answers:

No answers yet