I have a MySQL database table with about 50 entries. I want to use multiprocessing to run an asynchronous process for each entry. The number of processes running at once would depend on the number of cores (the default). The code below is a simplified version of what I'm trying to do.
In dothis(), more code runs using the information from that database entry. The entries can't be processed in lockstep because the processing time for each one varies widely (web scraping), so whenever one process finishes I need a new one to start.
My questions are: 1. How do I make sure the processing with Pool is asynchronous? 2. What is the best way to pass the database cursor and the table ID to each process?
Thanks
from multiprocessing import Pool
import pymysql

def dothis(x, cursor):
    cursor.execute("SELECT * FROM historic WHERE his_id = %s;", (x,))
    entry = cursor.fetchone()
    # Here is a bunch of code based on the table entry
    print("{}: finished".format(x))

if __name__ == "__main__":
    db = pymysql.connect("localhost", "root", "password", "DBname")
    cursor = db.cursor()
    cursor.execute("SELECT his_id FROM historic;")
    ids = cursor.fetchall()
    # len(ids) ~ 50
    sendargs = []
    for id in ids:
        sendargs.append(id[0])

    with Pool(4) as p:
        # Need to send one of the items in sendargs + cursor
        # to each process.
        # Similar to p.map(dothis, range(10)), which would start
        # 10 processes. Just need len(sendargs) processes, async.
        p.map(dothis, args=(sendargs, cursor))
    db.close()
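One idea I've been considering for question 2 (untested) is that the cursor probably can't be sent to the workers at all, since it holds an open socket, so each worker would need its own connection. The sketch below opens a per-worker connection in the Pool's initializer and passes only the his_id values, using imap_unordered so a new id is handed to whichever worker frees up first. The connection parameters are just the placeholders from my code above.

from multiprocessing import Pool
import pymysql

conn = None  # per-worker connection, set by init_worker()

def init_worker():
    # Runs once in each worker process; gives it its own connection.
    global conn
    conn = pymysql.connect(host="localhost", user="root",
                           password="password", database="DBname")

def dothis(his_id):
    with conn.cursor() as cursor:
        cursor.execute("SELECT * FROM historic WHERE his_id = %s;", (his_id,))
        entry = cursor.fetchone()
        # Here is a bunch of code based on the table entry
    print("{}: finished".format(his_id))

if __name__ == "__main__":
    db = pymysql.connect(host="localhost", user="root",
                         password="password", database="DBname")
    cursor = db.cursor()
    cursor.execute("SELECT his_id FROM historic;")
    sendargs = [row[0] for row in cursor.fetchall()]
    db.close()

    with Pool(processes=4, initializer=init_worker) as p:
        # imap_unordered starts a new entry as soon as a worker is free,
        # so slow entries don't hold up the others.
        for _ in p.imap_unordered(dothis, sendargs):
            pass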
EDIT: So I've edited the code from above (shown below) to include map_async and a while loop. The next task will be getting a KeyboardInterrupt to work that shuts the loop down completely (db.close(), etc.). An except KeyboardInterrupt with a finally will probably work, but "probably" may be giving myself more credit than is due.
from multiprocessing import Pool, TimeoutError
import pymysql

def dothis(x):
    id = x[0]
    cursor = x[1]
    cursor.execute("SELECT * FROM historic WHERE his_id = %s;", (id,))
    entry = cursor.fetchone()
    # Here is a bunch of code based on the table entry
    print("{}: finished".format(id))

if __name__ == "__main__":
    db = pymysql.connect("localhost", "root", "password", "DBname")
    cursor = db.cursor()
    cursor.execute("SELECT his_id FROM historic;")
    ids = cursor.fetchall()
    # len(ids) ~ 50
    sendargs = []
    for id in ids:
        sendargs.append((id[0], cursor))

    pool = Pool(processes=4)  # or however many are wanted
    while 1:
        res = pool.map_async(dothis, sendargs)
        try:
            print(res.get(timeout=25))
        except TimeoutError:
            print("Timeout Error")
    db.close()
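For the clean shutdown, the shape I have in mind (again untested) is to wrap the loop in a try/except KeyboardInterrupt and do all of the teardown in a finally, so the pool and the connection get cleaned up no matter how the loop ends. Roughly:

pool = Pool(processes=4)
try:
    while 1:
        res = pool.map_async(dothis, sendargs)
        try:
            print(res.get(timeout=25))
        except TimeoutError:
            print("Timeout Error")
except KeyboardInterrupt:
    print("Interrupted, shutting down")
finally:
    pool.terminate()  # stop any remaining worker processes
    pool.join()       # wait for the workers to exit
    db.close()        # always close the database connection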