Question

I am collecting data into a list of pandas.DataFrame objects, and I then need to send this data to a database. Each iteration takes about 2 minutes.

So I am wondering whether parallel processing is a good idea here. (Are there locking issues or anything similar?)

I would like to go from this loop:

for df in df_list:
    # Send a DF's batch to the DB
    print('Sending DF\'s data to DB')
    df.to_sql('ga_canaux', engine, if_exists='append', index=False)
    db.check_data()

to something like this, using multiprocessing:

with multiprocessing.Pool(processes=4) as pool:
    results = pool.map_async(df.to_sql(???), df_list)
    results.wait()

How can I pass the arguments that df.to_sql needs through map_async?
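
One common way to pass fixed extra arguments through map_async is functools.partial, which binds them ahead of time so the pool only supplies the per-item value. The sketch below is an illustration under assumptions: send_batch and make_worker are hypothetical names, send_batch stands in for the df.to_sql call, the batches are plain lists rather than DataFrames, and only picklable values (the table name and the if_exists mode) are bound.

```python
import multiprocessing
from functools import partial


def send_batch(rows, table_name, if_exists):
    """Hypothetical stand-in for df.to_sql(table_name, engine, if_exists=...)."""
    return f"{len(rows)} rows -> {table_name} ({if_exists})"


def make_worker(table_name, if_exists="append"):
    """Bind the fixed arguments; the pool then passes only the batch."""
    return partial(send_batch, table_name=table_name, if_exists=if_exists)


if __name__ == "__main__":
    batches = [[1, 2], [3], [4, 5, 6]]  # stand-ins for the DataFrames in df_list
    worker = make_worker("ga_canaux")
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map_async(worker, batches)
        # get() waits for completion and returns the list of return values
        print(results.get())
```

Calling results.get() instead of results.wait() also re-raises any exception that occurred in a worker, which makes failures visible in the parent process.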

Edit:

I tried passing N arguments, like this:

pool = multiprocessing.Pool()
args = ((df, engine, db) for df in df_list)
results = pool.map(multiproc, args)
results.wait()

But I get the error TypeError: can't pickle _thread._local objects

Edit 2:

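I made some changes to how I use multiprocessing, and it partly works (179 s versus 732 s on the same sample dataset). However, I get an error when I try to read from the database inside the pool.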

# Connect to the remote DB
global DB
DB = Database()
global ENGINE
ENGINE = DB.connect()

pool = mp.Pool(mp.cpu_count() - 1)
pool.map(multiproc, df_list)
pool.close()
pool.join()

def multiproc(df):
    print('Sending DF\'s data to DB')
    df.to_sql('ga_canaux', ENGINE, if_exists='append', index=False)
    DB.check_data()  # HERE

Error:

(psycopg2.OperationalError) SSL SYSCALL error: EOF detected
 [SQL: 'SELECT COUNT(*) FROM ga_canaux'] (Background on this error at: http://sqlalche.me/e/e3q8)
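
Both errors above are consistent with a database connection crossing a process boundary: pickled explicitly in the first attempt, and inherited by the workers through a global in the second. A pattern that is often suggested for this is to open one connection per worker with the pool's initializer argument. The sketch below is only an illustration under assumptions: sqlite3 from the standard library stands in for the PostgreSQL engine so that it runs anywhere, the batches are plain lists instead of DataFrames, and make_demo_db, init_worker, send_batch and count_rows are hypothetical helper names.

```python
import multiprocessing
import os
import sqlite3
import tempfile

_conn = None  # set separately in every worker process by init_worker


def make_demo_db():
    """Create a temporary SQLite file with the demo table and return its path."""
    db_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE ga_canaux (value INTEGER)")
    conn.commit()
    conn.close()
    return db_path


def init_worker(db_path):
    """Runs once per worker: open a connection here instead of inheriting one."""
    global _conn
    _conn = sqlite3.connect(db_path, timeout=30)


def send_batch(rows):
    """Insert one batch through this worker's own connection."""
    with _conn:  # commits on success, rolls back on error
        _conn.executemany(
            "INSERT INTO ga_canaux (value) VALUES (?)", [(r,) for r in rows]
        )
    return len(rows)


def count_rows(db_path):
    """Count the stored rows using a fresh connection."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM ga_canaux").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    db_path = make_demo_db()
    batches = [[1, 2], [3], [4, 5, 6]]
    with multiprocessing.Pool(
        processes=2, initializer=init_worker, initargs=(db_path,)
    ) as pool:
        print(pool.map(send_batch, batches))
    print(count_rows(db_path))
```

With SQLAlchemy, the equivalent of init_worker would presumably call create_engine with the connection URL, since a URL is a plain string and can be sent to the workers without pickling problems.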

Edit 3: When I try a larger sample, the database times out: psycopg2.DatabaseError: SSL SYSCALL error: Operation timed out

Python - pool.map_async: passing arguments to a function

0 answers: