Question

I want Python to process queries in parallel, driven by a list of tables:

tables = ['TBL_A','TBL_B',
    'TBL_C','TBL_D','TBL_E',
    'TBL_F','TBL_G','TBL_H'
]

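But apparently it runs serially, one table after another. That puts the ETA at 16 minutes, since the largest query alone runs for about 15 minutes. FYI, I have cpu_count = 16.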

import multiprocessing as mp
import etl_greener_01 as ego
tables = ['TBL_A','TBL_B',
    'TBL_C','TBL_D','TBL_E',
    'TBL_F','TBL_G','TBL_H'
]

lock = mp.Lock()

def task(table_nm):
    global lock

    lock.acquire()
    try:
        #print(f"Result: {table_nm.upper()}")
        print(f"Task Executed with process {mp.current_process().pid}")
        ego.extract_tbl(table_nm, ego.columns2carry(table_nm.upper(),p_kv_sy),p_kv_sy)
    finally:
        lock.release()    

def main():
  executor = mp.Pool(mp.cpu_count()-1)
  executor.map(task, tables)
  executor.close()

if __name__ == "__main__":
  main()

It must run with multiple processes:

Running 'TBL_A'
Running 'TBL_B'
Running 'TBL_C'
Running 'TBL_D'
Running 'TBL_E'
Running 'TBL_F'
Running 'TBL_G'
Running 'TBL_H'

Why doesn't multiprocessing process the queries in parallel?
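
One likely cause, visible in the code above: task acquires the shared lock before doing any work and releases it only after ego.extract_tbl returns. At least where the workers inherit the module-level lock (as with the fork start method), only one worker can be extracting at any moment, however many processes the pool has. A minimal sketch without the lock follows. Note that extract_tbl_stub is a hypothetical stand-in for ego.extract_tbl, which is not shown in the question, and the 0.2 s sleep only simulates a query.

```python
import multiprocessing as mp
import time

TABLES = ["TBL_A", "TBL_B", "TBL_C", "TBL_D"]


def extract_tbl_stub(table_nm):
    # Hypothetical stand-in for ego.extract_tbl (assumption, not the real ETL code):
    # pretend each query takes 0.2 seconds, then return the table name.
    time.sleep(0.2)
    return table_nm.upper()


def task(table_nm):
    # No shared lock is held here, so pool workers are free to run concurrently.
    return extract_tbl_stub(table_nm)


def main():
    start = time.perf_counter()
    # One worker per table in this sketch; the context manager cleans up the pool.
    with mp.Pool(processes=len(TABLES)) as pool:
        results = pool.map(task, TABLES)
    elapsed = time.perf_counter() - start
    print(results, f"elapsed: {elapsed:.2f}s")


if __name__ == "__main__":
    main()
```

If the lock was only meant to keep the print output from interleaving, it could be held around the print call alone and released before the extraction starts.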

0 answers: