Executing Python code in a new process is much slower than on the main process

Asked: 2018-05-15 06:27:22

Tags: python parallel-processing multiprocessing long-running-processes

I have started learning about multiprocessing in Python, and I noticed that code executed on the main process runs much faster than the same code run in a process created with the multiprocessing module.

Here is a simplified example of my code, where I first execute the code on the main process and print the time for each of the first 10 computations as well as the total computation time. Then the same code is executed on a new process (a long-running process to which I can send a new_pattern at any time).

Here is my code:

import multiprocessing
import random
import time


old_patterns = [[random.uniform(-1, 1) for _ in range(0, 10)] for _ in range(0, 2000)]
new_patterns = [[random.uniform(-1, 1) for _ in range(0, 10)] for _ in range(0, 100)]


new_pattern_for_processing = multiprocessing.Array('d', 10)
there_is_new_pattern = multiprocessing.Value('i', 0)
queue = multiprocessing.Queue()


def iterate_and_add(old_patterns, new_pattern):
    # Dummy workload: the sums are computed and intentionally discarded.
    for each_pattern in old_patterns:
        sum = 0
        for count in range(0, 10):
            sum += each_pattern[count] + new_pattern[count]


print_count_main_process = 0
def patt_recognition_main_process(new_pattern):
    global print_count_main_process
    # START of same code on main process
    start_main_process_one_patt = time.time()
    iterate_and_add(old_patterns, new_pattern)
    if print_count_main_process < 10:
        print_count_main_process += 1
        print("Time on main process one pattern:", time.time() - start_main_process_one_patt)
    # END of same code on main process


def patt_recognition_new_process(old_patterns, new_pattern_on_new_proc, there_is_new_pattern, queue):
    print_count = 0
    while True:
        if there_is_new_pattern.value:  # busy-wait until the parent publishes a pattern
            #START of same code on new process
            start_new_process_one_patt = time.time()
            iterate_and_add(old_patterns, new_pattern_on_new_proc)
            if print_count < 10:
                print_count += 1
                print("Time on new process one pattern:", time.time() - start_new_process_one_patt)
            #END of same code on new process
            queue.put("DONE")
            there_is_new_pattern.value = 0


if __name__ == "__main__":
    start_main_process = time.time()
    for new_pattern in new_patterns:
        patt_recognition_main_process(new_pattern)
    print(".\n.\n.")
    print("Total Time on main process:", time.time() - start_main_process)

    print("\n###########################################################################\n")

    start_new_process = time.time()
    p1 = multiprocessing.Process(target=patt_recognition_new_process, args=(old_patterns, new_pattern_for_processing, there_is_new_pattern, queue))
    p1.start()
    for new_pattern in new_patterns:
        for idx, n in enumerate(new_pattern):
            new_pattern_for_processing[idx] = n
        there_is_new_pattern.value = 1
        while True:
            msg = queue.get()
            if msg == "DONE":
                break
    print(".\n.\n.")
    print("Total Time on new process:", time.time()-start_new_process)

Why is there such a big difference in execution time?

2 Answers:

Answer 0 (score: 3)

It is a bit subtle, but the problem is in

new_pattern_for_processing = multiprocessing.Array('d', 10)

It does not hold Python float objects; it holds raw bytes, in this case enough room for ten 8-byte machine-level doubles. When you read or write this array, Python has to convert a float to a double or the other way round. That is not a big deal if you read or write once, but your code does it many times inside a loop, and those conversions dominate.
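To see the conversion cost in isolation, here is a minimal sketch (mine, not the answerer's) that times repeated indexed reads from a multiprocessing.Array against reads from an ordinary list; note that each Array access also goes through the synchronized wrapper's lock, which adds to the per-access price:

import multiprocessing
import time

shared = multiprocessing.Array('d', 10)   # raw machine-level doubles plus a lock
for i in range(10):
    shared[i] = i * 0.1

start = time.time()
for _ in range(1000000):
    x = shared[3]                         # lock + C-double -> Python-float conversion
print("Array reads:", time.time() - start)

plain = shared[:]                         # one bulk conversion to a list of floats
start = time.time()
for _ in range(1000000):
    x = plain[3]                          # plain object lookup, no conversion
print("list reads: ", time.time() - start)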

To confirm this, I copied the machine-level array into a Python list of floats once per pattern and had the process work on that. Now its speed is the same as the parent's. My change is in only one function:

def patt_recognition_new_process(old_patterns, new_pattern_on_new_proc, there_is_new_pattern, queue):
    print_count = 0
    while True:
        if there_is_new_pattern.value:
            local_pattern = new_pattern_on_new_proc[:]
            #START of same code on new process
            start_new_process_one_patt = time.time()
            #iterate_and_add(old_patterns, new_pattern_on_new_proc)
            iterate_and_add(old_patterns, local_pattern)
            if print_count < 10:
                print_count += 1
                print("Time on new process one pattern:", time.time() - start_new_process_one_patt)
            #END of same code on new process
            there_is_new_pattern.value = 0
            queue.put("DONE")

Answer 1 (score: 0)

In this particular case, you appear to be performing sequential execution in another process rather than parallelising the algorithm. That creates some overhead.

Creating the process itself takes time, but that is not all. You also transfer data through queues and use Manager proxies. These are all really queues, or in fact two queues and another process. Queues are very, very slow compared with using an in-memory copy of the data.
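To get a feel for that cost, here is a hypothetical micro-benchmark (not from the answer) that round-trips items through a multiprocessing.Queue, which pickles each item and pushes it through a pipe via a feeder thread, and compares it with the same traffic against an ordinary in-process list:

import multiprocessing
import time

if __name__ == "__main__":
    q = multiprocessing.Queue()
    start = time.time()
    for i in range(10000):
        q.put(i)   # pickled, handed to a feeder thread, written to a pipe
        q.get()    # read back from the pipe and unpickled
    print("Queue round-trips:", time.time() - start)

    buf = []
    start = time.time()
    for i in range(10000):
        buf.append(i)   # single in-process pointer store
        buf.pop()
    print("list round-trips: ", time.time() - start)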

If you take your code, execute it in another process, and use queues to pass data in and out, it will always be slower. From a performance standpoint that makes it pointless. There may still be other reasons to do it, for example if your main program needs to do something else, such as wait on IO.

If you want a performance boost, you should instead create several processes and split the algorithm so that parts of the range are processed in different processes, working in parallel. You could also consider ProcessPoolExecutor (from concurrent.futures in Python 3) if you want a group of worker processes standing by waiting for more work; that reduces the process-creation overhead, since you only pay it once. A sketch of such a split follows below.
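As an illustration of that kind of split, here is a minimal sketch using multiprocessing.Pool; the worker count, the chunking, and the reuse of the question's dummy workload are illustrative assumptions, not part of the original answer:

import multiprocessing
import random


def iterate_and_add_chunk(args):
    # Same dummy workload as the question, applied to one slice of old_patterns.
    old_chunk, new_pattern = args
    for each_pattern in old_chunk:
        total = 0
        for count in range(10):
            total += each_pattern[count] + new_pattern[count]


if __name__ == "__main__":
    old_patterns = [[random.uniform(-1, 1) for _ in range(10)] for _ in range(2000)]
    new_pattern = [random.uniform(-1, 1) for _ in range(10)]

    n_workers = 4                                  # illustrative worker count
    chunk = len(old_patterns) // n_workers
    slices = [old_patterns[i:i + chunk] for i in range(0, len(old_patterns), chunk)]

    # The pool is created once; every subsequent new_pattern can be farmed out
    # to the standing workers without paying the process-creation cost again.
    with multiprocessing.Pool(n_workers) as pool:
        pool.map(iterate_and_add_chunk, [(s, new_pattern) for s in slices])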

Parallel processing is useful, but it is rarely snake oil that effortlessly solves every problem. To get the most out of it, you need to redesign your program to maximise the parallel work and minimise the data transferred through queues.