Question

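I've run into a multiprocessing situation where the list I use to collect results from my function is not being updated by the processes. I have two code examples: one that updates the list (correction: the code updates it properly with 'Thread', but fails with 'Process') and one that does not. I can't detect any kind of error. I think it may be a subtlety of scope that I don't understand.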

Here is the working example (correction: this example does not work either; it does work, however, with threading.Thread):

from multiprocessing import Process

def run_knn_result_wrapper(dataset, k_value, metric, results_list, index):
    # Store the result for this k value in its slot of results_list.
    results_list[index] = knn_result(dataset, k_value, metric)

results = [None] * (k_upper - k_lower)
threads = [None] * (k_upper - k_lower)
joined = [0] * (k_upper - k_lower)

for i in range(len(threads)):
    threads[i] = Process(target=run_knn_result_wrapper,
                         args=(dataset, k_lower + i, metric, results, i))
    threads[i].start()
    if batch_size == 1:
        threads[i].join()
        joined[i] = 1
    else:
        # At the end of each batch, join the (up to two) preceding
        # workers that have not been joined yet.
        if i % batch_size == batch_size - 1 and i > 0:
            for j in range(max(0, i - 2), i):
                if joined[j] == 0:
                    threads[j].join()
                    joined[j] = 1
# Join whatever has not been joined so far.
for i in range(len(threads)):
    if joined[i] == 0:
        threads[i].join()


Ignoring the "threads" variable name (this started on threading, but then I found out about the GIL), the `results` list updates perfectly.

Here is the code that does not update the results list:

import numpy as np
from multiprocessing import Process

def prediction_on_batch_wrapper(batchX, results_list, index):
    # Store the predictions for this subset in its slot of results_list.
    results_list[index] = prediction_on_batch(batchX)

batches_of_X = np.array_split(X, 10)

overall_predicted_classes_list = []
for i in range(len(batches_of_X)):
    # Split each batch into 10 subsets, one worker process per subset.
    batches_of_X_subsets = np.array_split(batches_of_X[i], 10)
    processes = [None] * len(batches_of_X_subsets)
    results_list = [None] * len(batches_of_X_subsets)
    for j in range(len(batches_of_X_subsets)):
        processes[j] = Process(target=prediction_on_batch_wrapper,
                               args=(batches_of_X_subsets[j], results_list, j))
    for j in processes:
        j.start()
    for j in processes:
        j.join()
    # Combine the per-subset results for this batch.
    if len(results_list) > 1:
        results_array = np.concatenate(tuple(results_list))
    else:
        results_array = results_list[0]

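I can't tell why, under Python's scoping rules, the results_list list is not updated by the prediction_on_batch_wrapper function.

A debugging session shows that the value of results_list inside the prediction_on_batch_wrapper function does, in fact, get updated... but somehow its scope is local in this second Python file and global in the first...

What is going on here?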

Answer 1

This happens because you are spawning another process, and separate processes do not share any resources, including memory.

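Each process is a separate, independently running program, usually visible in Task Manager or ps. When you use Process to launch another process, you should see a second instance of Python start up as the process is spawned.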
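
One way to observe this from inside Python, as a minimal sketch: the child reports its own process ID back through a multiprocessing.Queue, and the parent compares it with its own. The function names report_pid and parent_and_child_pids are made up for this illustration.

```python
import os
from multiprocessing import Process, Queue


def report_pid(queue):
    # Runs in the child: send the child's own process ID back to the parent.
    queue.put(os.getpid())


def parent_and_child_pids():
    queue = Queue()
    child = Process(target=report_pid, args=(queue,))
    child.start()
    child_pid = queue.get()  # blocks until the child has reported
    child.join()
    return os.getpid(), child_pid


if __name__ == "__main__":
    parent_pid, child_pid = parent_and_child_pids()
    # The two IDs differ: the child is a separate OS-level process.
    print("parent:", parent_pid, "child:", child_pid)
```

The child's ID is the one that shows up as the extra Python entry in Task Manager or ps while the child is running.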

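A thread is another point of execution within the main process, and it shares all of the main process's resources, even when running across multiple cores. All threads in a process can see any part of the whole process, although how much they can use depends on the code you write for the thread and on the restrictions of the language you write it in.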

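Using Process is like running two instances of your program; you can pass parameters to the new process, but those parameters are no longer shared once they have been passed. For example, if you modify the data in the main process, the new process will not see the change, because the two processes have entirely separate copies of the data. The same holds in the other direction, which is what happens in your code: the child's assignment to results_list changes only the child's copy, so the parent's list is still full of None.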
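
The difference can be reproduced in a few lines. This is a minimal sketch rather than the code from the question: store_square is a hypothetical stand-in for the question's wrapper functions, and run_workers is a helper invented here so that the same code runs once with Thread and once with Process.

```python
from multiprocessing import Process
from threading import Thread


def store_square(results_list, index):
    # Same shape as the wrappers in the question: write into a slot by index.
    results_list[index] = index * index


def run_workers(worker_cls, n=3):
    # worker_cls is either threading.Thread or multiprocessing.Process;
    # both accept the same target/args arguments.
    results = [None] * n
    workers = [worker_cls(target=store_square, args=(results, i)) for i in range(n)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    print("Thread: ", run_workers(Thread))   # [0, 1, 4]
    print("Process:", run_workers(Process))  # [None, None, None]
```

With Thread, every worker writes into the one list that lives in the parent's memory. With Process, each child writes into its own copy, which is discarded when the child exits, and no error is raised anywhere.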

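If you want to share data, you should use threads rather than processes. For most multiprocessing needs, threads are preferable to processes, except in the few cases where you need strict separation.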
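
If separate processes are still wanted (the question mentions moving away from threads because of the GIL), the results have to be sent back to the parent explicitly. One possible approach, sketched here under assumptions, is a list proxy from multiprocessing.Manager, which keeps the question's "write into a slot by index" pattern. The functions square, square_wrapper and collect_with_manager are hypothetical; square stands in for prediction_on_batch or knn_result.

```python
from multiprocessing import Manager, Process


def square(x):
    # Hypothetical stand-in for prediction_on_batch or knn_result.
    return x * x


def square_wrapper(x, results_list, index):
    # results_list is a Manager proxy, so this write is forwarded to the
    # manager's server process instead of staying in the child's memory.
    results_list[index] = square(x)


def collect_with_manager(inputs):
    with Manager() as manager:
        results = manager.list([None] * len(inputs))
        processes = [
            Process(target=square_wrapper, args=(x, results, i))
            for i, x in enumerate(inputs)
        ]
        for p in processes:
            p.start()
        for p in processes:
            p.join()
        return list(results)  # copy into a plain list before the manager shuts down


if __name__ == "__main__":
    print(collect_with_manager([1, 2, 3]))  # [1, 4, 9]
```

Every access to the proxy goes through inter-process communication, so this is slower than a plain list. When each worker simply returns a value, multiprocessing.Pool with its map method is another standard option that avoids the shared list altogether.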

Multiprocessing scope: list not updated with 'multiprocessing.Process', works with 'threading.Thread'
