Question

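I'm relatively new to Python and have been able to answer most of my questions from similar questions already answered on the forums, but I'm stuck at a point where I could use some help.

I have a simple nested for loop script that generates an output of strings. What I need to do next is have each grouping go through a simulation, based on numerical values that the strings will be matched to.

My question is: what is the best way to go about this? I'm not sure whether multi-threading will work, since the strings are generated and then need to undergo the simulation, one set at a time. I was reading about queues and wasn't sure whether the strings could be passed into a queue for storage and then go through the simulation in the same order they entered the queue.

Regardless of the research I've done, I'm open to any suggestions anyone can make on this matter.

Thanks!

Edit: I'm not looking for an answer on how to do the simulation, but rather for a way to store the combinations while the simulations are being computed.

For example: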

import itertools

X = ["a", "b"]
Y = ["c", "d", "e"]
Z = ["f", "g"]

for A in itertools.combinations(X, 1):
    for B in itertools.combinations(Y, 2):
        for C in itertools.combinations(Z, 2):
            # each combination is a tuple, so + concatenates them
            D = A + B + C
            print(D)

Answer 1

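As hinted at in the comments, the multiprocessing module is what you're looking for. Threading won't help you with CPU-bound work like this because of the Global Interpreter Lock (GIL), which limits execution of Python bytecode to one thread at a time. In particular, I would look at multiprocessing pools. These objects give you an interface to a pool of subprocesses that do work in parallel with the main process, and you can go back and check on the results later.

Your example snippet could look something like this: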

import itertools
import multiprocessing

X = ["a", "b"]
Y = ["c", "d", "e"]
Z = ["f", "g"]

# by default, this creates as many workers as you have CPU cores available;
# on platforms that spawn workers (e.g. Windows), put this code under
# an `if __name__ == "__main__":` guard
pool = multiprocessing.Pool()
combination_list = []  # create a list to store the combinations

for A in itertools.combinations(X, 1):
    for B in itertools.combinations(Y, 2):
        for C in itertools.combinations(Z, 2):
            D = A + B + C
            combination_list.append(D)  # append this combination to the list

results = pool.map(simulation_function, combination_list)
# simulation_function is the function you're using to actually run your
# simulation - assuming it only takes one parameter: the combination

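The call to pool.map is blocking: once you call it, execution in the main process halts until all the simulations have completed, but the simulations themselves run in parallel. When they finish, whatever your simulation function returns is available in results, in the same order as the input arguments in combination_list.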
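To make the ordering guarantee concrete, here is a minimal sketch. The names build_combinations and run_all are hypothetical helpers, and simulation_function is only a placeholder that joins the letters, since the question never shows the real simulation. The `if __name__ == "__main__":` guard is there because some platforms start workers by launching a fresh interpreter that re-imports the script.

```python
import itertools
import multiprocessing

X = ["a", "b"]
Y = ["c", "d", "e"]
Z = ["f", "g"]


def build_combinations(x, y, z):
    """Return every A + B + C tuple, in the order the nested loops yield them."""
    return [
        a + b + c
        for a in itertools.combinations(x, 1)
        for b in itertools.combinations(y, 2)
        for c in itertools.combinations(z, 2)
    ]


def simulation_function(combination):
    """Hypothetical placeholder: the real simulation would go here."""
    return "".join(combination)


def run_all(combinations, workers=2):
    """Run the placeholder simulation on every combination in parallel."""
    with multiprocessing.Pool(processes=workers) as pool:
        # map blocks until every task is done; results keep the input order.
        return pool.map(simulation_function, combinations)


if __name__ == "__main__":
    combos = build_combinations(X, Y, Z)
    for combo, result in zip(combos, run_all(combos)):
        print(combo, "->", result)
```

Because the results come back in input order, zipping them with the combination list is enough to pair each combination with its outcome; no separate queue is needed for that.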

If you don't want to wait for them, you can also use apply_async on your pool and store the result objects to look at later:

import itertools
import multiprocessing

X = ["a", "b"]
Y = ["c", "d", "e"]
Z = ["f", "g"]

pool = multiprocessing.Pool()
result_list = []  # create a list to store the simulation results

for A in itertools.combinations(X, 1):
    for B in itertools.combinations(Y, 2):
        for C in itertools.combinations(Z, 2):
            D = A + B + C
            # apply_async returns immediately with an AsyncResult
            result_list.append(pool.apply_async(
                simulation_function,
                args=(D,)))  # note the extra comma - args must be a tuple

# do other stuff
# now iterate over result_list to check the results when they're ready

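If you use this structure, result_list will be populated with multiprocessing.pool.AsyncResult objects, which let you check whether each one is ready with result.ready() and, once it is, retrieve the value with result.get(). This approach starts each simulation as soon as its combination is computed, instead of waiting until all the combinations have been computed before processing begins. The downside is that managing and retrieving the results is a little more complicated. For example, you either have to make sure a result is ready or be prepared to catch an exception when you ask for it, and you also need to be prepared to catch exceptions raised inside the worker function, which result.get() re-raises in the main process. The caveats are explained pretty well in the documentation.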
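As a rough sketch of that bookkeeping, the snippet below keeps each combination next to its AsyncResult and then gathers successes and failures separately. The collect helper is hypothetical, and simulation_function is a placeholder that deliberately fails for some inputs so the exception path is exercised. It calls get() with a timeout rather than polling ready(), which is one of several reasonable choices.

```python
import itertools
import multiprocessing


def simulation_function(combination):
    """Hypothetical placeholder that fails on purpose for some inputs."""
    if "e" not in combination:
        raise ValueError(f"no 'e' in {combination}")
    return "".join(combination)


def collect(pending, timeout=30):
    """Split (combination, AsyncResult) pairs into successes and failures."""
    successes, failures = [], []
    for combination, async_result in pending:
        try:
            # get() waits up to `timeout` seconds and re-raises any
            # exception that was raised inside the worker.
            successes.append((combination, async_result.get(timeout=timeout)))
        except Exception as exc:
            failures.append((combination, exc))
    return successes, failures


if __name__ == "__main__":
    X, Y, Z = ["a", "b"], ["c", "d", "e"], ["f", "g"]
    with multiprocessing.Pool(processes=2) as pool:
        pending = []
        for a in itertools.combinations(X, 1):
            for b in itertools.combinations(Y, 2):
                for c in itertools.combinations(Z, 2):
                    d = a + b + c
                    # Submission returns immediately; the work starts in
                    # the background while the loops keep generating.
                    pending.append(
                        (d, pool.apply_async(simulation_function, args=(d,))))
        # Collect before leaving the with block, which shuts the pool down.
        done, failed = collect(pending)
    print(len(done), "succeeded;", len(failed), "failed")
```

Storing the combination alongside its AsyncResult keeps the submission order and makes it easy to report which inputs failed.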

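If computing the combinations doesn't actually take very long and you don't mind your main process halting until they're all ready, I recommend the pool.map approach.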
