Question

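How can I parallelize the following code? The attributes list has close to 15 elements, so generating the combinations takes a long time.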

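import itertools

combs = set()
for L in range(0, len(attributes) + 1):
    # update() adds each combination tuple; add() would store the iterator object itself
    combs.update(itertools.combinations(attributes, L))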

Is there any way to parallelize this with multiprocessing?

I tried the code below, but I get this error at the line if chunksize <= 0:

TypeError: unorderable types: range() <= int()

import itertools     
from multiprocessing import Pool
def comb(attributes):
    res = itertools.combinations(attributes)
    return res

def main():
    p = Pool(4)
    times = range(0,len(attributes)+1)                                                
    values = p.map(comb,attributes,times)
    p.close()
    p.join()
    print(values)

if __name__ == '__main__':
    attributes =('Age', 'Workclass', 'Fnlwgt', 'Education', 'Education-num', 'marital-status', 'Occupation', 'Relationship', 'Race', 'Sex', 'Capital-gain', 'Capital-loss', 'Hours-per-week', 'Native country', 'Probability', 'Id')
    main()

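Since someone asked for an explanation of the problem, here it is... I'm trying to get combinations without replacement, basically n!. For example, if my attributes variable holds A, B, C, I'm trying to get (A), (B), (C), (A, B), (A, C), (B, C), (A, B, C). The number of elements in attributes isn't fixed; it changes with the input dataset, so I can't hardcode it. That's why I use len(attributes), where attributes holds the attributes from the dataset. itertools.combinations(attributes, L) generates all combinations of length L. In my case, if I pass len(attributes) as the length, I get only ABC and none of the other combinations. So I created a range of lengths, from 0 up to and including len(attributes) (hence the + 1).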
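For reference, the length-by-length enumeration described above can be sketched serially as follows. The helper name all_combinations is only illustrative; it collects the tuples into a list so the result can be inspected, and it includes the empty tuple produced by length 0.

```python
import itertools

def all_combinations(attributes):
    """Hypothetical helper: every combination of 0..len(attributes) items, without replacement."""
    combs = []
    # range(0, len + 1) covers length 0 (the empty tuple) up to and including len(attributes)
    for length in range(0, len(attributes) + 1):
        combs.extend(itertools.combinations(attributes, length))
    return combs

print(all_combinations(("A", "B", "C")))
# [(), ('A',), ('B',), ('C',), ('A', 'B'), ('A', 'C'), ('B', 'C'), ('A', 'B', 'C')]
```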

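Now back to the question: my dataset may have 15 elements, so len(attributes) will be 15, i.e. 15! combinations. Generating them takes a lot of time because of this factorial. So I want to parallelize it so that each processor handles the generation of one combination set at a time; for example, one processor generates the combinations of length 2, another those of length 3, and so on... but with Pool.map I can't pass multiple arguments correctly. I hope this clears things up; let me know if you need more explanation.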

Answer 1

Your multiprocessing code has a few problems that mean it won't work like your single-process version.

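First, you're not calling p.map correctly. The arguments to map are the function to call, the arguments (a single sequence), and a chunk size, which specifies how many values are handed to a worker at once. You're passing the range object as the chunksize, which is the direct cause of the error.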
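Since map supplies exactly one argument per call, a worker that needs two values (the attributes and a length) can instead be driven by Pool.starmap, which unpacks each argument tuple into the call. A minimal sketch, not the answer's code: combinations_of_length and parallel_combinations are hypothetical names, and chunksize is passed as an integer keyword so it cannot be mistaken for the argument sequence.

```python
import itertools
from multiprocessing import Pool

def combinations_of_length(items, n):
    # list() materialises the iterator so the worker sends back actual tuples
    return list(itertools.combinations(items, n))

def parallel_combinations(items, processes=4):
    # One (items, n) tuple per length; starmap calls combinations_of_length(items, n) for each
    tasks = [(items, n) for n in range(len(items) + 1)]
    with Pool(processes) as pool:
        return pool.starmap(combinations_of_length, tasks, chunksize=1)

if __name__ == "__main__":
    print(parallel_combinations(("A", "B", "C")))
```

The result is one list per length, in order, because starmap (like map) preserves the order of its tasks.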

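If you fixed that, you'd run into other problems. For example, the way you pass attributes to map means each worker call receives only one value from it, not the whole list of attributes. Your comb function also never passes a length to itertools.combinations, which requires one. And comb returns an iterator over the combinations rather than the values themselves (so the workers would finish more or less instantly, but return something that can't be usefully printed).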
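Another way to give every worker the whole attributes tuple, rather than one element of it, is to bind it with functools.partial so that map only has to supply the length. Again only a sketch: combinations_of_length is a hypothetical helper, and the four-item tuple stands in for the real attributes.

```python
import functools
import itertools
from multiprocessing import Pool

def combinations_of_length(items, n):
    # combinations needs both the items and the length n; list() returns real values
    return list(itertools.combinations(items, n))

def main():
    attributes = ("A", "B", "C", "D")
    # partial binds the whole tuple as the first argument, so map supplies only n
    worker = functools.partial(combinations_of_length, attributes)
    with Pool(4) as pool:
        values = pool.map(worker, range(len(attributes) + 1))
    print([len(v) for v in values])

if __name__ == "__main__":
    main()
```

Running it should print [1, 4, 6, 4, 1], the number of combinations for each length. A partial over a module-level function can be pickled, which is what lets it be sent to the worker processes.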

Here's what I think is working code:

import itertools     
from multiprocessing import Pool

# attributes is always accessible as a global, so worker processes can directly access it
attributes = ('Age', 'Workclass', 'Fnlwgt', 'Education', 'Education-num',
              'marital-status', 'Occupation', 'Relationship', 'Race', 'Sex',
              'Capital-gain', 'Capital-loss', 'Hours-per-week', 'Native country',
              'Probability', 'Id')

def comb(n): # the argument n is the number of items to select
    res = list(itertools.combinations(attributes, n)) # create a list from the iterator
    return res

def main():
    p = Pool(4)
    times = range(0, len(attributes)+1)                                                
    values = p.map(comb, times) # pass the range as the sequence of arguments!
    p.close()
    p.join()
    print(values)

if __name__ == '__main__':
    main()

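If the attributes list is large, this code will still take a while to finish, but that's only because there are a lot of values to print (the powerset of a set of n values has 2^n subsets). My IDE tells me the output is over 88000 lines (thankfully it doesn't display them all). I wouldn't be surprised if the output part of the problem turned out to be a bigger issue than the multiprocessing part!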
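If the real goal is to work with the combinations rather than print them all, one option is to consume each length inside the worker and send back only a small summary. The sketch below assumes that goal; count_combinations and ATTRIBUTES are illustrative names, and ATTRIBUTES copies the 16 names from the question. It also shows how uneven the lengths are: length 8 has C(16, 8) = 12870 tuples while length 1 has 16, so giving each process one length does not split the work evenly.

```python
import itertools
from multiprocessing import Pool

# Illustrative copy of the 16 attribute names from the question
ATTRIBUTES = ('Age', 'Workclass', 'Fnlwgt', 'Education', 'Education-num',
              'marital-status', 'Occupation', 'Relationship', 'Race', 'Sex',
              'Capital-gain', 'Capital-loss', 'Hours-per-week', 'Native country',
              'Probability', 'Id')

def count_combinations(n):
    # Walk every length-n combination in the worker, but return only (n, how many there were)
    return n, sum(1 for _ in itertools.combinations(ATTRIBUTES, n))

def main():
    with Pool(4) as pool:
        # imap yields results in task order; dict() consumes them before the pool shuts down
        counts = dict(pool.imap(count_combinations, range(len(ATTRIBUTES) + 1)))
    total = sum(counts.values())
    print(counts)
    print("total:", total, "equals 2**n:", total == 2 ** len(ATTRIBUTES))

if __name__ == "__main__":
    main()
```

In a real program the generator expression inside count_combinations would be replaced by whatever per-combination processing is needed, keeping the large data inside the workers.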
