Question

item_list = [("a", 10, 20), ("b", 25, 40), ("c", 40, 100), ("d", 45, 90),
             ("e", 35, 65), ("f", 50, 110)] #(name, weight, value)
results = [("", 0, 0)]  #best packing so far as (names, total weight, total
                        #value); every new candidate is compared against it

class Rucksack(object):
    def __init__(self, B):
        self.B = B   #B=maximum weight
        self.pack(item_list, 0, ("", 0, 0))

    def pack(self, items, n, current):  
        n += 1   #n counts the recursion depth, so the recursion stops once
                 #it reaches len(items) - 1
        if n >= len(items) - 1:
            if current[2] > results[0][2]:
                #current beats the stored result: replace it and start no
                #new recursion
                results[0] = current
        else:
            for i in items:
                if current[1] + i[1] <= self.B and i[0] not in current[0]:
                    #1st condition: current weight + the new item's weight does
                    #not exceed B; 2nd condition: the new item is not already
                    #in current
                    i = (current[0] + " " + i[0], current[1] + i[1],
                         current[2] + i[2])
                    self.pack(items, n, i)
                else:
                    #the item does not fit (or is already packed): replace the
                    #stored result if current is better; no new recursion
                    if current[2] > results[0][2]:
                        results[0] = current

rucksack1 = Rucksack(100)

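This is a small algorithm for the knapsack problem. I have to parallelize the code somehow, but so far I haven't figured out the thread module. I think the only place where parallelization makes sense is the for loop, right? So I tried this: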

def run(self, items, i, n, current):
    global num_threads, thread_started
    lock.acquire()
    num_threads += 1
    thread_started = True
    lock.release()
    if current[1] + i[1] <= self.B and i[0] not in current[0]:
        i = (current[0] + " " + i[0], current[1] + i[1], current[2] + i[2])
        self.pack(items, n, i)
    else:
        if current[2] > results[0][2]:
            results[0] = current
    lock.acquire()
    num_threads -= 1
    lock.release()

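But the result is strange. Nothing happens, and if I do a keyboard interrupt, the result is correct, but that's definitely not the point of the implementation. Can you tell me what's wrong with the second piece of code, or where I could sensibly use parallelisation? Thanks.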

Answer 1

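First, since your code is CPU-bound, you'll get very little benefit from using threads for parallelism, because of the GIL, as bereal explained. Fortunately, there are only a few differences between threads and processes: basically, all shared data has to be passed or shared explicitly (see Sharing state between processes for details).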
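A minimal sketch of sharing state explicitly between processes, assuming the only thing the workers need to share is the best value found so far. The search is a brute force over subset sizes, chosen only because it splits cleanly into independent pieces; it is not the question's algorithm. `subset_value`, `record_best` and `worker` are hypothetical names; `multiprocessing.Value` and its `get_lock()` are the standard-library pieces described under Sharing state between processes.

```python
import multiprocessing
from itertools import combinations

# (name, weight, value) triples from the question
ITEMS = [("a", 10, 20), ("b", 25, 40), ("c", 40, 100), ("d", 45, 90),
         ("e", 35, 65), ("f", 50, 110)]
CAPACITY = 100


def subset_value(subset, capacity):
    """Hypothetical helper: total value of subset, or -1 if it is too heavy."""
    if sum(w for _, w, _ in subset) > capacity:
        return -1
    return sum(v for _, _, v in subset)


def record_best(shared_best, candidate):
    """Hypothetical helper: keep the larger of shared_best.value and candidate.
    The check and the write happen together under the Value's own lock."""
    with shared_best.get_lock():
        if candidate > shared_best.value:
            shared_best.value = candidate


def worker(shared_best, items, capacity, size):
    """Hypothetical worker: brute-force every subset with exactly `size` items.
    All inputs, including the shared Value, arrive as arguments, not globals."""
    local = max(subset_value(s, capacity) for s in combinations(items, size))
    record_best(shared_best, local)  # one locked update per process


if __name__ == "__main__":
    best = multiprocessing.Value("i", 0)  # shared int, created in the parent
    procs = [multiprocessing.Process(target=worker,
                                     args=(best, ITEMS, CAPACITY, size))
             for size in range(1, len(ITEMS) + 1)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(best.value)  # 230 for the question's items and B = 100
```

Each process computes its own local maximum first and takes the lock only once, so the shared Value is touched as rarely as possible.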

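Second, if you want to data-parallelize your code, you have to lock all access to mutable shared objects. From a quick look, items and current are never mutated, but the results object is a shared global that you modify all over the place. If you can change your code to return values up the chain instead, that's ideal. If not, accumulating a bunch of separate return values and merging them after processing is finished is usually fine too. If neither is feasible, you'll need to guard all access to results with a lock. See Synchronization between processes for details.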
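A minimal sketch of "returning values up the chain", assuming the goal is still the most valuable subset that fits in capacity B. It swaps the question's loop-based recursion for the textbook include/exclude recursion, so every call returns its best `(names, weight, value)` triple and nothing writes to a global `results`. `best_pack` is a hypothetical name.

```python
# (name, weight, value) triples, as in the question
ITEMS = [("a", 10, 20), ("b", 25, 40), ("c", 40, 100), ("d", 45, 90),
         ("e", 35, 65), ("f", 50, 110)]


def best_pack(items, capacity, start=0):
    """Hypothetical helper: return (names, weight, value) for the most valuable
    subset of items[start:] that fits in capacity. No globals are touched."""
    if start == len(items):
        return ((), 0, 0)
    name, weight, value = items[start]
    # Branch 1: leave items[start] out.
    best = best_pack(items, capacity, start + 1)
    # Branch 2: put items[start] in, if it still fits.
    if weight <= capacity:
        names, w, v = best_pack(items, capacity - weight, start + 1)
        if v + value > best[2]:
            best = ((name,) + names, w + weight, v + value)
    return best


print(best_pack(ITEMS, 100))  # (('a', 'c', 'f'), 100, 230)
```

Because each call depends only on its arguments, any call can run in any process, and the caller just combines the returned triples.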

Finally, you ask where to put the parallelism. The key is finding the right dividing line between independent tasks.
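One possible dividing line for this problem, sketched under the assumption that the include/exclude recursion is used: fix the decisions for the first k items up front. Each of the (at most) 2**k resulting jobs is independent of the others, and the overall answer is simply the best of their answers. `split`, `solve_job` and `best_pack` are hypothetical helpers, and k is a tuning knob you would pick so that the jobs come out medium-sized.

```python
from itertools import product

# (name, weight, value) triples, as in the question
ITEMS = [("a", 10, 20), ("b", 25, 40), ("c", 40, 100), ("d", 45, 90),
         ("e", 35, 65), ("f", 50, 110)]


def best_pack(items, capacity, start=0):
    """Hypothetical include/exclude recursion: (names, weight, value) of the
    best subset of items[start:] that fits in capacity."""
    if start == len(items):
        return ((), 0, 0)
    name, weight, value = items[start]
    best = best_pack(items, capacity, start + 1)             # leave it out
    if weight <= capacity:                                    # or take it
        names, w, v = best_pack(items, capacity - weight, start + 1)
        if v + value > best[2]:
            best = ((name,) + names, w + weight, v + value)
    return best


def split(items, capacity, k):
    """Hypothetical splitter: fix include/exclude for the first k items.
    Each job is (prefix_names, prefix_weight, prefix_value, rest, room_left);
    prefixes that are already too heavy are dropped."""
    head, rest = items[:k], items[k:]
    for choice in product((False, True), repeat=len(head)):
        chosen = [item for item, take in zip(head, choice) if take]
        weight = sum(w for _, w, _ in chosen)
        if weight <= capacity:
            yield (tuple(n for n, _, _ in chosen), weight,
                   sum(v for _, _, v in chosen), rest, capacity - weight)


def solve_job(job):
    """Solve one job using only its own data; return (names, weight, value)."""
    names, weight, value, rest, room = job
    rest_names, rest_weight, rest_value = best_pack(rest, room)
    return names + rest_names, weight + rest_weight, value + rest_value


jobs = list(split(ITEMS, 100, 2))
answer = max((solve_job(job) for job in jobs), key=lambda r: r[2])
print(len(jobs), answer)  # 4 (('a', 'c', 'f'), 100, 230)
```

Here the jobs are solved one after another; since `solve_job` shares nothing, the same list of jobs could be handed to a process pool unchanged.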

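Ideally, you want to find a large number of medium-sized jobs that you can queue up, and then just have a pool of processes, each of which picks up the next job. From a quick look, the obvious places are either the recursive call to self.pack or each iteration of the for i in items: loop. If they really are independent, just use concurrent.futures, as in the ProcessPoolExecutor example. (If you're on Python 3.1 or earlier, you'll need the futures module, because it isn't built into the stdlib.)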
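A sketch of the "queue of medium-sized jobs, pool of workers" shape with `concurrent.futures.ProcessPoolExecutor`, using one job per iteration of `for i in items`: job i finds the best subset whose first (lowest-index) item is `items[i]`. Those jobs are independent, and together with the empty rucksack they cover every subset, so the overall answer is the best job result. `best_pack` (an include/exclude recursion) and `best_starting_with` are hypothetical helpers standing in for the question's global-mutating `pack`.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

# (name, weight, value) triples, as in the question
ITEMS = [("a", 10, 20), ("b", 25, 40), ("c", 40, 100), ("d", 45, 90),
         ("e", 35, 65), ("f", 50, 110)]
CAPACITY = 100


def best_pack(items, capacity, start=0):
    """Hypothetical include/exclude recursion; returns (names, weight, value)."""
    if start == len(items):
        return ((), 0, 0)
    name, weight, value = items[start]
    best = best_pack(items, capacity, start + 1)             # leave it out
    if weight <= capacity:                                    # or take it
        names, w, v = best_pack(items, capacity - weight, start + 1)
        if v + value > best[2]:
            best = ((name,) + names, w + weight, v + value)
    return best


def best_starting_with(items, capacity, i):
    """Hypothetical job: best subset whose first item is items[i],
    or None if items[i] alone is already too heavy."""
    name, weight, value = items[i]
    if weight > capacity:
        return None
    names, w, v = best_pack(items[i + 1:], capacity - weight)
    return ((name,) + names, w + weight, v + value)


if __name__ == "__main__":
    best = ((), 0, 0)  # the empty rucksack
    with ProcessPoolExecutor() as pool:
        futures = [pool.submit(best_starting_with, ITEMS, CAPACITY, i)
                   for i in range(len(ITEMS))]
        for fut in as_completed(futures):  # merge in whatever order jobs finish
            result = fut.result()
            if result is not None and result[2] > best[2]:
                best = result
    print(best)  # (('a', 'c', 'f'), 100, 230)
```

The merge step only compares returned values, so the order in which `as_completed` yields finished jobs does not matter.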

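If there's no easy way to do that, it's usually possible to at least create a small number of jobs (N or 2N, if you have N cores) of about the same length, and just give each one its own multiprocessing.Process. For example: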

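from multiprocessing import Process
n = 8  # number of jobs, e.g. N or 2N for N cores
procs = [Process(target=rucksack.pack, args=(items[i*len(items)//n:(i+1)*len(items)//n], 0, ("", 0, 0)))
         for i in range(n)]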
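A sketch of this fallback with a fixed number of `multiprocessing.Process` objects, assuming results come back through a `multiprocessing.Queue` instead of a shared global. The jobs here are groups of first-item indices assigned round-robin, rather than slices of the item list: slicing the items would make each process solve a different, smaller knapsack that misses combinations spanning two slices. `worker` and `best_pack` are hypothetical helpers.

```python
from multiprocessing import Process, Queue

# (name, weight, value) triples, as in the question
ITEMS = [("a", 10, 20), ("b", 25, 40), ("c", 40, 100), ("d", 45, 90),
         ("e", 35, 65), ("f", 50, 110)]
CAPACITY = 100


def best_pack(items, capacity, start=0):
    """Hypothetical include/exclude recursion; returns (names, weight, value)."""
    if start == len(items):
        return ((), 0, 0)
    name, weight, value = items[start]
    best = best_pack(items, capacity, start + 1)             # leave it out
    if weight <= capacity:                                    # or take it
        names, w, v = best_pack(items, capacity - weight, start + 1)
        if v + value > best[2]:
            best = ((name,) + names, w + weight, v + value)
    return best


def worker(items, capacity, indices, out):
    """Hypothetical worker: search every subset whose first item index is in
    `indices`, then report this process's best via the queue."""
    best = ((), 0, 0)
    for i in indices:
        name, weight, value = items[i]
        if weight <= capacity:
            names, w, v = best_pack(items[i + 1:], capacity - weight)
            if v + value > best[2]:
                best = ((name,) + names, w + weight, v + value)
    out.put(best)


if __name__ == "__main__":
    n = 4  # number of jobs, e.g. N or 2N for N cores
    out = Queue()
    procs = [Process(target=worker,
                     args=(ITEMS, CAPACITY, range(k, len(ITEMS), n), out))
             for k in range(n)]
    for p in procs:
        p.start()
    partials = [out.get() for _ in procs]  # drain the queue before joining
    for p in procs:
        p.join()
    print(max(partials, key=lambda r: r[2]))  # (('a', 'c', 'f'), 100, 230)
```

Reading every result off the queue before calling `join()` follows the multiprocessing guideline about joining processes that use queues.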

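One last note: if you finish your code and it seems to work even though it implicitly shares globals, what you've actually written is code that usually, but not always, works on some platforms, and never works on others. See the Windows section of the multiprocessing docs for what to avoid, and, if possible, test regularly on Windows, since it's the most restrictive platform.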
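One way to get part of the benefit of testing on Windows without a Windows machine, offered as an assumption rather than a replacement for real Windows testing: request the spawn start method explicitly with `multiprocessing.get_context("spawn")`. Spawn is the default on Windows (and on macOS since Python 3.8); under it, each child re-imports your module instead of inheriting the parent's memory, so code that silently relied on globals changed at runtime breaks visibly. `LIMIT`, `read_limit` and `scaled` are hypothetical names.

```python
import multiprocessing

# Module-level global; it is changed only inside the __main__ block below.
LIMIT = None


def read_limit(_):
    """Hypothetical worker that (wrongly) relies on a global: returns the
    child's view of LIMIT."""
    return LIMIT


def scaled(args):
    """Hypothetical worker using the safe pattern: everything it needs
    arrives as an argument."""
    limit, x = args
    return min(x * 2, limit)


if __name__ == "__main__":
    LIMIT = 100  # runtime change, made in the parent only
    ctx = multiprocessing.get_context("spawn")  # the start method Windows uses
    with ctx.Pool(2) as pool:
        # Each spawned child re-imports this module, so it sees LIMIT = None.
        print(pool.map(read_limit, range(2)))                    # [None, None]
        print(pool.map(scaled, [(LIMIT, x) for x in (10, 60)]))  # [20, 100]
```

With the fork start method the first `map` would typically report 100 instead, which is exactly the "works on some platforms, never on others" trap.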

You also asked a second question:

Can you tell me what's wrong with the second piece of code?

It's not entirely clear what you're trying to do here, but there are some obvious problems (beyond those mentioned above).

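You don't create a thread anywhere in the code you've shown us. Just creating variables with "thread" in their names doesn't give you parallelism. Neither does adding locks: if you don't have any threads, all the locks can do is slow you down.
From your description, it sounds like you tried to use the thread module rather than threading. There's a reason the top of the thread documentation tells you not to use it and to use threading instead.
You have a lock protecting your thread count (which shouldn't be needed at all), but no lock protecting your results. In most cases you'll get away with this in Python (because of the same GIL issue mentioned above: your threads basically never run at the same time, so they won't race), but it's still a very bad idea (especially if you don't know exactly what those "most cases" are).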
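For comparison, a sketch of what the second snippet seems to be aiming for, written with the threading module: threads are actually created, started and joined, there is no thread counter, and the lock guards the shared result instead. Because of the GIL, this should not be expected to run faster for CPU-bound work; it only shows correct structure. `search_with_threads`, the nested `run`, and the include/exclude helper `best_pack` are hypothetical names, with one thread per iteration of `for i in items`.

```python
import threading

# (name, weight, value) triples, as in the question
ITEMS = [("a", 10, 20), ("b", 25, 40), ("c", 40, 100), ("d", 45, 90),
         ("e", 35, 65), ("f", 50, 110)]


def best_pack(items, capacity, start=0):
    """Hypothetical include/exclude recursion; returns (names, weight, value)."""
    if start == len(items):
        return ((), 0, 0)
    name, weight, value = items[start]
    best = best_pack(items, capacity, start + 1)             # leave it out
    if weight <= capacity:                                    # or take it
        names, w, v = best_pack(items, capacity - weight, start + 1)
        if v + value > best[2]:
            best = ((name,) + names, w + weight, v + value)
    return best


def search_with_threads(items, capacity):
    results = [((), 0, 0)]   # shared best (names, weight, value)
    lock = threading.Lock()  # guards every access to results

    def run(i):
        """One iteration of `for i in items`: best subset starting with items[i]."""
        name, weight, value = items[i]
        if weight > capacity:
            return
        names, w, v = best_pack(items[i + 1:], capacity - weight)
        candidate = ((name,) + names, w + weight, v + value)
        with lock:           # compare-and-replace must not interleave
            if candidate[2] > results[0][2]:
                results[0] = candidate

    threads = [threading.Thread(target=run, args=(i,)) for i in range(len(items))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()             # wait for completion instead of counting threads
    return results[0]


print(search_with_threads(ITEMS, 100))  # (('a', 'c', 'f'), 100, 230)
```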

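However, your run function seems to be based on the body of the for i in items: loop in pack. If that's a good place to parallelize, you're in luck, because creating one parallel task per iteration of a loop is exactly what futures and multiprocessing are best at. For example, this code: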

results = []
for i in items:
    result = dostuff(i)
    results.append(result)

... can of course be written as:

results = list(map(dostuff, items))

And that can be parallelized trivially, without even having to understand what futures are about:

import concurrent.futures
pool = concurrent.futures.ProcessPoolExecutor()
results = list(pool.map(dostuff, items))
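A self-contained version of that last snippet, assuming a hypothetical CPU-bound `dostuff` (here a sum of squares). Two details the short form glosses over: in Python 3 the built-in `map` and `Executor.map` are lazy, so `list()` is needed to get an actual list; and the executor is best used as a context manager under an `if __name__ == "__main__":` guard, so the worker processes are shut down and the module is safe to re-import under the spawn start method.

```python
import concurrent.futures


def dostuff(n):
    """Hypothetical CPU-bound task: the sum of squares below n."""
    return sum(k * k for k in range(n))


def serial(items):
    """The original for-loop form, kept for comparison."""
    results = []
    for i in items:
        results.append(dostuff(i))
    return results


if __name__ == "__main__":
    items = [10, 100, 1000]
    with concurrent.futures.ProcessPoolExecutor() as pool:
        parallel = list(pool.map(dostuff, items))  # results keep input order
    assert parallel == serial(items) == list(map(dostuff, items))
    print(parallel)  # [285, 328350, 332833500]
```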

Using threads for a for loop in Python
