Question

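I am testing some code (trying to make it faster, but also trying to understand the differences). I have a loop that builds a table in memory. I then tried to run it with multiprocessing, but when I do, the memory usage looks strange. When I run the loop on its own, the table grows and grows until it takes up all the memory on the system, but with multiprocessing the memory usage stays low the whole time, which makes me question what it is actually doing. I am trying to quickly recreate what the non-multiprocessed code does.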

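Here is some code (just add or remove items in the data variable to make it run faster or slower, so that you can watch the system processes. The multiprocessing version is at the top and the non-multiprocessing version is at the bottom):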

from multiprocessing import Pool
from multiprocessing.managers import BaseManager, DictProxy
from collections import defaultdict

class MyManager(BaseManager):
    pass

MyManager.register('defaultdict', defaultdict, DictProxy)

def test(i,x, T):
    target_sum = 1000
    # T[x, i] is True if 'x' can be solved
    # by a linear combination of data[:i+1]
    #T = defaultdict(bool)           # all values are False by default
    T[0, 0] = True                # base case

    for s in range(target_sum + 1): #set the range of one higher than sum to include sum itself
            #print s
            for c in range(s / x + 1):  
                if T[s - c * x, i]:
                    T[s, i + 1] = True


data = [2,5,8,10,12,50]                
pool = Pool(processes=2)
mgr = MyManager()
mgr.start()
T = mgr.defaultdict(bool)
T[0, 0] = True 
for i, x in enumerate(data):    # i is index, x is data[i]
    pool.apply_async(test, (i,x, T))
pool.close()
pool.join()
pool.terminate()


print 'size of Table(with multiprocesing) is:', len(T)
count_of_true = []
for x in T.items():
    if T[x] == True:
       count_of_true.append(x)
print 'total number of true(with multiprocesing) is ', len(count_of_true)


#now lets try without multiprocessing
target_sum = 100
# T[x, i] is True if 'x' can be solved
# by a linear combination of data[:i+1]
T1 = defaultdict(bool)           # all values are False by default
T1[0, 0] = True                # base case


for i, x in enumerate(data):    # i is index, x is data[i]
    for s in range(target_sum + 1): #set the range of one higher than sum to include sum itself
            for c in range(s / x + 1):  
                if T1[s - c * x, i]:
                    T1[s, i + 1] = True

print 'size of Table(without multiprocesing) is ', len(T1)

count = []
for x in T1:
    if T1[x] == True:
        count.append(x)

print 'total number of true(without multiprocessing) is ', len(count)

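As an experiment, I put the two pieces of code into two separate files and ran them side by side. The two multiprocessing workers take about 20% CPU and each uses only 0.5% of memory. The single process (without multiprocessing) uses 75% of a core and up to 50% of memory.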

Answer 1

If I understand your code correctly, the real problem is that you cannot build your lookup table with multiprocessing.

This:

for i, x in enumerate(data):
    for s in range(target_sum + 1):
        for c in range(s / x + 1):  
            if T1[s - c * x, i]:
                T1[s, i + 1] = True

works because you are doing it one step after the other.

While this:

def test(i,x, T):
    target_sum = 1000
    T[0, 0] = True
    for s in range(target_sum + 1):
        for c in range(s / x + 1):  
            if T[s - c * x, i]:
                T[s, i + 1] = True

# [...]

for i, x in enumerate(data):
    pool.apply_async(test, (i,x, T))

will not do the same thing, because you need the previous results to build the new ones, just as with RecursivelyListAllThatWork().
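To make the dependency concrete, here is a minimal sketch. It is written for Python 3, so `//` replaces the integer `/` used above, and the helper names `fill_stage` and `count_true` as well as the small inputs are made up for this illustration. It runs the same inner loops once in order on a single table, and once with every stage given a fresh table that holds only the base case, which is the worst case for a worker that starts before the earlier stages have written anything.

```python
from collections import defaultdict

def fill_stage(T, i, x, target_sum):
    """Fill column i + 1 of T from column i, using item x any number of times."""
    for s in range(target_sum + 1):
        for c in range(s // x + 1):
            if T[s - c * x, i]:
                T[s, i + 1] = True

def count_true(T):
    """Count the entries of T whose value is True."""
    return sum(1 for v in T.values() if v)

data = [2, 5, 8]      # illustrative inputs, smaller than in the question
target_sum = 20

# Sequential: stage i + 1 reads what stage i wrote into the same table.
seq = defaultdict(bool)
seq[0, 0] = True
for i, x in enumerate(data):
    fill_stage(seq, i, x, target_sum)

# Isolated: every stage starts from a table holding only the base case.
isolated = []
for i, x in enumerate(data):
    T = defaultdict(bool)
    T[0, 0] = True
    fill_stage(T, i, x, target_sum)
    isolated.append(count_true(T))

print(count_true(seq))
print(isolated)
```

With these inputs the sequential table ends up with 50 True entries, while the isolated stages hold 12, 1 and 1: every stage after the first finds nothing True in column i and so writes nothing. With a shared manager dict the outcome depends on timing, but a stage that runs before its predecessor has finished will miss entries in the same way.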

There is also a bug in your counting:

for x in T.items():
    if T[x] == True:
       count_of_true.append(x)

should be:

for x in T:
    if T[x] == True:
       count_of_true.append(x)

It is also better to compare against True with is rather than ==, even though in your case you need neither:

for x in T:
    if T[x]:
       count_of_true.append(x)
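To see why iterating over T.items() goes wrong, here is a small sketch with a plain defaultdict (Python 3 syntax; the table contents are invented for illustration). Each element of items() is a (key, value) pair, so T[x] looks up a key that does not exist. The defaultdict quietly creates that key with the value False, so nothing is counted and the table grows.

```python
from collections import defaultdict

T = defaultdict(bool)
T[0, 0] = True
T[2, 1] = True
T[3, 1] = False

# Buggy version: x is a ((s, i), value) pair, not a key of T.
# list(...) takes a copy first, mimicking Python 2 where items() returns a list.
buggy = []
size_before = len(T)
for x in list(T.items()):
    if T[x] == True:      # missing key, so defaultdict inserts False here
        buggy.append(x)
size_after = len(T)

# Fixed version: unpack the pair and test the value directly.
T2 = defaultdict(bool, {(0, 0): True, (2, 1): True, (3, 1): False})
fixed = [key for key, value in T2.items() if value]

print(buggy, size_before, size_after)
print(fixed, len(T2))
```

The buggy loop finds no True values and doubles the table from 3 to 6 entries, while the fixed loop returns the two True keys and leaves the table size unchanged.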

Also, as a side note, you do not really need defaultdict, as I and others have already told you.
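One possible way of doing without defaultdict is sketched below. This is only an illustration and not necessarily what was suggested earlier; the function name `reachable_sums` and the small inputs are hypothetical, and the code uses Python 3 syntax. The table becomes a plain set that stores only the (s, i) pairs that would have been True. A membership test on a set never inserts anything, so the table holds nothing but reachable sums.

```python
def reachable_sums(data, target_sum):
    """Return the set of (s, i) pairs such that s is a sum of
    non-negative multiples of the items in data[:i]."""
    reachable = {(0, 0)}                      # base case
    for i, x in enumerate(data):
        for s in range(target_sum + 1):
            for c in range(s // x + 1):
                if (s - c * x, i) in reachable:
                    reachable.add((s, i + 1))
                    break                     # one witness is enough
    return reachable

table = reachable_sums([2, 5, 8], 20)
print(len(table))
print((20, 3) in table, (3, 3) in table)
```

Here 20 can be formed from 2, 5 and 8 but 3 cannot, so the two membership tests give True and False.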

Do shared items in multiprocessing have a memory limit?
