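I'm testing some code (trying to make it faster, but also trying to understand the differences). I have a loop that builds a table in memory. I then tried to multiprocess it, but when I do, the memory usage looks strange. When I run the loop on its own, the table grows and grows until it takes up all the memory on the system, but with multiprocessing the memory usage stays low the whole time, which makes me question what it is actually doing. I'm trying to quickly reproduce the behaviour of the non-multiprocessed code.
Here is some code (just add or remove items in the data variable to make it run faster or slower so you can watch the system processes; the multiprocessing version is at the top and the non-multiprocessing version is at the bottom):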
    from multiprocessing import Pool
    from multiprocessing.managers import BaseManager, DictProxy
    from collections import defaultdict

    class MyManager(BaseManager):
        pass

    MyManager.register('defaultdict', defaultdict, DictProxy)

    def test(i,x, T):
        target_sum = 1000
        # T[x, i] is True if 'x' can be solved
        # by a linear combination of data[:i+1]
        #T = defaultdict(bool) # all values are False by default
        T[0, 0] = True # base case
        for s in range(target_sum + 1): #set the range of one higher than sum to include sum itself
            #print s
            for c in range(s / x + 1):
                if T[s - c * x, i]:
                    T[s, i + 1] = True

    data = [2,5,8,10,12,50]
    pool = Pool(processes=2)
    mgr = MyManager()
    mgr.start()
    T = mgr.defaultdict(bool)
    T[0, 0] = True
    for i, x in enumerate(data): # i is index, x is data[i]
        pool.apply_async(test, (i,x, T))
    pool.close()
    pool.join()
    pool.terminate()

    print 'size of Table(with multiprocesing) is:', len(T)
    count_of_true = []
    for x in T.items():
        if T[x] == True:
            count_of_true.append(x)
    print 'total number of true(with multiprocesing) is ', len(count_of_true)

    #now lets try without multiprocessing
    target_sum = 100
    # T[x, i] is True if 'x' can be solved
    # by a linear combination of data[:i+1]
    T1 = defaultdict(bool) # all values are False by default
    T1[0, 0] = True # base case
    for i, x in enumerate(data): # i is index, x is data[i]
        for s in range(target_sum + 1): #set the range of one higher than sum to include sum itself
            for c in range(s / x + 1):
                if T1[s - c * x, i]:
                    T1[s, i + 1] = True

    print 'size of Table(without multiprocesing) is ', len(T1)
    count = []
    for x in T1:
        if T1[x] == True:
            count.append(x)
    print 'total number of true(without multiprocessing) is ', len(count)
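As an experiment, I put the two pieces of code into two separate files and ran them side by side. The two multiprocessing processes take about 20% CPU, and each uses only 0.5% of memory. The single process (without multiprocessing) uses 75% of a core and up to 50% of memory.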
Answer 0 (score: 2)
If I understand your code correctly, the real problem is that you can't build your lookup table with multiprocessing.
This:
    for i, x in enumerate(data):
        for s in range(target_sum + 1):
            for c in range(s / x + 1):
                if T1[s - c * x, i]:
                    T1[s, i + 1] = True
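works because you are doing it one step after the other: each pass over i reads the entries written by the previous pass.
While this: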
    def test(i,x, T):
        target_sum = 1000
        T[0, 0] = True
        for s in range(target_sum + 1):
            for c in range(s / x + 1):
                if T[s - c * x, i]:
                    T[s, i + 1] = True

    # [...]

    for i, x in enumerate(data):
        pool.apply_async(test, (i,x, T))
will not do the same thing, because you need the previous results to build the new ones, just like in RecursivelyListAllThatWork().
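To make the dependency concrete, here is a minimal sketch written for Python 3 (the code in the question is Python 2). The helper names `build_layer`, `build_sequential` and `build_independent` are hypothetical and not part of the original code. `build_independent` models the worst case, in which no task ever sees another task's output; in the real pooled run the tasks share the managed dict, so what each one sees depends on timing.

```python
def build_layer(prev, i, x, target_sum):
    """Return the True entries of layer i + 1, reading only layer i of `prev`."""
    layer = {}
    for s in range(target_sum + 1):
        for c in range(s // x + 1):
            if prev.get((s - c * x, i), False):
                layer[s, i + 1] = True
                break  # one witness is enough to mark s as reachable
    return layer


def build_sequential(data, target_sum):
    """Build the layers in order; each step sees everything built so far."""
    T = {(0, 0): True}
    for i, x in enumerate(data):
        T.update(build_layer(T, i, x, target_sum))
    return T


def build_independent(data, target_sum):
    """Assumed worst case: every layer is built from the base case alone."""
    base = {(0, 0): True}
    T = dict(base)
    for i, x in enumerate(data):
        T.update(build_layer(base, i, x, target_sum))
    return T


data = [2, 5]
sequential = build_sequential(data, 10)
independent = build_independent(data, 10)
print(len(sequential), len(independent))  # prints: 16 7
```

With `data = [2, 5]` and a target of 10, the sequential build marks 7 as reachable in layer 2 (2 + 5), while the independent build never fills layer 2 at all, because layer 1 was not there to read from. A table that stays this small would also be consistent with the low memory usage described in the question.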
There is also a bug in your counting:
    for x in T.items():
        if T[x] == True:
            count_of_true.append(x)
should be:
    for x in T:
        if T[x] == True:
            count_of_true.append(x)
It is better to compare against True with `is` rather than `==`, although in your case you need neither:
    for x in T:
        if T[x]:
            count_of_true.append(x)
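The counting fix can be checked on a small table. This is a sketch for Python 3 using a plain `defaultdict(bool)` rather than the manager proxy; the function names `count_true_buggy` and `count_true` are made up for illustration. Iterating over `items()` yields `(key, value)` pairs, so the lookup uses a key that was never stored, and `defaultdict(bool)` quietly answers False.

```python
from collections import defaultdict


def count_true_buggy(table):
    # items() yields (key, value) pairs, so table[pair] looks up a key that
    # was never stored; defaultdict(bool) returns False and inserts that key.
    # list(...) takes a snapshot so the inserts do not break the iteration.
    return len([pair for pair in list(table.items()) if table[pair] == True])


def count_true(table):
    # Iterating over the dict itself yields keys, which the lookup needs.
    return len([key for key in list(table) if table[key]])


T = defaultdict(bool)
T[0, 0] = True
T[2, 1] = True
T[3, 1] = False

print(count_true_buggy(T), count_true(T))  # prints: 0 2
```

Note that the buggy version also grows the table as a side effect, since every failed lookup adds a new False entry.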