我在python中编码并使用mpi4py并行进行一些优化。我使用的是普通的最小二乘法,而且我的数据太大而无法放在一个处理器上,因此我有一个主进程,然后产生其他进程。这些子进程分别导入它们在整个优化过程中分别使用的数据部分。
我使用scipy.optimize.minimize进行优化,因此子进程从父进程接收系数猜测,然后向父进程报告平方误差和(SSE),然后scipy.optimize。最小化经历迭代,试图找到SSE的最小值。在最小化函数的每次迭代之后,父级向子进程广播新的系数猜测,子进程然后再次计算SSE。在子进程中,此算法在while循环中设置。在父进程中,我只需调用scipy.optimize.minimize。
在给我一个问题的部分,我正在进行嵌套优化或优化中的优化。内部优化是如上所述的OLS回归,然后外部优化最小化使用内部优化系数的另一个函数(OLS回归)。
所以在我的父进程中,我有两个最小化的函数,第二个函数调用第一个函数,并对第二个函数的优化的每次迭代进行新的优化。子进程有两个优化的嵌套while循环。
希望一切都有道理。如果需要更多信息,请告诉我。
以下是父流程的相关代码:
comm = MPI.COMM_SELF.Spawn(sys.executable,args = ['IVQTparallelSlave_cdf.py'],maxprocs=processes)
# First stage: reg D on Z, X
def OLS(betaguess):
comm.Bcast([betaguess,MPI.DOUBLE], root=MPI.ROOT)
SSE = np.array([0],dtype='d')
comm.Reduce(None,[SSE,MPI.DOUBLE], op=MPI.SUM, root = MPI.ROOT)
comm.Bcast([np.array([1],'i'),MPI.INT], root=MPI.ROOT)
return SSE
# Here is the CDF function.
def CDF(yguess, delta_FS, tau):
# Calculate W(y) in the slave process
# Solving the Reduced form after every iteration: reg W(y) on Z, X
comm.Bcast([yguess,MPI.DOUBLE], root=MPI.ROOT)
betaguess = np.zeros(94).astype('d')
###########
# This calculates the reduced form coefficient
coeffs_RF = scipy.minimize(OLS,betaguess,method='Powell')
# This little block is to get the slave processes to stop
comm.Bcast([betaguess,MPI.DOUBLE], root=MPI.ROOT)
SSE = np.array([0],dtype='d')
comm.Reduce(None,[SSE,MPI.DOUBLE], op=MPI.SUM, root = MPI.ROOT)
cont = np.array([0],'i')
comm.Bcast([cont,MPI.INT], root=MPI.ROOT)
###########
contCDF = np.array([1],'i')
comm.Bcast([contCDF,MPI.INT], root=MPI.ROOT) # This is to keep the outer while loop going
delta_RF = coeffs_RF.x[1]
return abs(delta_RF/delta_FS - tau)
########### This one finds Y(1) ##############
betaguess = np.zeros(94).astype('d')
######### First Stage: reg D on Z, X #########
coeffs_FS = scipy.minimize(OLS,betaguess,method='Powell')
print coeffs_FS
# This little block is to get the slave processes' while loops to stop
comm.Bcast([betaguess,MPI.DOUBLE], root=MPI.ROOT)
SSE = np.array([0],dtype='d')
comm.Reduce(None,[SSE,MPI.DOUBLE], op=MPI.SUM, root = MPI.ROOT)
cont = np.array([0],'i')
comm.Bcast([cont,MPI.INT], root=MPI.ROOT)
delta_FS = coeffs_FS.x[1]
######### CDF Function #########
yguess = np.array([3340],'d')
CDF1 = lambda yguess: CDF(yguess, delta_FS, tau)
y_minned_1 = scipy.minimize(CDF1,yguess,method='Powell')
以下是子进程的相关代码:
#IVQTparallelSlave_cdf.py
comm = MPI.Comm.Get_parent()
.
.
.
# Importing data. The data is the matrices D, and ZX
.
.
.
########### This one finds Y(1) ##############
######### First Stage: reg D on Z, X #########
cont = np.array([1],'i')
betaguess = np.zeros(94).astype('d')
# This corresponds to 'coeffs_FS = scipy.minimize(OLS,betaguess,method='Powell')' of the parent process
while cont[0]:
comm.Bcast([betaguess,MPI.DOUBLE], root=0)
SSE = np.array(((D - np.dot(ZX,betaguess).reshape(local_n,1))**2).sum(),'d')
comm.Reduce([SSE,MPI.DOUBLE],None, op=MPI.SUM, root = 0)
comm.Bcast([cont,MPI.INT], root=0)
if rank==0: print '1st Stage OLS regression done'
######### CDF Function #########
cont = np.array([1],'i')
betaguess = np.zeros(94).astype('d')
contCDF = np.array([1],'i')
yguess = np.array([0],'d')
# This corresponds to 'y_minned_1 = spicy.minimize(CDF1,yguess,method='Powell')'
while contCDF[0]:
comm.Bcast([yguess,MPI.DOUBLE], root=0)
# This calculates the reduced form coefficient
while cont[0]:
comm.Bcast([betaguess,MPI.DOUBLE], root=0)
W = 1*(Y<=yguess)*D
SSE = np.array(((W - np.dot(ZX,betaguess).reshape(local_n,1))**2).sum(),'d')
comm.Reduce([SSE,MPI.DOUBLE],None, op=MPI.SUM, root = 0)
comm.Bcast([cont,MPI.INT], root=0)
#if rank==0: print cont
comm.Bcast([contCDF,MPI.INT], root=0)
我的问题是,在经过外部最小化的一次迭代之后,它会吐出以下错误:
Internal Error: invalid error code 409e0e (Ring ids do not match) in MPIR_Bcast_impl:1328
Traceback (most recent call last):
File "IVQTparallelSlave_cdf.py", line 100, in <module>
if rank==0: print 'CDF iteration'
File "Comm.pyx", line 406, in mpi4py.MPI.Comm.Bcast (src/mpi4py.MPI.c:62117)
mpi4py.MPI.Exception: Other MPI error, error stack:
PMPI_Bcast(1478).....: MPI_Bcast(buf=0x2409f50, count=1, MPI_INT, root=0, comm=0x84000005) failed
MPIR_Bcast_impl(1328):
我还没有找到任何关于此信息的信息&#34; ring id&#34;错误或如何解决它。非常感谢帮助。谢谢!