I've noticed that the residual norms in the nonlinear solver have a slight dependence on the number of processors used when a distributed (parallel) component is coupled to a non-distributed (serial) component. I've attached an example script below.
'''
Simple example coupling a serial and distributed ImplicitComponent
'''
import numpy as np
import openmdao.api as om
from mpi4py import MPI
from openmdao.utils.array_utils import evenly_distrib_idxs
rank = MPI.COMM_WORLD.rank
size = 3
A = np.array([[1.0, 8.0, 0.0], [-1.0, 10.0, 2.0], [3.0, 100.5, 1.0]])
'''
This component solves the following quadratic equation in parallel:
    a_i0 * y_i^2 + a_i1 * y_i + a_i2 = x_i,   for i = {0, 1, 2}
where the coefficients a_i0, a_i1, a_i2 are the entries of row i of the matrix A
'''
class DistribQuadtric(om.ImplicitComponent):
    def initialize(self):
        self.options['distributed'] = True
        self.options.declare('size', types=int, default=1,
                             desc="Size of input and output vectors.")

    def setup(self):
        comm = self.comm
        rank = comm.rank
        size_total = self.options['size']

        # Distribute x and y vectors across each processor as evenly as possible
        sizes, offsets = evenly_distrib_idxs(comm.size, size_total)
        start = offsets[rank]
        end = start + sizes[rank]
        self.size_local = size_local = sizes[rank]

        # Get the local slice of A that this processor will be working with
        self.A_local = A[start:end, :]

        self.add_input('x', np.ones(size_local, float),
                       src_indices=np.arange(start, end, dtype=int))
        self.add_output('y', np.ones(size_local, float))

    def apply_nonlinear(self, inputs, outputs, residuals):
        x = inputs['x']
        y = outputs['y']
        r = residuals['y']
        for i in range(self.size_local):
            r[i] = self.A_local[i, 0] * y[i]**2 + self.A_local[i, 1] * y[i] \
                   + self.A_local[i, 2] - x[i]

    def solve_nonlinear(self, inputs, outputs):
        x = inputs['x']
        y = outputs['y']
        for i in range(self.size_local):
            # Take the positive root of each local quadratic
            a = self.A_local[i, 0]
            b = self.A_local[i, 1]
            c = self.A_local[i, 2] - x[i]
            y[i] = (-b + np.sqrt(b**2 - 4*a*c)) / (2*a)
'''
This component solves the following linear system in serial:
    A x = y
'''
class SerialLinear(om.ImplicitComponent):
    def initialize(self):
        self.options.declare('size', types=int, default=1,
                             desc="Size of input and output vectors.")

    def setup(self):
        size = self.options['size']
        self.add_input('y', np.ones(size, float))
        self.add_output('x', np.ones(size, float))
        self.A = A

    def apply_nonlinear(self, inputs, outputs, residuals):
        y = inputs['y']
        x = outputs['x']
        # Assign in place: rebinding a local name (r = y - A.dot(x)) would
        # leave the actual residuals array untouched.
        residuals['x'][:] = y - A.dot(x)

    def solve_nonlinear(self, inputs, outputs):
        y = inputs['y']
        x = outputs['x']
        x[:] = np.linalg.inv(A).dot(y)
# Create a coupled problem between the linear and quadratic components
prob = om.Problem()
top_group = prob.model
top_group.add_subsystem("distributed_quad", DistribQuadtric(size=size))
top_group.add_subsystem("serial_linear", SerialLinear(size=size))

# Connect variables between components
top_group.connect('serial_linear.x', 'distributed_quad.x')
top_group.connect('distributed_quad.y', 'serial_linear.y')

# Need a nonlinear solver since the model is coupled
top_group.nonlinear_solver = om.NonlinearBlockGS(iprint=2, maxiter=20)

# Set up the problem
prob.setup()

# Solve the problem
prob.run_model()

# Print out the solution
if prob.comm.rank == 0:
    print('x', prob['serial_linear.x'])
    print('y', prob['serial_linear.y'])
When running this code on 1 processor, the printed output is:
NL: NLBGS 0 ; 2.35754338 1
NL: NLBGS 1 ; 0.256315721 0.108721529
NL: NLBGS 2 ; 0.036527896 0.0154940504
NL: NLBGS 3 ; 0.00641965062 0.00272302545
NL: NLBGS 4 ; 0.0011292331 0.000478987198
NL: NLBGS 5 ; 0.000198654857 8.42635002e-05
NL: NLBGS 6 ; 3.49479079e-05 1.48238663e-05
NL: NLBGS 7 ; 6.14814792e-06 2.60786205e-06
NL: NLBGS 8 ; 1.08160237e-06 4.58783657e-07
NL: NLBGS 9 ; 1.90279057e-07 8.0710734e-08
NL: NLBGS 10 ; 3.34745201e-08 1.41988989e-08
NL: NLBGS 11 ; 5.8889481e-09 2.49791717e-09
NL: NLBGS 12 ; 1.03600386e-09 4.3944212e-10
NL: NLBGS 13 ; 1.8225669e-10 7.7307884e-11
NL: NLBGS Converged
('x', array([-0.01251987, 0.00136932, -0.11111688]))
('y', array([-0.00156529, -0.19602066, -0.01105954]))
But when running on 3 processors, the printed output is:
NL: NLBGS 0 ; 5.66931072 1
NL: NLBGS 1 ; 0.6855401 0.120921243
NL: NLBGS 2 ; 0.0993351375 0.0175215546
NL: NLBGS 3 ; 0.0174731006 0.00308205026
NL: NLBGS 4 ; 0.00307353315 0.000542135243
NL: NLBGS 5 ; 0.00054069662 9.537255e-05
NL: NLBGS 6 ; 9.51208366e-05 1.67782013e-05
NL: NLBGS 7 ; 1.67339624e-05 2.95167495e-06
NL: NLBGS 8 ; 2.94389363e-06 5.19268351e-07
NL: NLBGS 9 ; 5.17899477e-07 9.1351401e-08
NL: NLBGS 10 ; 9.11105862e-08 1.60708401e-08
NL: NLBGS 11 ; 1.60284752e-08 2.82723526e-09
NL: NLBGS 12 ; 2.81978416e-09 4.97376895e-10
NL: NLBGS 13 ; 4.96064272e-10 8.74999266e-11
NL: NLBGS Converged
('x', array([-0.01251987, 0.00136932, -0.11111688]))
('y', array([-0.00156529, -0.19602066, -0.01105954]))
Although the final solution of the coupled problem is identical, the residual norms used during the nonlinear solve grow with the number of processors. This does not happen when a distributed component is coupled to another distributed component, or a non-distributed component to another non-distributed component; it occurs only when the two are mixed.
I believe the source of this discrepancy is the underlying petsc_vector class used for parallel problems, defined in the OpenMDAO source code. Specifically, that class defines its norm as follows:
def get_norm(self):
    """
    Return the norm of this vector.

    Returns
    -------
    float
        norm of this vector.
    """
    return self._system.comm.allreduce(np.linalg.norm(self._data))
This method uses an allreduce to accumulate the vector contributions from all processors into the norm. While that gives the correct result for any vector belonging to a distributed component (since the vector's entries are partitioned across the processors), the vector of a serial component holds an identical copy on every processor, so its entries are counted multiple times and the resulting norm depends on the number of processors used.
Although the effect is small in the example I've shown, it becomes larger for more complex models that may run on many processors. This could cause problems with convergence, parallel scalability studies, and solver tolerances. Is there a general way to avoid this problem?
Answer 0 (score: 0)
Thanks for reporting this. It is definitely a bug, and it has been fixed as of the latest commit in the OpenMDAO repository (0d50e7e2c26140b603460f2324e3d1d95513264a).
The latest release (2.8) also includes this fix.