Question

When we define a deep learning model, we perform the following steps:

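Specify how the output should be computed from the input and the model's parameters.
Specify a cost (loss) function.
Search for the model's parameters by minimizing the cost function.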

It seems to me that in MXNet the first two steps are bound together. For example, I define a linear transformation in the following way:

import mxnet as mx
import numpy as np

# declare a symbolic variable for the model's input
inp = mx.sym.Variable(name = 'inp')
# define how output should be determined by the input
out = mx.sym.FullyConnected(inp, name = 'out', num_hidden = 2)

# specify input and model's parameters
x = mx.nd.array(np.ones(shape = (5,3)))
w = mx.nd.array(np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))
b = mx.nd.array(np.array([7.0, 8.0]))

# calculate output based on the input and parameters
p = out.bind(ctx = mx.cpu(), args = {'inp':x, 'out_weight':w, 'out_bias':b})
print(p.forward()[0].asnumpy())
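
As a cross-check on the snippet above, here is a minimal sketch in plain Python (no MXNet needed) of the arithmetic a fully connected layer performs, assuming the usual convention that the weight matrix has shape (num_hidden, input_dim) and the output is x·wᵀ + b. The values are the ones from the question.

```python
# Same numbers as in the question: 5 samples with 3 features, 2 hidden units.
x = [[1.0, 1.0, 1.0] for _ in range(5)]
w = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [7.0, 8.0]

# out[i][j] = sum_k x[i][k] * w[j][k] + b[j]
out = [
    [sum(x_k * w_k for x_k, w_k in zip(row, w_row)) + b_j
     for w_row, b_j in zip(w, b)]
    for row in x
]

for row in out:
    print(row)  # each of the 5 rows is [13.0, 23.0]
```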

Now, if I want to add a SoftMax transformation on top of it, I need to do the following:

# define the cost function
target = mx.sym.Variable(name = 'target')
cost = mx.symbol.SoftmaxOutput(out, target, name='softmax')

y = mx.nd.array(np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
c = cost.bind(ctx = mx.cpu(), args = {'inp':x, 'out_weight':w, 'out_bias':b, 'target':y})
print(c.forward()[0].asnumpy())

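I do not understand why we need to create the symbolic variable target. We would need it only if we wanted to compute the cost, but so far we are only computing the output from the input (via a linear transformation and SoftMax).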

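Moreover, we have to provide a numerical value for target in order to compute the output. So it looks like it is required but never used (the value provided for target does not change the output).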

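Finally, we can use the cost object to define a model that we can fit as soon as we have data. But what about the cost function? It has to be specified, but it is not. Basically, it looks like I am forced to use a specific cost function just because I use SoftMax. But why?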

ADDED

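For a more statistical/mathematical point of view, check here. The current question, though, is more pragmatic/programmatic in nature. It is basically: how do I decouple the output nonlinearity and the cost function in MXNet? For example, I might want to apply a linear transformation and then find the model parameters by minimizing the absolute deviation instead of the squared one.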
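
To make the kind of decoupling being asked about concrete, here is a framework-independent sketch in plain Python. It is not MXNet code: the helper names (`predict`, `fit`, `squared_loss_grad`, `absolute_loss_grad`) and the toy data are hypothetical, chosen only for illustration. The prediction function is defined once, and the loss is passed in separately, so the same linear model can be fitted with either criterion.

```python
def predict(w, b, x):
    """Linear transformation; knows nothing about the loss."""
    return w * x + b

def squared_loss_grad(residual):
    """Derivative of residual**2 with respect to the prediction."""
    return 2.0 * residual

def absolute_loss_grad(residual):
    """Subgradient of abs(residual): the sign of the residual."""
    return float(residual > 0) - float(residual < 0)

def fit(xs, ys, loss_grad, lr=0.05, steps=2000):
    """Plain (sub)gradient descent on the mean loss; loss_grad is pluggable."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [predict(w, b, x) - y for x, y in zip(xs, ys)]
        grads = [loss_grad(r) for r in residuals]
        w -= lr * sum(g * x for g, x in zip(grads, xs)) / n
        b -= lr * sum(grads) / n
    return w, b

def mean_abs_error(w, b, xs, ys):
    return sum(abs(predict(w, b, x) - y) for x, y in zip(xs, ys)) / len(xs)

# Toy data generated from y = 2x + 1 (an assumption for this sketch).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w_sq, b_sq = fit(xs, ys, squared_loss_grad)
w_abs, b_abs = fit(xs, ys, absolute_loss_grad)
print("squared loss: ", w_sq, b_sq)
print("absolute loss:", w_abs, b_abs)
```

With a fixed step size, the absolute-deviation fit only hovers near the optimum rather than converging exactly; a decaying learning rate would tighten it.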

Answer 1

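If you only want softmax, you can use mx.sym.softmax(). mx.sym.SoftmaxOutput() contains efficient code for computing the gradient of cross-entropy (negative log loss), which is the most common loss used with softmax. If you want to use your own loss, just use softmax and add a loss on top during training. I should note that you can also replace the SoftmaxOutput layer with a simple softmax during inference if you really want to.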
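
A sketch of why pairing softmax with cross-entropy is convenient, written in plain Python rather than MXNet and not taken from MXNet's source: for labels that sum to 1, the gradient of the cross-entropy with respect to the pre-softmax values is simply softmax(z) − y. The forward pass never touches the label; only the gradient does, which matches the observation that target is required but does not change the output. The example values below are arbitrary.

```python
import math

def softmax(logits):
    """Forward pass: depends only on the logits, never on the label."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-likelihood of the label under softmax(logits)."""
    return -sum(y * math.log(p) for y, p in zip(label, softmax(logits)))

def cross_entropy_grad(logits, label):
    """Analytic gradient with respect to the logits: softmax(z) - y."""
    return [p - y for p, y in zip(softmax(logits), label)]

logits = [0.5, -1.0]   # arbitrary example values
label = [1.0, 0.0]     # one-hot target

analytic = cross_entropy_grad(logits, label)

# Check the identity against central finite differences.
eps = 1e-6
numeric = []
for i in range(len(logits)):
    up, down = list(logits), list(logits)
    up[i] += eps
    down[i] -= eps
    numeric.append((cross_entropy(up, label) - cross_entropy(down, label)) / (2 * eps))

print("analytic:", analytic)
print("numeric: ", numeric)
```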
