Question

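I want to use a linear, fully connected layer as one of the input layers in my network. The input has shape (batch_size, in_channels, num_samples). It is based on the Tacotron paper: https://arxiv.org/pdf/1703.10135.pdf, specifically the prenet part of the Encoder. It seems to me that Chainer and PyTorch implement the Linear layer differently - do they actually perform the same operation, or am I misunderstanding something?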

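In PyTorch, the Linear layer behaves as described in the documentation: https://pytorch.org/docs/0.3.1/nn.html#torch.nn.Linear According to it, the input and output shapes are as follows:

Input: (N, ∗, in_features), where * means any number of additional dimensions

Output: (N, ∗, out_features), where all but the last dimension have the same shape as the input.

Now, let's create a linear layer in PyTorch and perform the operation. I want an output with 8 channels, and the input data will have 3 channels.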

import numpy as np
import torch
from torch import nn
linear_layer_pytorch = nn.Linear(3, 8)

Let's create some dummy input data of shape (1, 4, 3) - (batch_size, num_samples, in_channels):

data = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4], dtype=np.float32).reshape(1, 4, 3)
data_pytorch = torch.from_numpy(data)

Finally, perform the operation:

results_pytorch = linear_layer_pytorch(data_pytorch)
results_pytorch.shape

The output has the following shape:

Out[27]: torch.Size([1, 4, 8])

Looking at the source of the PyTorch implementation:

def linear(input, weight, bias=None):
    # type: (Tensor, Tensor, Optional[Tensor]) -> Tensor
    r"""
    Applies a linear transformation to the incoming data: :math:`y = xA^T + b`.

    Shape:

        - Input: :math:`(N, *, in\_features)` where `*` means any number of
          additional dimensions
        - Weight: :math:`(out\_features, in\_features)`
        - Bias: :math:`(out\_features)`
        - Output: :math:`(N, *, out\_features)`
    """
    if input.dim() == 2 and bias is not None:
        # fused op is marginally faster
        ret = torch.addmm(bias, input, weight.t())
    else:
        output = input.matmul(weight.t())
        if bias is not None:
            output += bias
        ret = output
    return ret

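It transposes the weight matrix passed to it, broadcasts it along the batch_size axis, and performs a matrix multiplication. Thinking about how a linear layer works, I picture it as 8 nodes, each connected through a weighted synapse to every channel of an input sample, so in my case it has 3 * 8 weights. That is exactly the shape I see in the debugger: (8, 3).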
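To make the shape reasoning above concrete, here is a minimal NumPy sketch of y = xA^T + b applied to the same dummy input. The random `weight` and `bias` values are assumptions standing in for whatever nn.Linear(3, 8) would initialize; only their shapes matter here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same dummy input as in the question: (batch_size, num_samples, in_channels)
data = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4], dtype=np.float32).reshape(1, 4, 3)

# Assumed stand-in parameters with the shapes nn.Linear(3, 8) would hold
weight = rng.standard_normal((8, 3)).astype(np.float32)  # (out_features, in_features)
bias = rng.standard_normal(8).astype(np.float32)         # (out_features,)

# matmul broadcasts over the leading axes and only contracts the last one
out = data @ weight.T + bias
print(out.shape)  # (1, 4, 8)

# Equivalent view: each of the 4 samples is mapped by the same (8, 3) matrix
per_sample = np.stack([weight @ data[0, t] + bias for t in range(4)])
print(np.allclose(out[0], per_sample, atol=1e-5))
```

The second check illustrates that the layer acts on each sample's 3 channels independently, which is why the weight matrix stays (8, 3) regardless of num_samples.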

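Now, let's jump to Chainer. Chainer's Linear layer documentation is available here: https://docs.chainer.org/en/stable/reference/generated/chainer.links.Linear.html#chainer.links.Linear. According to it, the Linear layer wraps the function linear, which, per the docs, flattens the input along the non-batch dimensions, so its weight matrix has shape (output_size, flattened_input_size).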

import chainer 
linear_layer_chainer = chainer.links.Linear(8)
results_chainer = linear_layer_chainer(data)
results_chainer.shape
Out[21]: (1, 8)

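Creating the layer as linear_layer_chainer = chainer.links.Linear(3, 8) and calling it causes a size mismatch. So in Chainer's case I get totally different results, because this time my weight matrix has shape (8, 12) and my result has shape (1, 8). So here is my question: since the results are clearly different, with both the weight matrices and the outputs having different shapes, how can I make them equivalent, and what should the expected output be? In the PyTorch implementation of Tacotron, the PyTorch approach appears to be used as is (https://github.com/mozilla/TTS/blob/master/layers/tacotron.py) - Prenet. If so, how can I make Chainer produce the same results (I have to implement this in Chainer)? I would be grateful for any insight, and sorry that this post got so long.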
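The flattening behavior described above can be imitated in plain NumPy to see where the (8, 12) weight shape and the size mismatch come from. This is only a sketch of the shape arithmetic under the assumption that axis 0 is the sole batch axis; it is not Chainer's actual code, and the zero-filled weights are placeholders.

```python
import numpy as np

data = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4], dtype=np.float32).reshape(1, 4, 3)

# Keep axis 0 as the batch axis and flatten everything else
flat = data.reshape(data.shape[0], -1)
print(flat.shape)  # (1, 12)

# The weight matrix therefore needs 4 * 3 = 12 input columns
weight_flat = np.zeros((8, 12), dtype=np.float32)
print((flat @ weight_flat.T).shape)  # (1, 8)

# An (8, 3) weight, the shape Linear(3, 8) implies, cannot multiply a 12-wide input
weight_last = np.zeros((8, 3), dtype=np.float32)
try:
    flat @ weight_last.T
    mismatch = False
except ValueError:
    mismatch = True
print(mismatch)  # True
```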

Answer 1

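Chainer's Linear layer (somewhat frustratingly) does not apply the transformation to the last axis by default. Instead, Chainer flattens all of the remaining axes. You need to tell it how many batch axes there are, as described in the documentation; in your case that is 2: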

# data.shape == (1, 4, 3)
results_chainer = linear_layer_chainer(data, n_batch_axes=2)
# 2 batch axes (1,4) means you apply linear to (..., 3)
# results_chainer.shape == (1, 4, 8)

You can also use l(data, n_batch_axes=len(data.shape)-1) to always apply the layer to the last dimension, which is the default behavior in PyTorch, Keras, etc.
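As a rough illustration of what n_batch_axes changes, the NumPy sketch below collapses the leading batch axes, flattens the rest, applies the linear map, and restores the batch axes. The helper `linear_n_batch_axes` and the random parameters are hypothetical and written only for this example; they mirror the semantics described in this answer rather than Chainer's implementation.

```python
import numpy as np

def linear_n_batch_axes(x, weight, bias, n_batch_axes=1):
    """Hypothetical NumPy stand-in for a linear map with n_batch_axes."""
    batch_shape = x.shape[:n_batch_axes]
    # Collapse the batch axes into one and flatten the remaining axes
    x2d = x.reshape(int(np.prod(batch_shape)), -1)
    y2d = x2d @ weight.T + bias
    # Put the original batch axes back in front of the output features
    return y2d.reshape(batch_shape + (weight.shape[0],))

rng = np.random.default_rng(0)
data = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4], dtype=np.float32).reshape(1, 4, 3)
weight = rng.standard_normal((8, 3)).astype(np.float32)  # assumed (out, in) layout
bias = rng.standard_normal(8).astype(np.float32)

# Treat every axis except the last as a batch axis
out = linear_n_batch_axes(data, weight, bias, n_batch_axes=data.ndim - 1)
print(out.shape)  # (1, 4, 8)

# Matches the last-axis rule y = x A^T + b from the PyTorch docstring
print(np.allclose(out, data @ weight.T + bias, atol=1e-5))
```

With n_batch_axes left at 1, the same helper would require a weight with 12 input columns, which is the default case from the question.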

Are the PyTorch and Chainer implementations of the Linear layer equivalent?
