Question

I am trying to implement LeNet-5 in TensorFlow, as described in http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf.

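I am having trouble defining C3 (last paragraph of page 7 through the first paragraph of page 8), because I don't know how to tell the network which feature maps from S2 connect to which feature maps in C3 (I only know how to connect all of them).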

My code is:

def LeNet(x):
    # Hyperparameters for initialization of weights
    mu = 0
    sigma = 0.1

    #This is the first convolutional layer C1
    #Initialize weights for the first convolutional layer. 6 feature maps connected to
    #one (1) 5x5 neighborhood in the input. 5*5*1*6=150 trainable parameters
    C1_w = tf.Variable(tf.truncated_normal(shape = [5,5,1,6],mean = mu, stddev = sigma))
    #Bias for each feature map. 6 parameters, with the weights we have 156 parameters
    C1_b = tf.Variable(tf.zeros(6))
    #Define the convolution layer with the weights and biases defined.
    C1 = tf.nn.conv2d(x, C1_w, strides = [1,1,1,1], padding = 'VALID') + C1_b
    #LeCun uses a sigmoidal activation function here.

    #This is the sub-sampling layer S2
    #Subsampling (also known as average pooling) with 2x2 receptive fields.
    #The paper's S2 has 12 trainable parameters; tf.nn.avg_pool itself has none.
    S2 = tf.nn.avg_pool(C1, ksize = [1,2,2,1], strides = [1,2,2,1], padding = 'VALID')
    #The result is passed to a sigmoidal function
    S2 = tf.nn.sigmoid(S2)

    #Another convolutional layer C3.
    #Initialize weights. 16 feature maps connected to 5x5 neighborhoods
    C3_w = tf.Variable(tf.truncated_normal(shape = [5,5,6,16], mean = mu, stddev = sigma)) #This is the line I would want to change.
    C3_b = tf.Variable(tf.zeros(16))

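Right now the code runs (with additional code attached, of course; only the important part is shown), but I am not doing what the paper describes, and I would like to follow it more closely. My C3 has 5x5x6x16 = 2400 weights plus 16 biases = 2416 trainable parameters, whereas C3 in the paper has 1516 trainable parameters.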
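The gap between the two counts can be checked with a little arithmetic. The sketch below assumes the usual reading of Table I in the paper: six C3 maps read 3 S2 maps each, nine read 4 each, and one reads all 6. The names `groups` and `KERNEL` are illustrative only.

```python
# Parameter count for C3 under the sparse connection scheme.
# Assumed reading of Table I: (number of C3 maps, S2 maps each one reads).
KERNEL = 5 * 5
groups = [(6, 3), (6, 4), (3, 4), (1, 6)]

sparse_weights = sum(n_maps * n_inputs * KERNEL for n_maps, n_inputs in groups)
n_c3_maps = sum(n_maps for n_maps, _ in groups)
sparse_total = sparse_weights + n_c3_maps          # one bias per C3 map

dense_total = KERNEL * 6 * n_c3_maps + n_c3_maps   # every S2 map feeds every C3 map

print(sparse_total, dense_total)
```

Under that assumption the sparse scheme gives the paper's figure, and the fully connected version gives the count produced by the code in the question.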

Perhaps C3_w could be defined as a matrix in which some values are tf.constants and others are tf.Variables? How would one do that?
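One way to approximate the "some constant, some variable" idea is to keep a full weight tensor and multiply it by a fixed 0/1 mask. This is a sketch, not the paper's implementation: `C3_INPUTS` is my reading of Table I, and `w` is a NumPy stand-in for `C3_w`.

```python
import numpy as np

# Assumed reading of Table I: C3_INPUTS[j] lists the S2 maps feeding C3 map j.
C3_INPUTS = (
    [[(j + k) % 6 for k in range(3)] for j in range(6)]      # 6 maps, 3 contiguous inputs
    + [[(j + k) % 6 for k in range(4)] for j in range(6)]    # 6 maps, 4 contiguous inputs
    + [[0, 1, 3, 4], [1, 2, 4, 5], [0, 2, 3, 5]]             # 3 maps, 4 non-contiguous inputs
    + [list(range(6))]                                       # 1 map, all 6 inputs
)

# mask[i, j] is 1 where S2 map i is connected to C3 map j, else 0.
mask = np.zeros((6, 16), dtype=np.float32)
for j, inputs in enumerate(C3_INPUTS):
    mask[inputs, j] = 1.0

# Stand-in for C3_w; broadcasting applies the mask at every 5x5 position.
w = np.random.normal(0.0, 0.1, size=(5, 5, 6, 16)).astype(np.float32)
w_masked = w * mask            # shape stays (5, 5, 6, 16)

print(int(mask.sum()) * 25 + 16)   # effective parameter count
```

In TensorFlow the same idea would be to pass `C3_w * tf.constant(mask)` to `tf.nn.conv2d` instead of `C3_w`. Masked entries are multiplied by zero, so they contribute nothing to the output and receive zero gradient. The variable still stores 2400 values, but only the unmasked ones affect the network.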

Update #1:

OK, I am trying to use the split function from the example. I want to do the following:

split1, split2 = tf.split(C3_w, [10, 6], axis=1)

That is, split along the first dimension into [10, 6] (since my tensor is [5,5,6,16]). But this gives the following error:

ValueError: Sum of output sizes must match the size of the original Tensor along the split dimension or the sum of the positive sizes must be less if it contains a -1 for 'split' (op: 'SplitV') with input shapes: [5,5,6,16], [2], [] and with computed input tensors: input[1] = <10 6>, input[2] = <1>.
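The error message says that the requested sizes must add up to the length of the axis being split. Here `axis=1` has length 5, while 10 + 6 = 16, which matches only the last axis. The check below is a small sketch of that rule, using NumPy as a stand-in for the TensorFlow tensor.

```python
import numpy as np

shape = (5, 5, 6, 16)          # same layout as C3_w: height, width, in maps, out maps
sizes = [10, 6]

# A size-list split is only valid along an axis whose length equals sum(sizes).
valid_axes = [axis for axis, dim in enumerate(shape) if dim == sum(sizes)]
print(valid_axes)

# NumPy analogue: np.split takes cut indices rather than sizes,
# so [10] means the pieces [:10] and [10:] along the chosen axis.
w = np.zeros(shape, dtype=np.float32)
first, rest = np.split(w, [10], axis=3)
print(first.shape, rest.shape)
```

For the call in the question, this corresponds to using `axis=3` (or `axis=-1`) in `tf.split(C3_w, [10, 6], ...)`.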

Update #2:

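Even if the code in Update #1 worked, I don't think it would implement the procedure described in the paper. I would be taking the "first" 10 connections along that dimension and dropping the "next" 6. That is not how the paper does it (see Table I on page 8; it is somewhat more complicated).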

Answer 1

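Use tf.split to split the feature maps into multiple variables as needed. You then have separate variables to feed into the appropriate next layers. Backprop works fine through such operations.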

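I don't know the details of the paper, but if you process the whole set of feature maps in one track and feed split feature maps into other tracks, that works just as well; all of these scenarios work fine.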
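The split-and-recombine idea can be sketched on the S2 output rather than on the weights. The example below uses NumPy as a stand-in; the 14x14 spatial size assumes a 32x32 input, and `subset` is a hypothetical choice of S2 maps for a single C3 map.

```python
import numpy as np

# Stand-in for the S2 output: batch of 2, 14x14 spatial, 6 feature maps (NHWC).
s2 = np.random.rand(2, 14, 14, 6).astype(np.float32)

# Split into six single-map tensors along the channel axis.
maps = np.split(s2, 6, axis=3)

# Hypothetical subset for one C3 map: S2 maps 0, 1 and 2.
subset = [0, 1, 2]
c3_input = np.concatenate([maps[i] for i in subset], axis=3)

print(c3_input.shape)
```

In TensorFlow, `tf.split(S2, 6, axis=3)` and `tf.concat(..., axis=3)` play the same roles. Each subset would then go through its own `tf.nn.conv2d` with a kernel of shape `[5, 5, len(subset), 1]`, and the 16 resulting maps would be concatenated along axis 3 to form C3. This stores only the weights that are actually used, at the cost of 16 small convolutions instead of one.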

https://www.tensorflow.org/api_docs/python/tf/split

TensorFlow: implementing a network where not all feature maps of one layer are connected to all feature maps of the next layer
