What is the flow of the learning process in a neural network in terms of batch size and epochs?

Date: 2018-05-18 04:49:18

Tags: tensorflow machine-learning theano

I am confused about the sequence of events in a neural network with regard to terms such as batch size, epochs, and how the weights are updated during the process.

I would like to verify whether my understanding of the flow, as laid out below, is valid.

Considering one training/data point has 8 features (8 input nodes).
I have 20 training/data points.
I choose a batch size of 2.
Now I want to make the model learn.

Executing the first epoch

Executing first batch

    Data point-1: 8 features' values go through the 8 input nodes.
        Random weights are initialised.
        Forward propagation happens.
        Backward propagation happens.
        As a result of backward propagation, all the weights are updated.

    Data point-2: 8 features' values go through the 8 input nodes.
        Forward propagation happens with the updated weights found from the previous (aka Data point-1) back-propagation result.
        Backward propagation happens and all the weights are again updated.

Executing second batch

    Data point-3: 8 features' values go through the 8 input nodes.
        Forward propagation happens with the updated weights found from the previous (aka Data point-2) back-propagation result.
        Backward propagation happens and all the weights are again updated.

This process continues until the first epoch ends.

Executing the second epoch

Executing first batch
    Data point-1: 8 features' values go through the 8 input nodes.
        No random weights this time. Forward propagation happens with the last back-propagated weights (from the first epoch, last executed batch).
        Backward propagation happens and all the weights are again updated.

This process continues until the second epoch ends.

This process continues until the desired number of epochs is reached (a sketch of this flow follows below).
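For concreteness, here is a minimal NumPy sketch of the flow exactly as I described it above, with the weights updated after every single data point. The random data, the single linear layer, and the learning rate are made up purely for illustration.

import numpy as np

# Hypothetical setup matching the question: 20 data points, 8 features, batch size 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))          # 20 training points, 8 features each
y = rng.normal(size=(20, 1))          # target values
W = rng.normal(size=(8, 1)) * 0.01    # random weight initialisation (epoch 1, data point 1)
lr = 0.01                             # learning rate (hypothetical)
batch_size = 2
epochs = 3

for epoch in range(epochs):
    for start in range(0, len(X), batch_size):          # one batch at a time
        for i in range(start, start + batch_size):      # one data point at a time
            x_i = X[i:i+1]                               # shape (1, 8): 8 values -> 8 input nodes
            # Forward propagation with the current (most recently updated) weights
            pred = x_i @ W
            # Backward propagation: gradient of squared error w.r.t. W
            grad = 2 * x_i.T @ (pred - y[i:i+1])
            # Weights are updated immediately after each data point,
            # exactly as the flow above describes
            W -= lr * grad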

3 Answers:

Answer 0: (score: 1)

You have the mini-batch processing wrong: with batch processing we compute the gradient for the whole batch at once, sum all the per-sample gradients, and then update the weights once per batch.

Below is code illustrating the gradient calculation d(loss)/d(W) for the simple example y = W * x, for both single and mini-batch inputs:

import numpy as np
import tensorflow as tf

X = tf.placeholder(tf.float32, [None, 1])
Y = tf.placeholder(tf.float32, [None, 1])

W1 = tf.constant([[0.2]], dtype=tf.float32)
out = tf.matmul(X, W1)

loss = tf.square(out - Y)
# Calculate the error gradient with respect to the weights.
gradients = tf.gradients(loss, W1)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Feeding individual inputs
    print(sess.run([gradients], {X: np.array([[0.1]]), Y: [[0.05]]}))
    # [[-0.006]]
    print(sess.run([gradients], {X: np.array([[0.2]]), Y: [[0.1]]}))
    # [[-0.024]]

    # Feeding a batch combining the above inputs
    print(sess.run([gradients], {X: np.array([[0.1], [0.2]]), Y: [[0.05], [0.1]]}))
    # [[-0.03]] which is the sum of the above gradients.

Answer 1: (score: 0)

The steps you describe are stochastic gradient descent, in which the batch size plays no role: the weights are updated after each data point and then used to evaluate the next data point.

For the mini-batch scheme with batch size = 2, the new weights should be computed from the whole batch together (via backprop) and then used for the next batch (of size 2), continuing until all batches are finished; see the sketch below. Everything else you mentioned is correct.
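For illustration only, here is a minimal NumPy sketch of that mini-batch scheme (the linear model, random data, and learning rate are hypothetical): the per-sample gradients within a batch are summed and the weights are updated once per batch of 2, not once per data point.

import numpy as np

# Hypothetical mini-batch version of the same setup: 20 points, 8 features, batch size 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 8))
y = rng.normal(size=(20, 1))
W = rng.normal(size=(8, 1)) * 0.01
lr = 0.01
batch_size = 2

for epoch in range(3):
    for start in range(0, len(X), batch_size):
        xb = X[start:start + batch_size]     # the whole batch of 2 points
        yb = y[start:start + batch_size]
        # Forward propagation for the whole batch with the current weights
        pred = xb @ W
        # Backward propagation: per-sample gradients are summed over the batch
        grad = 2 * xb.T @ (pred - yb)
        # One weight update per batch, not per data point
        W -= lr * grad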

Answer 2: (score: 0)

You have almost everything right except the back-propagation weight update. The error is calculated for each sample in the mini-batch, but the weights are only updated after all samples of the mini-batch have gone through forward propagation. You can read more about it here.