I am confused about the sequence of events in a neural network with respect to terms such as batch size, epochs, and how the weights are updated during the process.
I want to verify whether my understanding of the process, in the following order, is valid.
Consider that one training/data point has 8 features (8 input nodes).
I have 20 training/data points.
I choose a batch size of 2.
Now I want to make the model learn.
Executing the first epoch
Executing the first batch
Data point 1: the 8 feature values go through the 8 input nodes.
Random weights are initialised.
Forward propagation happens.
Backward propagation happens.
As a result of backward propagation, all the weights are updated.
Data point 2: the 8 feature values go through the 8 input nodes.
Forward propagation happens with the updated weights from the previous (i.e. data point 1) backpropagation step.
Backward propagation happens and all the weights are updated again.
Executing the second batch
Data point 3: the 8 feature values go through the 8 input nodes.
Forward propagation happens with the updated weights from the previous (i.e. data point 2) backpropagation step.
Backward propagation happens and all the weights are updated again.
This process continues until the first epoch ends.
Executing the second epoch
Executing the first batch
Data point 1: the 8 feature values go through the 8 input nodes.
No random weights this time. Forward propagation happens with the last backpropagated weights (from the last batch executed in the first epoch).
Backward propagation happens and all the weights are updated again.
This process continues until the second epoch ends.
This process continues until the desired number of epochs is reached.
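To summarise, here is a hypothetical Python/NumPy sketch of the per-sample update loop I described above (forward, backward and lr are made-up placeholders for the network and learning rate, not real library calls):

import numpy as np

# Hypothetical helpers standing in for the network's forward and backward pass.
def forward(weights, x):
    return x @ weights                        # prediction for one data point

def backward(weights, x, y_true):
    y_pred = forward(weights, x)
    return 2 * x.T @ (y_pred - y_true)        # gradient of squared error w.r.t. weights

X = np.random.rand(20, 8)                     # 20 data points, 8 features each
Y = np.random.rand(20, 1)
weights = np.random.rand(8, 1)                # random initialisation (as in the first epoch)
lr, batch_size, epochs = 0.01, 2, 5

for epoch in range(epochs):                   # "Executing first/second/... epoch"
    for start in range(0, len(X), batch_size):
        xb, yb = X[start:start+batch_size], Y[start:start+batch_size]
        for x, y in zip(xb, yb):              # per-sample update, as described above
            grad = backward(weights, x.reshape(1, -1), y.reshape(1, -1))
            weights -= lr * grad              # weights updated after every data point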
Answer 0 (score: 1)
Your understanding of mini-batch processing is wrong: for a batch, we compute the gradients of the whole batch at once, sum up all the gradients, and update the weights once per batch.
Here is code illustrating the gradient calculation d(loss)/d(W) for the simple example y = W * x, with single and mini-batch inputs:
import numpy as np
import tensorflow as tf

X = tf.placeholder(tf.float32, [None, 1])
Y = tf.placeholder(tf.float32, [None, 1])
W1 = tf.constant([[0.2]], dtype=tf.float32)
out = tf.matmul(X, W1)
loss = tf.square(out - Y)

# Calculate the error gradient with respect to the weights.
gradients = tf.gradients(loss, W1)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Feeding individual inputs
    print(sess.run([gradients], {X: np.array([[0.1]]), Y: [[0.05]]}))
    # [[-0.006]]
    print(sess.run([gradients], {X: np.array([[0.2]]), Y: [[0.1]]}))
    # [[-0.024]]

    # Feeding a batch combining the above inputs
    print(sess.run([gradients], {X: np.array([[0.1], [0.2]]), Y: [[0.05], [0.1]]}))
    # [[-0.03]], which is the sum of the above gradients.
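As a sanity check, the same numbers can be reproduced by hand with plain NumPy, since for this model d(loss)/d(W) = 2 * x * (W*x - y) (a small sketch, independent of the TensorFlow code above):

import numpy as np

W = 0.2
def grad(x, y):
    # d/dW of (W*x - y)^2 = 2 * x * (W*x - y)
    return 2 * x * (W * x - y)

print(grad(0.1, 0.05))                    # -0.006
print(grad(0.2, 0.1))                     # -0.024
print(grad(0.1, 0.05) + grad(0.2, 0.1))   # -0.03, the batch gradient is the sum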
Answer 1 (score: 0)
The steps you mention describe stochastic gradient descent, where the batch size does not play any role: the weights are updated after every data point and used for evaluating the next data point.
For the mini-batch scenario with batch size = 2, the new weights should be computed (via backpropagation) from the whole batch together, then used for the next batch (of size 2), and so on until all batches are processed. Everything else you mention is correct, as illustrated in the sketch below.
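To make the contrast with the per-sample loop explicit, here is a hypothetical NumPy sketch of the mini-batch variant, where the gradient is computed over the whole batch and the weights are updated once per batch (batch_gradient and lr are illustrative placeholders):

import numpy as np

def batch_gradient(weights, xb, yb):
    # Gradient of the squared error summed over the whole mini-batch.
    y_pred = xb @ weights
    return 2 * xb.T @ (y_pred - yb)

X = np.random.rand(20, 8)
Y = np.random.rand(20, 1)
weights = np.random.rand(8, 1)
lr, batch_size = 0.01, 2

for start in range(0, len(X), batch_size):
    xb, yb = X[start:start+batch_size], Y[start:start+batch_size]
    grad = batch_gradient(weights, xb, yb)   # computed for the whole batch at once
    weights -= lr * grad                     # weights updated once per batch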
Answer 2 (score: 0)
You have almost everything correct, except the backpropagation weight update. The error is computed for every sample in the mini-batch, but the weights are only updated after all samples of the mini-batch have gone through forward propagation. You can read more about it here.