Question

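I am trying to implement a neural network with ReLU.

Input layer -> 1 hidden layer -> ReLU -> output layer -> softmax layer

Above is the architecture of my neural network. I am confused about backpropagation through this ReLU. For the derivative of ReLU, if x <= 0 the output is 0, and if x > 0 the output is 1. So when you calculate the gradient, does that mean I kill the gradient if x <= 0?

Can someone explain the backpropagation of my neural network architecture step by step?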

Answer 1

From the question: "If x <= 0, the output is 0. If x > 0, the output is 1."

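The ReLU function is defined as: the output is x for x > 0 and 0 otherwise, i.e. f(x) = max(0, x).

So for the derivative f'(x) it is actually:

If x < 0, the output is 0. If x > 0, the output is 1.

The derivative f'(0) is undefined, so it is usually set to 0, or you modify the activation function to be f(x) = max(e, x) for a small e.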
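As a minimal NumPy sketch of the definition and derivative above (the function names `relu` and `relu_grad` are chosen here for illustration, and f'(0) is set to 0 by the convention just mentioned):

```python
import numpy as np

def relu(x):
    """Rectifier f(x) = max(0, x), applied elementwise."""
    return np.maximum(x, 0.0)

def relu_grad(x):
    """Derivative of ReLU: 1 where x > 0, else 0 (f'(0) is set to 0 by convention)."""
    return (x > 0).astype(float)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))       # values: 0, 0, 3
print(relu_grad(x))  # values: 0, 0, 1
```

So yes, the gradient is zeroed for units whose input is not positive, but only for those units and only for the samples where that happens.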

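In general: a ReLU is a unit that uses the rectifier activation function. That means it works exactly like any other hidden layer, but instead of tanh(x), sigmoid(x) or whatever activation you use, you use f(x) = max(0, x).

If you have already written code for a working multilayer network with sigmoid activation, it is literally a one-line change. Nothing about forward or backward propagation changes algorithmically. If you have not got the simpler model working yet, go back and start with that first. Otherwise your question is not really about ReLUs but about implementing a neural network as a whole.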
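To make the "one-line change" concrete, here is a hedged sketch of one forward and backward pass for the architecture in the question (input -> hidden -> ReLU -> output -> softmax). It assumes a cross-entropy loss on one-hot labels and bias terms, neither of which the question specifies, and all names (`forward_backward`, `w1`, `b1`, and so on) are made up for this example.

```python
import numpy as np

def softmax(z):
    # Subtract the row max for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward_backward(x, y_onehot, w1, b1, w2, b2):
    n = x.shape[0]

    # Forward pass
    z1 = x @ w1 + b1              # hidden pre-activation
    a1 = np.maximum(z1, 0.0)      # ReLU
    z2 = a1 @ w2 + b2             # output layer (logits)
    p = softmax(z2)               # class probabilities
    loss = -np.sum(y_onehot * np.log(p + 1e-12)) / n  # assumed: mean cross-entropy

    # Backward pass
    dz2 = (p - y_onehot) / n      # gradient of softmax + cross-entropy w.r.t. logits
    dw2 = a1.T @ dz2
    db2 = dz2.sum(axis=0)
    da1 = dz2 @ w2.T
    dz1 = da1 * (z1 > 0)          # ReLU derivative; with sigmoid this would be da1 * a1 * (1 - a1)
    dw1 = x.T @ dz1
    db1 = dz1.sum(axis=0)
    return loss, (dw1, db1, dw2, db2)

# Tiny demo on random data (shapes are arbitrary assumptions).
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 3))
y_onehot = np.eye(2)[rng.integers(0, 2, size=6)]
w1, b1 = 0.1 * rng.standard_normal((3, 4)), np.zeros(4)
w2, b2 = 0.1 * rng.standard_normal((4, 2)), np.zeros(2)
loss, grads = forward_backward(x, y_onehot, w1, b1, w2, b2)
print(loss, [g.shape for g in grads])
```

Only the line computing `dz1` depends on the choice of activation; the rest of the backward pass is the same as for a sigmoid or tanh hidden layer.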

Answer 2

Here is a good example that uses ReLU to implement XOR. Reference: http://pytorch.org/tutorials/beginner/pytorch_with_examples.html

# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt

# N is batch size(sample size); D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 4, 2, 30, 1

# XOR truth table: inputs x and target outputs y
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 0.002
loss_col = []
for t in range(200):
    # Forward pass: compute predicted y
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)  # ReLU as the activation function
    y_pred = h_relu.dot(w2)

    # Compute and print loss
    loss = np.square(y_pred - y).sum() # loss function
    loss_col.append(loss)
    print(t, loss, y_pred)

    # Backprop to compute gradients of the loss with respect to w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)  # gradient of the loss w.r.t. y_pred
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)  # error propagated back to the hidden layer
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0  # derivative of ReLU: zero where h < 0
    grad_w1 = x.T.dot(grad_h)

    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

plt.plot(loss_col)
plt.show()
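One way to sanity-check hand-written backprop like the loop above is to compare it against finite differences. The sketch below repackages the same forward and backward pass as a function; `loss_and_grads` and `numeric_grad` are helper names invented for this check, and the hidden size of 5 is arbitrary.

```python
import numpy as np

def loss_and_grads(x, y, w1, w2):
    # Same forward/backward computation as the training loop, as a function.
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)
    loss = np.square(y_pred - y).sum()
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)
    grad_h[h < 0] = 0  # derivative of ReLU
    grad_w1 = x.T.dot(grad_h)
    return loss, grad_w1, grad_w2

def numeric_grad(f, w, eps=1e-6):
    # Central differences: perturb one weight at a time (in place) and re-evaluate f.
    g = np.zeros_like(w)
    for idx in np.ndindex(*w.shape):
        old = w[idx]
        w[idx] = old + eps
        f_plus = f()
        w[idx] = old - eps
        f_minus = f()
        w[idx] = old  # restore the original weight
        g[idx] = (f_plus - f_minus) / (2 * eps)
    return g

rng = np.random.default_rng(0)
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
w1 = rng.standard_normal((2, 5))
w2 = rng.standard_normal((5, 1))

loss, grad_w1, grad_w2 = loss_and_grads(x, y, w1, w2)
num_w1 = numeric_grad(lambda: loss_and_grads(x, y, w1, w2)[0], w1)
num_w2 = numeric_grad(lambda: loss_and_grads(x, y, w1, w2)[0], w2)
# Both differences should be tiny if the analytic gradients are right.
print(np.max(np.abs(grad_w1 - num_w1)), np.max(np.abs(grad_w2 - num_w2)))
```

The check can be unreliable if a hidden pre-activation lies within `eps` of 0, because the ReLU is not differentiable there; with random weights this is unlikely.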

For more information about the derivative of ReLU, see: http://kawahara.ca/what-is-the-derivative-of-relu/