Softmax from scratch on MNIST: exploding gradients

Time: 2021-07-28 13:37:25

Tags: python machine-learning neural-network gradient-descent softmax

I'm trying to implement softmax regression on the MNIST digits dataset. Since I'm using batch GD, the cost should go down steadily from epoch to epoch. This is what I get instead:

cost after epoch 1 :  [2.63035001]

/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:3: RuntimeWarning: overflow encountered in exp
  This is separate from the ipykernel package so we can avoid doing imports until

cost after epoch 2 :  [29.10684701]
cost after epoch 3 :  [12.43702583]
cost after epoch 4 :  [2.302654]
cost after epoch 5 :  [2.30265079]
cost after epoch 6 :  [2.30264759]
cost after epoch 7 :  [2.3026444]
cost after epoch 8 :  [2.30264121]
cost after epoch 9 :  [2.30263803]
cost after epoch 10 :  [2.30263485]

If I only change the learning rate to 0.01, I get the following results:

cost after epoch 1 :  [2.63039004]

/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:3: RuntimeWarning: overflow encountered in exp
  This is separate from the ipykernel package so we can avoid doing imports until
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:11: RuntimeWarning: divide by zero encountered in log
  # This is added back by InteractiveShellApp.init_path()
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:11: RuntimeWarning: invalid value encountered in multiply
  # This is added back by InteractiveShellApp.init_path()

cost after epoch 2 :  [nan]
cost after epoch 3 :  [115.76138438]
cost after epoch 4 :  [2.30384942]
cost after epoch 5 :  [2.30379418]
cost after epoch 6 :  [2.30374]
cost after epoch 7 :  [2.30368689]
cost after epoch 8 :  [2.30363481]
cost after epoch 9 :  [2.30358374]
cost after epoch 10 :  [2.30353368]

I suspect exploding gradients. I tried np.clip() inside my activation functions, but it didn't help. I also notice that the cost gets stuck near 2.3026 ≈ ln(10), which is exactly the cross-entropy of predicting a uniform distribution over the 10 classes, so the network seems to stop learning entirely.

import numpy as np

def sigmoid(matrix):
    # Clip the pre-activations so np.exp cannot overflow
    # (np.exp overflows float64 around exp(709)), then apply
    # the logistic function to the clipped values.
    s = np.clip(matrix, -500, 500)
    return 1 / (1 + np.exp(-s))

def relu(matrix):
    # Clip, then zero out the negative entries.
    matrix = np.clip(matrix, -500, 500)
    return matrix * (matrix > 0)
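From what I understand, though, clipping the activations is not the same thing as clipping the gradients. If the gradients really are exploding, the clipping would have to happen on the output of backprop, before the parameter update. A minimal sketch of what I mean (the grads dict and its key layout are just placeholders for whatever backprop returns):

def clip_gradients(grads, max_norm=5.0):
    # Rescale all gradients together if their global L2 norm
    # exceeds max_norm, preserving the gradient direction.
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads.values()))
    if total_norm > max_norm:
        grads = {k: g * (max_norm / total_norm) for k, g in grads.items()}
    return grads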

I initialize with layer_dims = [784, 512, 256, 128, 64, 10], where 784 is for the 28x28 pixel images.

def he_init(layer_dims):
    parameters = {}
    L = len(layer_dims)
    for i in range(1, L):
        # He initialization: Gaussian weights scaled by sqrt(2 / fan_in).
        parameters['w' + str(i)] = np.random.randn(layer_dims[i], layer_dims[i - 1]) * np.sqrt(2 / layer_dims[i - 1])
        # zero_init is a helper from my notebook that returns a zero bias vector.
        parameters['b' + str(i)] = zero_init(layer_dims[i])
    return parameters
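For reference, calling it with my layer sizes gives one weight matrix and one bias vector per layer (the bias shape assumes zero_init returns a column vector):

parameters = he_init([784, 512, 256, 128, 64, 10])
print(parameters['w1'].shape)  # (512, 784)
print(parameters['b1'].shape)  # (512, 1), assuming zero_init returns a column vector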

The only places I use the np.exp() and np.log() functions are the sigmoid, softmax, and cost functions.

def softmax(z):
    # np.exp(z) overflows for large z, which is where the
    # RuntimeWarning above comes from.
    softmax_matrix = np.exp(z) / np.sum(np.exp(z), axis=0, keepdims=True)
    return softmax_matrix
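I have read that the usual fix for the overflow is to subtract the column-wise max before exponentiating; softmax is shift-invariant, so this doesn't change the result but keeps the largest exponent at 0. A sketch of that variant:

def stable_softmax(z):
    # exp(z - c) / sum(exp(z - c)) == exp(z) / sum(exp(z)) for any
    # constant c, so subtracting the max leaves the output unchanged
    # while guaranteeing np.exp never sees a value above 0.
    shifted = z - np.max(z, axis=0, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=0, keepdims=True)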

def softmax_cost(aL, y):  # aL: last-layer activations, y: one-hot labels
    # Cross-entropy cost; np.log(aL) hits -inf when aL contains
    # exact zeros, hence the divide-by-zero warning above.
    loss = np.sum(-y * np.log(aL), axis=0, keepdims=True)
    cost = (1 / y.shape[1]) * np.sum(loss, axis=1)
    return cost
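Similarly, the divide-by-zero presumably comes from aL containing exact zeros after the overflow. Clamping the probabilities away from 0 before the log should at least keep the cost finite; a sketch (eps is an arbitrary small constant):

def softmax_cost_stable(aL, y, eps=1e-12):
    # Clamp probabilities to [eps, 1] so np.log never sees an exact zero.
    aL_clipped = np.clip(aL, eps, 1.0)
    loss = np.sum(-y * np.log(aL_clipped), axis=0, keepdims=True)
    return (1 / y.shape[1]) * np.sum(loss, axis=1)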

I still think I have an exploding gradient problem, but I don't know what else I can do about it. I have a fully documented version of the code, including the input dimensions and variable definitions, in a kaggle notebook here.

Any suggestions for fixing my model would be greatly appreciated. Thanks!

0 Answers
