Question

我正在尝试在MATLAB中使用反向传播实现随机梯度下降算法，以训练神经网络来学习XOR函数。但是，当我运行算法时（现在暂时不使用随机的迷你批处理和历元来检查权重/偏差是否更新），随着时间的流逝，输出层的激活趋向于零，而我认为它将开始学习正确的激活。这是因为省略了小批处理和新纪元吗？还是在我的代码中实现算法有问题？

% Initialise the weights and bias
w2 = rand(2,2);
w3 = rand(2,1);
b2 = rand(2,1);
b3 =  rand(1,1);

% Initialise eta and lambda
eta = 2.5;
lambda = 0.5;

% Inputs to the system
AFull = rand(1,100)>.5;
BFull = rand(1,100)>.5;
SizeA = size(AFull);
I = [AFull;BFull];

% Desired outputs
y = xor(AFull,BFull);

% Loop to run through each input and iterate the weights and biases based
% on graident decent with a quadratic cost function and L2 regularisation
for j=1:SizeA(2)

    % First hidden layer activation
    z21 = w2(1,1)*(I(1,j))+w2(1,2)*(I(2,j))-b2(1);
    z22 = w2(2,1)*(I(1,j))+w2(2,2)*(I(2,j))-b2(2);
    a21 = 1/(1+exp(-z21));
    a22 = 1/(1+exp(-z22));

    % Output layer activation
    zL = w3(1)*a21 + w3(2)*a22 - b3;
    aL = 1/(1+exp(-zL));
    % Checking the output activation
    active(j)=aL;

    deltaL = (aL-y(j))*(exp(zL)/((exp(zL)+1)^2));

    delta21 = (w3(1)*deltaL)*(exp(zL)/((exp(zL)+1)^2));

    delta22 = (w3(2)*deltaL)*(exp(zL)/((exp(zL)+1)^2));

    % The partial derivatives

    dCb3 = deltaL;

    dCb2 = delta21;

    dCb1 = delta22;

    dCw31 = a21*deltaL;

    dCw32 = a22*deltaL;

    dCw211 = I(1)*delta21;

    dCw212 = I(2)*delta22;

    dCw221 = I(1)*delta21;

    dCw222 = I(2)*delta22;

    % Updating the weights and biases
    w2 = (1-eta*lambda)*w2 - eta*[dCw211 dCw212;dCw221 dCw222];
    w3 = (1-eta*lambda)*w3 - eta*[dCw31 dCw32];

    b2 = b2 - eta*[dCb1;dCb2];

    b3 = b3 - eta*dCb3 ;

end```

为什么对于我的MATLAB神经网络训练代码中的所有输入，我的激活层都趋于零？

0 个答案: