Neural network backpropagation algorithm implementation

Date: 2017-04-30 16:07:00

Tags: matlab neural-network pattern-recognition

I have implemented the neural network backpropagation algorithm in MATLAB, but it does not train correctly. The training data is a matrix X = [x1, x2] of dimension 2 x 200, and I have a target matrix T = [target1, target2] of dimension 2 x 200. For class 1 the first 100 columns of T are [1; -1], and for class 2 the last 100 columns are [-1; 1].

theta = 0.1; % criterion to stop
eta = 0.1; % step size
Nh = 10;  % number of hidden nodes

For some reason the total training error is always 1.000; it never gets anywhere near theta, so the loop runs forever.

I used the following formulas (the activation, the output and hidden sensitivities, and the weight increments):

f(net)  = a*tanh(b*net),   f'(net) = a*b*sech(b*net)^2
delta_k = (t_k - z_k) * f'(net_k)
delta_j = f'(net_j) * sum_k( w_jk * delta_k )
delta_w_jk = eta * delta_k * y_j
delta_w_ij = eta * delta_j * x_i

Total training error (mean squared error over all training samples):

J = (1/n) * sum_p ||t_p - z_p||^2
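A single stochastic update with these formulas, written in vectorized form, would look roughly like this (a sketch only; W1 and W2 are shorthand for the two weight matrices, and input_X / input_T are one training sample and its target, as in the full code below):

% one stochastic backprop update (sketch); W1 is Ni x Nh, W2 is Nh x No
netj    = W1' * input_X;                            % hidden pre-activations (Nh x 1)
Y       = a*tanh(b*netj);                           % hidden outputs
netk    = W2' * Y;                                  % output pre-activations (No x 1)
Z       = a*tanh(b*netk);                           % network outputs
delta_k = (input_T - Z) .* (a*b*sech(b*netk).^2);   % output sensitivities
delta_j = (W2 * delta_k) .* (a*b*sech(b*netj).^2);  % hidden sensitivities
W2      = W2 + eta * (Y * delta_k');                % Nh x No weight increment
W1      = W1 + eta * (input_X * delta_j');          % Ni x Nh weight increment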

The code is documented in detail below. I would appreciate any help.

clear;
close all;
clc;

%%('---------------------')
%%('Generating dummy data')
%%('---------------------')
d11 = [2;2]*ones(1,70)+2.*randn(2,70);
d12 = [-2;-2]*ones(1,30)+randn(2,30);
d1 = [d11,d12];

d21 = [3;-3]*ones(1,50)+randn([2,50]);
d22 = [-3;3]*ones(1,50)+randn([2,50]);
d2 = [d21,d22];

hw5_1 = d1;
hw5_2 = d2;

save hw5.mat hw5_1 hw5_2

x1 = hw5_1;
x2 = hw5_2;

% step 1: Construct training data matrix X=[x1,x2], dimension 2x200
training_data = [x1, x2];

% step 2: Construct target matrix T=[target1, target2], dimension 2x200
target1 = repmat([1; -1], 1, 100);  % class 1
target2 = repmat([-1; 1], 1, 100);  % class 2
T = [target1, target2];

% step 3: normalize training data
training_data = training_data - mean(training_data(:));
training_data = training_data / std(training_data(:));

% step 4: specify parameters
theta = 0.1; % criterion to stop 
eta   = 0.1; % step size 
Nh    = 10;  % number of hidden nodes, actual hidden nodes should be 11 (including a bias)
Ni    = 2;   % dimension of input vector = number of input nodes, actual input nodes should be 3 (including a bias)
No    = 2;   % number of class = number of out nodes

% step 5: Initialize the weights 
a = -1/sqrt(No);
b = +1/sqrt(No);
inputLayerToHiddenLayerWeight  = (b-a).*rand(Ni, Nh) + a
hiddenLayerToOutputLayerWeight = (b-a).*rand(Nh, No) + a

J = inf;

p = 1;

% activation function 
% f(net) = a*tanh(b*net), 
% f'(net) = a*b*sech2(b*net)
a = 1.716;
b = 2/3;

while J > theta

    % step 6: randomly choose one training sample vector from X, 
    % together with its target vector
    k = randi([1, size(training_data, 2)]);
    input_X = training_data(:,k);
    input_T = T(:,k);

    % step 7: Calculate net_j values for hidden nodes in layer 1 
    % hidden layer output before activation function applied
    netj = inputLayerToHiddenLayerWeight' * input_X;

    % step 8: Calculate hidden node output Y using activation function
    % apply activation function to hidden layer neurons
    Y = a*tanh(b*netj);

    % step 9: Calculate net_k values for output nodes in layer 2
    % output layer output before activation function applied
    netk = hiddenLayerToOutputLayerWeight' * Y;

    % step 10: Calculate output node output Z using the activation function
    % apply activation function to the output layer neurons
    Z = a*tanh(b*netk);

    % step 11: Calculate sensitivity delta_k = (target - Z) * f'(Z) 
    % find the error between the expected_output and the neuron output
    % we got using the weights
    % delta_k = (expected - output) * activation(output)
    delta_k = [];
    for i=1:size(Z)
        yi = Z(i,:);
        expected_output = input_T(i,:);
        delta_k = [delta_k; (expected_output - yi) ...
                                * a*b*(sech(b*yi)).^2];
    end

    % step 12: Calculate sensitivity 
    % delta_j = Sum_k(delta_k * hidden-to-out weights) * f'(net_j)
    % error = (weight_k * error_j) * activation(output)
    delta_j = [];
    for j=1:size(Y)
        yi = Y(j,:);
        error = 0;
        for k=1:size(delta_k)
            error = error + delta_k(k,:)*hiddenLayerToOutputLayerWeight(j, k);
        end
        delta_j = [delta_j; error * (a*b*(sech(b*yi)).^2)];
    end

    % step 13: update weights

    %2x10
    inputLayerToHiddenLayerWeight = [];
    for i=1:size(input_X)
        xi = input_X(i,:);
        wji = [];
        for j=1:size(delta_j)
            wji = [wji, eta * xi * delta_j(j,:)];
        end
        inputLayerToHiddenLayerWeight = [inputLayerToHiddenLayerWeight; wji];
    end

    inputLayerToHiddenLayerWeight

    %10x2
    hiddenLayerToOutputLayerWeight = [];
    for j=1:size(Y)
        yi = Y(j,:);
        wjk = [];
        for k=1:size(delta_k)
            wjk = [wjk, eta * delta_k(k,:) * yi];
        end
        hiddenLayerToOutputLayerWeight = [hiddenLayerToOutputLayerWeight; wjk];
    end

    hiddenLayerToOutputLayerWeight

    % Mean Square Error
    J = 0;
    for j=1:size(training_data, 2)
        X = training_data(:,j);
        t = T(:,j);
        netj = inputLayerToHiddenLayerWeight' * X;
        Y = a*tanh(b*netj);
        netk = hiddenLayerToOutputLayerWeight' * Y;
        Z = a*tanh(b*netk);
        J = J + immse(t, Z);
    end

    J = J/size(training_data, 2)

    p = p + 1;
    if p == 4
        break;
    end
end

% testing neural network using the inputs
test_data = [[2; -2], [-3; -3], [-2; 5], [3; -4]];

for i=1:size(test_data, 2)

end

1 Answer:

Answer 0 (score: -1):

Weight decay is not essential for neural network training.

What I did notice is that your feature normalization is not correct.

The correct algorithm for scaling the data into the 0 to 1 range is

(max - x)/(max - min)

Note: you apply this to every element in the array (or vector). The data inputs to the NN need to be in the range [0, 1]. (Technically they can go a little outside that, roughly [-3, 3], but values far from 0 make training difficult.)
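For example, a minimal sketch of applying that element-wise scaling to the 2 x 200 training matrix (here per feature row; row_max and row_min are just illustrative names):

% scale each feature (row) of training_data into [0, 1] using (max - x)/(max - min)
row_max = max(training_data, [], 2);   % per-feature maxima, 2 x 1
row_min = min(training_data, [], 2);   % per-feature minima, 2 x 1
n = size(training_data, 2);
training_data = (repmat(row_max, 1, n) - training_data) ./ repmat(row_max - row_min, 1, n);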

Edit:

I am not familiar with this activation function

a = 1.716;
b = 2/3;
% f(net) = a*tanh(b*net), 
% f'(net) = a*b*sech2(b*net)

It looks like some variation of tanh. Can you elaborate on what it is?
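(A quick way to see what it does is to plot it alongside its derivative; a rough sketch using just the two formulas quoted above:)

a = 1.716; b = 2/3;
f      = @(net) a*tanh(b*net);        % activation from the question
fprime = @(net) a*b*sech(b*net).^2;   % its derivative
net = linspace(-5, 5, 200);
plot(net, f(net), net, fprime(net)); grid on;
legend('f(net)', 'f''(net)');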

If you still can't get it working, give me an update and I will look at your code more closely.