I am using the following code to run logistic regression with stochastic gradient descent in MATLAB. The total number of training + testing samples is about 600K. The code runs for several hours. How can I speed it up?
%% load dataset
clc;
clear;
load('covtype.mat');
Data = [X y];
%% Split into testing and training data in a 1:9 split
nRows = size(Data,1);
randRows = randperm(nRows);  % random ordering of row indices
nTest = round(nRows/10);     % 1:9 test:train split (58101 rows for covtype)
Test = Data(randRows(1:nTest),:);       % index using the random order
Train = Data(randRows(nTest+1:end),:);
Testx = Test(:,1:54);    % 54 feature columns
Testy = Test(:,55:end);  % label column
Trainx = Train(:,1:54);
Trainy = Train(:,55:end);
%% Perform stochastic gradient descent on training data
lambda=0.01; % regularisation constant
alpha=0.01; % step length constant
theta_old = zeros(54,1);
theta_new = theta_old;
nTrain = size(Trainx,1);
count_dummy = zeros(nTrain,1);  % preallocated: iteration numbers for plotting
llr_dummy = zeros(nTrain,1);    % preallocated: test log-likelihood per iteration
for count = 1:nTrain
    theta_old = theta_new;
    % SGD update for L2-regularised logistic regression
    theta_new = theta_old + (alpha*Trainy(count) * ...
        (1.0 ./ (1.0 + exp(Trainy(count)*(Trainx(count,:)*theta_old)))) .* Trainx(count,:))' ...
        - alpha*lambda*2*theta_old;
    n = norm(theta_new);
    llr = lambda*n*n;
    % log-likelihood error on the full test set with the current theta_new
    for i = 1:size(Testx,1)
        llr = llr - log(1.0 + exp(-(Testy(i)*(Testx(i,:)*theta_new))));
    end
    count_dummy(count) = count;
    llr_dummy(count) = llr;
end
thetaopt = theta_new; % this is optimal theta
%% Plot results on testing data
plot(count_dummy, llr_dummy);
I have to compute the log-likelihood error on the test data at every iteration so that I can plot it afterwards. How can I make this code faster?
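For context, most of the runtime goes into the inner loop, which touches all ~58K test rows at every one of the ~523K SGD steps. As a minimal sketch (reusing the variable names from the script above), that per-row loop can be collapsed into a single matrix-vector product computing the same quantity:

```matlab
% Vectorised replacement for the inner i-loop: one matrix-vector
% product over the whole test set instead of ~58K scalar iterations.
margins = Testy .* (Testx * theta_new);   % one margin y_i * x_i' * theta per test row
llr = lambda*(theta_new.'*theta_new) - sum(log(1.0 + exp(-margins)));
```

This lets MATLAB's optimised BLAS routines do the work. If that is still too slow, evaluating the test log-likelihood only every, say, 100th iteration and plotting those points would cut the cost by another two orders of magnitude, at the price of a coarser curve.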