Question

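I am using libsvm for classification, and I use cross-validation to tune the parameters C and gamma. The number of observations I use for cross-validation is about 6000 to 7000, but MATLAB takes a very long time to tune the parameters. Is that because of the size of the dataset, or do I need to optimize my code?

Code sample: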

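% Load labels and features stored in libsvm format.
[labels, data] = libsvmread('newwndwlibfeatures.txt');

% Split into the "stem" class (label 1) and everything else.
labels_stem    = labels(labels == 1);
feature_stem   = data(labels == 1, :);
labels_nostem  = labels(labels ~= 1);
feature_nostem = data(labels ~= 1, :);

% Shuffle the non-stem samples and keep a random 5% of them.
L = randperm(length(labels_nostem));
labels_nostem  = labels_nostem(L);
feature_nostem = feature_nostem(L, :);
n_stem   = length(labels_stem);
n_nostem = round(0.05 * length(labels_nostem));
labelscv  = [labels_stem;  labels_nostem(1:n_nostem)];
featurecv = [feature_stem; feature_nostem(1:n_nostem, :)];

% Class weights, inversely proportional to class size:
% weight(1) is passed as -w0 (non-stem), weight(2) as -w1 (stem).
% Note: -w0 only takes effect if the non-stem label really is 0.
weight = [n_stem, n_nostem] / (n_stem + n_nostem);

% Exponent grid: 26 values for C times 22 values for gamma.
[C, gamma] = meshgrid(-15:1:10, -15:1:6);
folds = 5;

% Grid search with cross-validation.
cv_acc = zeros(numel(C), 1);
for i = 1:numel(C)
    % %g instead of %f/%d: %f rounds 2^-15 to 0.000031, and the
    % weights are fractions, not integers.
    opts = sprintf('-c %g -g %g -h 0 -v %d -w0 %g -w1 %g', ...
                   2^C(i), 2^gamma(i), folds, weight(1), weight(2));
    cv_acc(i) = svmtrain(labelscv, featurecv, opts);
end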

Answer 1

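The size of your dataset is not the problem. You are exhaustively searching a space of 572 parameter combinations (a grid of 22 rows by 26 columns), and every combination is trained 5 times, once per fold. If each fold takes a couple of seconds, you can see where the time goes: 572 combinations × 5 folds × 2 s / 60 ≈ 95 minutes. I would consider a smarter optimization method instead of checking every single combination.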
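
One common way to avoid checking every combination is a coarse-to-fine grid search: scan the exponent ranges with a large step first, then refine with step 1 only around the best coarse point. The following is a minimal sketch of that idea, not something from the original post. `cvfun` is a hypothetical stand-in (a smooth toy surface) so that the script runs without libsvm; in practice it would wrap the `svmtrain` call from the question. The coarse step of 4 and the refinement window of ±3 are assumed values.

```matlab
% Hypothetical stand-in for the cross-validation call, so this sketch runs
% without libsvm. With libsvm it would be something like:
%   cvfun = @(lc, lg) svmtrain(labelscv, featurecv, ...
%             sprintf('-c %g -g %g -h 0 -v 5', 2^lc, 2^lg));
cvfun = @(lc, lg) 100 - (lc - 2)^2 - (lg + 7)^2;   % toy surface, peak at (2, -7)

% Stage 1: coarse grid over the same exponent ranges, step 4 instead of 1.
[Cc, Gc] = meshgrid(-15:4:10, -15:4:6);
acc_coarse = arrayfun(cvfun, Cc, Gc);
[~, k] = max(acc_coarse(:));
c0 = Cc(k);
g0 = Gc(k);

% Stage 2: fine grid with step 1 within +/-3 of the coarse optimum,
% clipped to the original ranges.
[Cf, Gf] = meshgrid(max(c0 - 3, -15):min(c0 + 3, 10), ...
                    max(g0 - 3, -15):min(g0 + 3, 6));
acc_fine = arrayfun(cvfun, Cf, Gf);
[best_acc, k] = max(acc_fine(:));
best_log2c = Cf(k);
best_log2g = Gf(k);

% Total number of cross-validation runs actually performed.
n_evals = numel(Cc) + numel(Cf);
fprintf('best log2(C) = %d, log2(gamma) = %d, %d evaluations\n', ...
        best_log2c, best_log2g, n_evals);
```

The coarse stage costs 7 × 6 = 42 evaluations and the fine stage at most 7 × 7 = 49, so far fewer than the full step-1 grid over the same ranges. Real cross-validation accuracy surfaces are noisier than this toy function, so the coarse stage can miss a narrow optimum; treat the result as a starting point rather than a guarantee.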

Also, if I remember correctly: I ran into the same problem while working on my thesis, and some values of C made training take many times longer.
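
To find out which values of C are the slow ones, you could time a handful of runs along the C axis before launching the full search. This is only a sketch under assumptions: `run_cv` is a hypothetical stand-in that simply sleeps longer as C grows, so the script runs without libsvm, and the 0.1 s budget is an arbitrary illustrative value. With libsvm, `run_cv` would wrap the `svmtrain` call from the question at a fixed gamma.

```matlab
% Hypothetical stand-in for one cross-validation run at exponent lc.
% It only sleeps, longer for larger C, to imitate slow training.
run_cv = @(lc) pause(0.002 * 2^max(lc, 0));

log2c  = -3:2:7;      % a few exponents along the C axis
budget = 0.1;         % seconds allowed per run (assumed value)

% Time each run with tic/toc.
elapsed = zeros(size(log2c));
for i = 1:numel(log2c)
    t = tic;
    run_cv(log2c(i));
    elapsed(i) = toc(t);
end

% Keep only the exponents whose run stayed within the budget.
keep = log2c(elapsed <= budget);
fprintf('log2(C) = %3d took %.3f s\n', [log2c; elapsed]);
```

The exponents left in `keep` can then serve as the C range for the grid search, so the expensive corner of the grid is never visited.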
