I am using this code for 5-fold cross-validation:
%# read some training data
[labels,data] = libsvmread('Training_Data_libsvmFormat.txt');
%# grid of parameters
folds = 5;
[C,gamma] = meshgrid(-5:2:15, -15:2:3); %Coarse Grid Search: bestC = 8 bestGamma = 2
%[C,gamma] = meshgrid(1:0.5:4, -1:0.25:3) %Fine Grid Search: bestC = 4 bestGamma = 2
%# grid search, and cross-validation
cv_acc = zeros(numel(C),1);
for i=1:numel(C)
cv_acc(i) = svmtrain(labels, data, sprintf('-c %f -g %f -v %d', 2^C(i), 2^gamma(i), folds));
end
%# pair (C,gamma) with best accuracy
[~,idx] = max(cv_acc);
%# contour plot of parameter selection
contour(C, gamma, reshape(cv_acc,size(C))), colorbar
hold on
plot(C(idx), gamma(idx), 'rx')
text(C(idx), gamma(idx), sprintf('Acc = %.2f %%',cv_acc(idx)), 'HorizontalAlign','left', 'VerticalAlign','top')
hold off
xlabel('log_2(C)'), ylabel('log_2(\gamma)'), title('Cross-Validation Accuracy')
%# now you can train your model using best_C and best_gamma
best_C = 2^C(idx);
best_gamma = 2^gamma(idx);
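Now, I know that in 5-fold cross-validation, 4/5 of the dataset is used for training and 1/5 for testing, and that the test part is rotated across the folds to find the best cross-validated C and γ for the RBF kernel. However, in my dataset the first 1000 examples are positive and the last 1000 are all negative. Does cross-validation with svmtrain() shuffle the data, or could the 1/5 used for testing end up containing only negative examples? I ask because, if it does not shuffle the data, the reported accuracy would be unrealistic.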
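To make the concern concrete, here is a minimal sketch of what I mean. It does not call libsvm and says nothing about what svmtrain() actually does internally. The names labels_demo, fold_contig and fold_shuffled are hypothetical ones made up for this illustration, and the contiguous split is only my assumption of what an unshuffled 5-fold split would look like.

```matlab
%# hypothetical labels matching my dataset layout: 1000 positives, then 1000 negatives
labels_demo = [ones(1000,1); -ones(1000,1)];
folds = 5;
n = numel(labels_demo);

%# assumed unshuffled split: fold k is simply the k-th contiguous block of rows
fold_contig = ceil((1:n)' * folds / n);

%# same block sizes, but assigned after a random permutation of the rows
perm = randperm(n)';
fold_shuffled = zeros(n,1);
fold_shuffled(perm) = fold_contig;

%# count positives and negatives that land in each test fold
for k = 1:folds
    in_c = (fold_contig == k);
    in_s = (fold_shuffled == k);
    fprintf('fold %d: contiguous %3d pos / %3d neg, shuffled %3d pos / %3d neg\n', k, ...
        sum(labels_demo(in_c) == 1), sum(labels_demo(in_c) == -1), ...
        sum(labels_demo(in_s) == 1), sum(labels_demo(in_s) == -1));
end
```

With the contiguous split, folds 1, 2, 4 and 5 each contain a single class, so a model would be tested on one class only in those folds. After shuffling, each fold should contain a mix of both classes (the exact counts vary from run to run).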
Thanks for your help.