Passing permuted data to a LibSVM precomputed kernel

Asked: 2014-05-05 10:05:13

Tags: matlab kernel permutation libsvm

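I am currently doing a very simple SVM classification. I use LibSVM with a precomputed kernel: an RBF over DTW distances.

When I compute the similarity (kernel) matrices, everything seems to work fine ... until I permute my data before computing the kernel matrices.

An SVM is of course invariant to permutations of its input data. In the Matlab code below, the line marked with "<- !!!!!!!!!!!" decides the classification accuracy (not permuted: 100%; permuted: anywhere from 0% to 100%, depending on the seed of rng). But why does permuting the array of file name strings (named fileList) make any difference? What am I doing wrong? Have I misunderstood the concept of "permutation invariance", or is there a problem with my Matlab code?

My csv files are formatted as LABEL,val1,val2,...,valN, and all csv files are stored in the folder dirName. So the string array contains the entries '10_0.csv 10_1.csv .... 11_7.csv,11_8.csv' when not permuted, or the same entries in a different order when permuted.

I also tried permuting the vector of sample serial numbers as well, but that made no difference.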

function [SimilarityMatrixTrain, SimilarityMatrixTest, trainLabels, testLabels, PermSimilarityMatrixTrain, PermSimilarityMatrixTest, permTrainLabels, permTestLabels] = computeDistanceMatrix(dirName, verificationClass, trainFrac)
fileList = getAllFiles(dirName);
fileList = fileList(1:36);
trainLabels = [];
testLabels = [];
trainFiles = {};
testFiles = {};
permTrainLabels = [];
permTestLabels = [];
permTrainFiles = {};
permTestFiles = {};

n = 0;
sigma = 0.01;

trainFiles = fileList(1:2:end);
testFiles = fileList(2:2:end);

rng(3);
permTrain = randperm(length(trainFiles))
%rng(3); <- !!!!!!!!!!!
permTest = randperm(length(testFiles));

permTrainFiles = trainFiles(permTrain)
permTestFiles = testFiles(permTest);

noTrain = size(trainFiles);
noTest = size(testFiles);

SimilarityMatrixTrain = eye(noTrain);
PermSimilarityMatrixTrain = (noTrain);
SimilarityMatrixTest = eye(noTest);
PermSimilarityMatrixTest = eye(noTest);

% UNPERM
%Train
for i = 1 : noTrain
    x = csvread(trainFiles{i});   
    label = x(1);
    trainLabels = [trainLabels, label];
    for j = 1 : noTrain
        y = csvread(trainFiles{j});            
        dtwDistance = dtwWrapper(x(2:end), y(2:end));
        rbfValue = exp((dtwDistance.^2)./(-2*sigma));
        SimilarityMatrixTrain(i, j) = rbfValue;
        n=n+1
    end
end

SimilarityMatrixTrain = [(1:size(SimilarityMatrixTrain, 1))', SimilarityMatrixTrain];

%Test
for i = 1 : noTest
    x = csvread(testFiles{i});
    label = x(1);
    testLabels = [testLabels, label];
    for j = 1 : noTest
        y = csvread(testFiles{j});            
        dtwDistance = dtwWrapper(x(2:end), y(2:end));
        rbfValue = exp((dtwDistance.^2)./(-2*sigma));
        SimilarityMatrixTest(i, j) = rbfValue;
        n=n+1
    end
end

SimilarityMatrixTest = [(1:size(SimilarityMatrixTest, 1))', SimilarityMatrixTest];

% PERM
%Train
for i = 1 : noTrain
    x = csvread(permTrainFiles{i});        
    label = x(1);
    permTrainLabels = [permTrainLabels, label];
    for j = 1 : noTrain
        y = csvread(permTrainFiles{j});            
        dtwDistance = dtwWrapper(x(2:end), y(2:end));
        rbfValue = exp((dtwDistance.^2)./(-2*sigma));
        PermSimilarityMatrixTrain(i, j) = rbfValue;
        n=n+1
    end
end

PermSimilarityMatrixTrain = [(1:size(PermSimilarityMatrixTrain, 1))', PermSimilarityMatrixTrain];

%Test
for i = 1 : noTest
    x = csvread(permTestFiles{i});
    label = x(1);
    permTestLabels = [permTestLabels, label];
    for j = 1 : noTest
        y = csvread(permTestFiles{j});            
        dtwDistance = dtwWrapper(x(2:end), y(2:end));
        rbfValue = exp((dtwDistance.^2)./(-2*sigma));
        PermSimilarityMatrixTest(i, j) = rbfValue;
        n=n+1
    end
end

PermSimilarityMatrixTest = [(1:size(PermSimilarityMatrixTest, 1))', PermSimilarityMatrixTest];

mdlU = svmtrain(trainLabels', SimilarityMatrixTrain, '-t 4 -c 0.5');
mdlP = svmtrain(permTrainLabels', PermSimilarityMatrixTrain, '-t 4 -c 0.5');

[pclassU, xU, yU] = svmpredict(testLabels', SimilarityMatrixTest, mdlU);
[pclassP, xP, yP] = svmpredict(permTestLabels', PermSimilarityMatrixTest, mdlP);

xU    
xP

end

I would be very grateful for any answers!

Regards, Benjamin

1 answer:

Answer 0: (score: 0)

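After cleaning up the code and having a colleague look at it, we (well, he) finally found the bug. The test matrix has to be computed between the test samples and the training samples, not among the test samples themselves: the SVM predicts a test sample from the sum of its kernel values against the training vectors, weighted by their alpha values (which are zero for non-support vectors). I hope this clarifies the problem for anyone else who runs into it; see the revised code below for details. Someone with a sharp eye can also spot this in, for example, "using precomputed kernels with libsvm", where the test matrix is likewise computed from training and test vectors. If you have any further comments, questions, or hints, feel free to comment on or answer this post!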
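To spell out why the test matrix needs training columns, here is the textbook two-class SVM decision function together with the kernel used in the revised code. This is a sketch in standard notation, not taken from the LibSVM source; the symbols $n_{\text{train}}$, $\alpha_i$, $y_i$, $b$ and $d_{\text{DTW}}$ are introduced here for illustration.

```latex
% Kernel used in the revised code (RBF over a DTW distance):
K(u, v) = \exp\!\left( -\frac{d_{\text{DTW}}(u, v)^2}{2\sigma^2} \right)

% Decision function for a test sample x (two-class case):
f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{n_{\text{train}}} \alpha_i \, y_i \, K(x_i^{\text{train}}, x) + b \right)

% Hence row t of the test kernel matrix must contain
% K(x_t^{\text{test}}, x_j^{\text{train}}) for j = 1, \dots, n_{\text{train}},
% so the matrix has shape n_{\text{test}} \times n_{\text{train}}.
```

Column $j$ of the test matrix is multiplied by the coefficient learned for training sample $j$, so it has to refer to that same training sample, in the same order that was used for training.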

function [tacc, testacc, mdl, SimilarityMatrixTrain, SimilarityMatrixTest, trainLabels, testLabels] = computeSimilarityMatrix(dirName)
fileList = getAllFiles(dirName);
fileList = fileList(1:72);
trainLabels = [];
testLabels = [];
trainFiles = {};
testFiles = {};   
n = 0;
sigma = 0.01;

trainFiles = fileList(1:2:end);
testFiles = fileList(2:5:end);

% numel gives the sample count regardless of whether the cell array is a row or a column
noTrain = numel(trainFiles);
noTest = numel(testFiles);

permTrain = randperm(noTrain(1));
permTest = randperm(noTest(1));

trainFiles = trainFiles(permTrain);
testFiles = testFiles(permTest);

% Train: kernel between every pair of training samples (noTrain x noTrain)
SimilarityMatrixTrain = zeros(noTrain(1));
for i = 1 : noTrain(1)
    x = csvread(trainFiles{i});
    label = x(1);                        % first CSV field is the class label
    trainLabels = [trainLabels, label];
    for j = 1 : noTrain(1)
        y = csvread(trainFiles{j});
        dtwDistance = dtwWrapper(x(2:end), y(2:end));
        rbfValue = exp((dtwDistance.^2)./(-2*sigma.^2));
        SimilarityMatrixTrain(i, j) = rbfValue;
    end
end

SimilarityMatrixTrain = [(1:size(SimilarityMatrixTrain, 1))', SimilarityMatrixTrain];

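% Test: kernel between each test sample (rows) and each TRAINING sample (columns),
% i.e. noTest x noTrain. This is the actual fix.
SimilarityMatrixTest = zeros(noTest(1), noTrain(1));
for i = 1 : noTest(1)
    x = csvread(testFiles{i});
    label = x(1);                        % first CSV field is the class label
    testLabels = [testLabels, label];
    for j = 1 : noTrain(1)
        y = csvread(trainFiles{j});      % training file, not test file
        dtwDistance = dtwWrapper(x(2:end), y(2:end));
        rbfValue = exp((dtwDistance.^2)./(-2*sigma.^2));
        SimilarityMatrixTest(i, j) = rbfValue;
    end
end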

SimilarityMatrixTest = [(1:size(SimilarityMatrixTest, 1))', SimilarityMatrixTest];

mdlU = svmtrain(trainLabels', SimilarityMatrixTrain, '-t 4 -c 1000 -q');
fprintf('TEST: '); [pclassU, xU, yU] = svmpredict(testLabels', SimilarityMatrixTest, mdlU);
fprintf('TRAIN: ');[pclassT, xT, yT] = svmpredict(trainLabels', SimilarityMatrixTrain, mdlU);

tacc = xT(1);
testacc = xU(1);
mdl = mdlU;

end
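
The effect of permuting the samples can be checked without LibSVM or DTW. The following is only a minimal sketch under stated assumptions: scalar toy values stand in for the CSV time series, a plain RBF over the absolute difference stands in for the DTW-based kernel, and the names xTrain, xTest, rbf, pTrain and pTest are made up for this illustration.

```matlab
% Toy data: scalars stand in for the time series read from the CSV files (assumption).
xTrain = [0.10; 0.15; 0.90; 0.95];
xTest  = [0.12; 0.92; 0.05];
sigma  = 0.5;

% Stand-in kernel: RBF over |a - b| instead of a DTW distance (assumption).
% bsxfun builds the full numel(a) x numel(b) difference matrix.
rbf = @(a, b) exp(-(bsxfun(@minus, a, b.')).^2 ./ (2 * sigma^2));

Ktrain = rbf(xTrain, xTrain);   % nTrain x nTrain
Ktest  = rbf(xTest,  xTrain);   % nTest  x nTrain: test rows, training columns

% Fixed permutations, playing the role of randperm in the revised function.
pTrain = [3 1 4 2];
pTest  = [2 3 1];
KtrainPerm = rbf(xTrain(pTrain), xTrain(pTrain));
KtestPerm  = rbf(xTest(pTest),   xTrain(pTrain));

% Permuting the samples only reorders rows and columns of the kernel matrices.
assert(norm(KtrainPerm - Ktrain(pTrain, pTrain), 'fro') < 1e-12);
assert(norm(KtestPerm  - Ktest(pTest, pTrain),  'fro') < 1e-12);

% Precomputed-kernel layout as used above: serial numbers 1..n in the first column.
KtrainLibsvm = [(1:size(KtrainPerm, 1))', KtrainPerm];
KtestLibsvm  = [(1:size(KtestPerm, 1))',  KtestPerm];
```

The columns of the test matrix follow the training order (pTrain), not the test order. This would also be consistent with the behaviour described in the question: there the test matrix was built among the test files only, so column j was silently treated as training sample j. That presumably lines up only when test file j happens to resemble training file j, as with the alternating unpermuted split or when both permutations are drawn from the same seed.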

Regards, Benjamin