I am trying to use the CBIS-DDSM dataset to classify breast tumours as malignant or benign using PCA and an SVM.
However, my results are astonishingly bad. I have spent the last week racking my brain trying to get better, more sensible results. My test set is imbalanced: benign cases make up 63% of it. Without class weights, my accuracy never exceeds 63% for any combination of C and gamma I try. When I add weights to compensate for the imbalance, accuracy drops to around 50%, which makes the model basically useless.
I am using the RBF kernel with C and gamma values around 1. My test and train data, in libsvm format, can be found here:
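The 63% ceiling equals the benign share of the test set, which is exactly what a model that predicts "benign" for every input would score. A minimal C# sketch of that majority-class baseline, together with inverse-frequency ("balanced") class weights derived from label counts. The helper names and the 63/37 label vector are hypothetical, and label 0 is assumed to mean benign:

```csharp
using System;
using System.Linq;

// Accuracy obtained by always predicting the most frequent label.
// A trained model that never beats this number has probably collapsed to one class.
static double MajorityBaselineAccuracy(int[] labels)
{
    int majorityCount = labels.GroupBy(l => l).Max(g => g.Count());
    return (double)majorityCount / labels.Length;
}

// Inverse-frequency ("balanced") class weights:
// weight_c = nSamples / (nClasses * count_c), so the rarer class gets the larger weight.
static double[] BalancedWeights(int[] labels, int[] classLabels)
{
    return classLabels
        .Select(c => (double)labels.Length / (classLabels.Length * labels.Count(l => l == c)))
        .ToArray();
}

// Hypothetical labels: 63 benign (0) and 37 malignant (1) cases.
int[] y = Enumerable.Repeat(0, 63).Concat(Enumerable.Repeat(1, 37)).ToArray();
double baseline = MajorityBaselineAccuracy(y);
double[] weights = BalancedWeights(y, new[] { 0, 1 });
Console.WriteLine($"Majority baseline accuracy: {baseline:F2}");
Console.WriteLine($"Weights: benign {weights[0]:F3}, malignant {weights[1]:F3}");
```

Assuming label 0 is benign and the training set has the same 63/37 split, these weights have the same ratio (0.37 : 0.63) as the `{1-0.63, 0.63}` weights in the code. They differ only by an overall factor, and since libsvm multiplies C by each class weight, that factor behaves like a rescaling of C.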
TestSet with 50 components from PCA
TrainSet with 50 components from PCA
Is my data simply useless, or am I doing something wrong with the SVM? My training/validation code is below:
trainingSet = SVMProblemHelper.Load(Configuration.Get("
testSet = SVMProblemHelper.Load(Configuration.Get("TestSetLocation"));
trainingSet = trainingSet.Normalize(SVMNormType.L2);
testSet = testSet.Normalize(SVMNormType.L2);
trainingSet.Save("trainset-normalized.txt");
testSet.Save("testset-normalized.txt");
SVMParameter parameter = new SVMParameter
{
Type = SVMType.C_SVC,
Kernel = SVMKernelType.RBF,
C = double.Parse(Configuration.Get("C")), // tried different values for both C and gamma, from about 0.01 to 1000
Gamma = double.Parse(Configuration.Get("Gamma")),
Probability = false,
WeightLabels = new int[] {0, 1},
Weights = new double[] {1-0.63, 0.63} // have tried with and without these weights
};
Console.WriteLine("training svm");
double[] crossValidationResults; // output labels
int nFold = int.Parse(Configuration.Get("nFold"));
trainingSet.CrossValidation(parameter, nFold, out crossValidationResults);
// Evaluate the cross validation result
// If it is not good enough, select the parameter set again
double crossValidationAccuracy = trainingSet.EvaluateClassificationProblem(crossValidationResults);
// Train the model, If your parameter set gives good result on cross validation
SVMModel model = trainingSet.Train(parameter);
// Save the model
SVM.SaveModel(model, Configuration.Get("ModelLocation"));
// Predict the instances in the test set
double[] testResults = testSet.Predict(model);
// Evaluate the test results
double testAccuracy =
testSet.EvaluateClassificationProblem(testResults, model.Labels, out var confusionMatrix);
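// Print the results
Console.WriteLine("\n\nCross validation accuracy: " + crossValidationAccuracy);
Console.WriteLine("\nTest accuracy: " + testAccuracy);
Console.WriteLine("\nConfusion matrix:\n");
// Print the confusion matrix row by row
for (int i = 0; i < confusionMatrix.GetLength(0); i++)
{
for (int j = 0; j < confusionMatrix.GetLength(1); j++)
Console.Write(confusionMatrix[i, j] + "\t");
Console.WriteLine();
}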
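`Normalize(SVMNormType.L2)` appears to rescale each sample vector to unit length rather than scaling each feature. For the RBF kernel, the libsvm authors instead recommend scaling every feature to a fixed range, with the ranges learned on the training set and reused on the test set. A minimal sketch of that per-feature scaling on plain arrays; this is a swapped-in technique, and the helper names and the two-component example scores are hypothetical:

```csharp
using System;

// Learns per-feature minimum and maximum on the training set only.
static (double[] Min, double[] Max) FitRanges(double[][] train)
{
    int d = train[0].Length;
    var min = new double[d];
    var max = new double[d];
    for (int j = 0; j < d; j++) { min[j] = double.MaxValue; max[j] = double.MinValue; }
    foreach (double[] row in train)
        for (int j = 0; j < d; j++)
        {
            min[j] = Math.Min(min[j], row[j]);
            max[j] = Math.Max(max[j], row[j]);
        }
    return (min, max);
}

// Maps every feature to [-1, 1] (the default range of libsvm's svm-scale tool)
// using previously learned ranges, so train and test get the same transformation.
static double[][] Scale(double[][] data, double[] min, double[] max)
{
    var scaled = new double[data.Length][];
    for (int i = 0; i < data.Length; i++)
    {
        scaled[i] = new double[data[i].Length];
        for (int j = 0; j < data[i].Length; j++)
        {
            double range = max[j] - min[j];
            // A constant feature carries no information; map it to 0.
            scaled[i][j] = range == 0 ? 0.0 : -1.0 + 2.0 * (data[i][j] - min[j]) / range;
        }
    }
    return scaled;
}

// Hypothetical PCA scores whose components have very different spreads.
double[][] trainScores = { new[] { 120.0, 0.5 }, new[] { -80.0, -0.5 }, new[] { 20.0, 0.0 } };
double[][] testScores = { new[] { 70.0, 0.25 } };
var (lo, hi) = FitRanges(trainScores);
double[][] trainScaled = Scale(trainScores, lo, hi);
double[][] testScaled = Scale(testScores, lo, hi);
Console.WriteLine(string.Join(", ", testScaled[0]));
```

Without this step, the first principal components (which have the largest variance) dominate the RBF distance, so a single gamma value may suit some components and not others.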
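On an imbalanced test set, plain accuracy cannot distinguish a collapsed model from a useful one, whereas per-class recall and balanced accuracy, both computed from the confusion matrix, can. A minimal sketch; the helper names and example matrices are hypothetical, and it assumes rows are true classes and columns are predictions, a convention that should be checked against LibSVMsharp's `EvaluateClassificationProblem` before relying on it:

```csharp
using System;

// Recall of each class: correctly predicted cases / all true cases of that class.
// ASSUMPTION: cm[i, j] counts samples of true class i predicted as class j.
static double[] PerClassRecall(int[,] cm)
{
    int n = cm.GetLength(0);
    var recall = new double[n];
    for (int i = 0; i < n; i++)
    {
        int rowTotal = 0;
        for (int j = 0; j < n; j++) rowTotal += cm[i, j];
        recall[i] = rowTotal == 0 ? 0.0 : (double)cm[i, i] / rowTotal;
    }
    return recall;
}

// Balanced accuracy = mean of the per-class recalls. A model that always predicts
// one class scores 1/nClasses (0.5 for two classes), however imbalanced the data is.
static double BalancedAccuracy(int[,] cm)
{
    double[] recall = PerClassRecall(cm);
    double sum = 0;
    foreach (double r in recall) sum += r;
    return sum / recall.Length;
}

// Hypothetical matrix of a model that labels every case as benign (class 0).
int[,] allBenign = { { 63, 0 }, { 37, 0 } };
double[] recalls = PerClassRecall(allBenign);
Console.WriteLine($"Recall benign: {recalls[0]}, malignant: {recalls[1]}");
Console.WriteLine($"Balanced accuracy: {BalancedAccuracy(allBenign)}");
```

For the all-benign matrix above, plain accuracy is 63% while balanced accuracy is 0.5, which matches the pattern described in the question.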
I have tried Extreme's PCA, Accord's PCA and our own implementation, but nothing changed. The images we feed in look fine.
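Edit: I should add that the input to the PCA is a normalized 100x100 crop of the tumour with normalized grayscale values, flattened into a vector of length 10,000 on which the PCA is fitted.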
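A minimal sketch of the preprocessing described in the edit: a 100x100 grayscale crop flattened row by row into a 10,000-element vector. The exact grayscale normalization is not specified, so per-crop min-max scaling to [0, 1] is an assumption, as are the helper name and the synthetic gradient crop:

```csharp
using System;

// Flattens a grayscale crop row by row into one feature vector and rescales
// intensities to [0, 1] using the crop's own minimum and maximum.
// ASSUMPTION: min-max scaling stands in for the unspecified grayscale normalization.
static double[] FlattenAndNormalize(byte[,] crop)
{
    int rows = crop.GetLength(0), cols = crop.GetLength(1);
    var vector = new double[rows * cols];
    byte min = byte.MaxValue, max = byte.MinValue;
    foreach (byte p in crop) { if (p < min) min = p; if (p > max) max = p; }
    double range = max - min;
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            vector[r * cols + c] = range == 0 ? 0.0 : (crop[r, c] - min) / range;
    return vector;
}

// Hypothetical 100x100 crop containing a left-to-right intensity gradient.
var sampleCrop = new byte[100, 100];
for (int row = 0; row < 100; row++)
    for (int col = 0; col < 100; col++)
        sampleCrop[row, col] = (byte)(col * 255 / 99);
double[] x = FlattenAndNormalize(sampleCrop);
Console.WriteLine($"Vector length: {x.Length}");
```

Each crop becomes one row of the matrix the PCA is fitted on; the same flattening order must be used for every image so that each column always corresponds to the same pixel position.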