Useless SVM model: is my data useless, or am I using libsvm wrong?

Time: 2019-05-15 09:45:40

Tags: c# classification pca libsvm libsvmsharp

I am trying to use the CBIS-DDSM dataset to classify breast tumours as malignant or benign with PCA and SVM.

However, my results are astonishingly bad, and I have been working my head off for the last week trying to get better, more meaningful results. My test set is imbalanced: benign cases make up 63% of it, and my accuracy without class weights tops out at 63% across different C and gamma values. When I try to weight for the imbalance, accuracy drops to around 50%, which is basically a useless model.
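
For reference, a minimal sketch of why 63% accuracy can be matched by always predicting benign, and how balanced accuracy (the mean of the per-class recalls) exposes that. The class and method names are illustrative only, and the counts used in the note below are hypothetical, scaled to a 100-case test set with the 63/37 split described above; they are not my real confusion matrix.

```csharp
using System;

public static class ImbalanceCheck
{
    // Plain accuracy: fraction of all cases labelled correctly.
    public static double Accuracy(int tp, int tn, int fp, int fn)
    {
        return (double)(tp + tn) / (tp + tn + fp + fn);
    }

    // Balanced accuracy: mean of the recall on each class, so the
    // majority class cannot dominate the score.
    public static double BalancedAccuracy(int tp, int tn, int fp, int fn)
    {
        double sensitivity = (double)tp / (tp + fn); // recall on malignant
        double specificity = (double)tn / (tn + fp); // recall on benign
        return (sensitivity + specificity) / 2.0;
    }
}
```

With a classifier that labels every case benign (tp = 0, tn = 63, fp = 0, fn = 37), `Accuracy` evaluates to 0.63 and `BalancedAccuracy` to 0.5, which are the same two numbers I keep seeing.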

I am using the RBF kernel, with gamma and C values around 1. My test and train data can be found in libsvm format here:

TestSet with 50 components from PCA

TrainSet with 50 components from PCA

Is my data simply useless, or am I doing something wrong with the SVM? My training/validation code can be seen here:

            trainingSet = SVMProblemHelper.Load(Configuration.Get("
            testSet = SVMProblemHelper.Load(Configuration.Get("TestSetLocation"));
            trainingSet = trainingSet.Normalize(SVMNormType.L2);
            testSet = testSet.Normalize(SVMNormType.L2);

            trainingSet.Save("trainset-normalized.txt");
            testSet.Save("testset-normalized.txt");

            SVMParameter parameter = new SVMParameter
            {
                Type = SVMType.C_SVC,
                Kernel = SVMKernelType.RBF,
                C = double.Parse(Configuration.Get("C")), // trying different values for both c and gamma, around 0.01 to 1000
                Gamma = double.Parse(Configuration.Get("Gamma")),
                Probability = false,
                WeightLabels = new int[] {0, 1},
                Weights = new double[] {1-0.63, 0.63} // have tried with and without these weights
            };

            Console.WriteLine("training svm");

            double[] crossValidationResults; // output labels
            int nFold = int.Parse(Configuration.Get("nFold"));
            trainingSet.CrossValidation(parameter, nFold, out crossValidationResults);

            // Evaluate the cross validation result
            // If it is not good enough, select the parameter set again
            double crossValidationAccuracy = trainingSet.EvaluateClassificationProblem(crossValidationResults);

            // Train the model if your parameter set gives good results on cross-validation
            SVMModel model = trainingSet.Train(parameter);

            // Save the model
            SVM.SaveModel(model, Configuration.Get("ModelLocation"));

            // Predict the instances in the test set
            double[] testResults = testSet.Predict(model);


            // Evaluate the test results
            double testAccuracy =
                testSet.EvaluateClassificationProblem(testResults, model.Labels, out var confusionMatrix);

            // Print the results
            Console.WriteLine("\n\nCross validation accuracy: " + crossValidationAccuracy);
            Console.WriteLine("\nTest accuracy: " + testAccuracy);
            Console.WriteLine("\nConfusion matrix:\n");
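
In libsvm, each class weight multiplies C for that class, so the hard-coded Weights above only favour the minority class if label 0 really is the majority (benign) class. Below is a minimal sketch, assuming integer class labels read from the training file, that derives inverse-frequency weights from the label counts and builds a log-spaced grid for C and Gamma. `ClassWeights`, `InverseFrequency` and `PowersOfTwo` are hypothetical helpers, not part of LibSVMsharp.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClassWeights
{
    // weight(c) = N / (K * count(c)), where N is the number of samples
    // and K the number of classes; rarer classes get larger weights.
    public static Dictionary<int, double> InverseFrequency(IEnumerable<int> labels)
    {
        Dictionary<int, int> counts = labels
            .GroupBy(label => label)
            .ToDictionary(g => g.Key, g => g.Count());
        int total = counts.Values.Sum();
        int classes = counts.Count;
        return counts.ToDictionary(
            kv => kv.Key,
            kv => (double)total / (classes * kv.Value));
    }

    // Log-spaced candidates 2^minExp, 2^(minExp + step), ..., up to 2^maxExp.
    public static double[] PowersOfTwo(int minExp, int maxExp, int step)
    {
        var values = new List<double>();
        for (int e = minExp; e <= maxExp; e += step)
        {
            values.Add(Math.Pow(2, e));
        }
        return values.ToArray();
    }
}
```

The resulting keys and values could be copied into WeightLabels and Weights, and each (C, Gamma) pair from the grid scored with the cross-validation call above. The libsvm practical guide suggests a coarse grid of roughly 2^-5 to 2^15 for C and 2^-15 to 2^3 for gamma, for example `PowersOfTwo(-5, 15, 2)` and `PowersOfTwo(-15, 3, 2)`.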

I have tried Extreme's PCA, Accord's PCA, and our own algorithm, but nothing changes. Our input images look fine.

Edit: I might add that the input to the PCA is 100x100 normalized crops of the tumours, grayscale-normalized, each fed to the PCA as a vector of length 10,000.
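
To make that preprocessing step concrete, here is a minimal sketch of turning one crop into a PCA input vector. It assumes 8-bit grayscale pixels, row-major flattening, and that "grayscale normalized" means scaling to [0, 1]; all three are assumptions for illustration, and `CropVector.Flatten` is a hypothetical helper rather than our actual code.

```csharp
using System;

public static class CropVector
{
    // Flatten a height x width grayscale crop (pixel values 0..255)
    // into a row-major vector with values scaled to [0, 1].
    public static double[] Flatten(byte[,] crop)
    {
        int height = crop.GetLength(0);
        int width = crop.GetLength(1);
        var vector = new double[height * width];
        for (int row = 0; row < height; row++)
        {
            for (int col = 0; col < width; col++)
            {
                vector[row * width + col] = crop[row, col] / 255.0;
            }
        }
        return vector;
    }
}
```

For a 100x100 crop this yields the length-10,000 vector mentioned above.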

0 answers:

No answers