How do I use cross-validation (with Accord.NET) to generate my training set?

Asked: 2017-05-17 04:36:13

Tags: c# decision-tree cross-validation id3 accord.net

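I am trying to understand how to use cross-validation to train a decision tree algorithm. I tried applying it to the simple tennis example from the Accord.NET framework: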

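// Namespaces this snippet relies on:
using System;
using System.Data;
using Accord.MachineLearning;
using Accord.MachineLearning.DecisionTrees;
using Accord.MachineLearning.DecisionTrees.Learning;
using Accord.Math;
using Accord.Statistics.Filters;

            DataTable data = new DataTable("Mitchell's Tennis Example");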
            data.Columns.Add(new DataColumn("Day"));
            data.Columns.Add(new DataColumn("Outlook"));
            data.Columns.Add(new DataColumn("Temperature"));
            data.Columns.Add(new DataColumn("Humidity"));
            data.Columns.Add(new DataColumn("Wind"));
            data.Columns.Add(new DataColumn("PlayTennis"));
            data.Rows.Add("D1", "Sunny", "Hot", "High", "Weak", "No");
            data.Rows.Add("D2", "Sunny", "Hot", "High", "Strong", "No");
            data.Rows.Add("D3", "Overcast", "Hot", "High", "Weak", "Yes");
            data.Rows.Add("D4", "Rain", "Mild", "High", "Weak", "Yes");
            data.Rows.Add("D5", "Rain", "Cool", "Normal", "Weak", "Yes");
            data.Rows.Add("D6", "Rain", "Cool", "Normal", "Strong", "No");
            data.Rows.Add("D7", "Overcast", "Cool", "Normal", "Strong", "Yes");
            data.Rows.Add("D8", "Sunny", "Mild", "High", "Weak", "No");
            data.Rows.Add("D9", "Sunny", "Cool", "Normal", "Weak", "Yes");
            data.Rows.Add("D10", "Rain", "Mild", "Normal", "Weak", "Yes");
            data.Rows.Add("D11", "Sunny", "Mild", "Normal", "Strong", "Yes");
            data.Rows.Add("D12", "Overcast", "Mild", "High", "Strong", "Yes");
            data.Rows.Add("D13", "Overcast", "Hot", "Normal", "Weak", "Yes");
            data.Rows.Add("D14", "Rain", "Mild", "High", "Strong", "No");
            Codification codebook = new Codification(data);
            DecisionVariable[] attributes =
              {
                new DecisionVariable("Outlook",   3), // 3 possible values (Sunny, overcast, rain)
                new DecisionVariable("Temperature", 3), // 3 possible values (Hot, mild, cool)  
                new DecisionVariable("Humidity",    2), // 2 possible values (High, normal)    
                new DecisionVariable("Wind",        2)  // 2 possible values (Weak, strong)
              };


            DataTable symbols = codebook.Apply(data);
            int[][] inputs = symbols.ToArray<int>("Outlook", "Temperature", "Humidity", "Wind");
            int[] outputs = symbols.ToIntArray("PlayTennis").GetColumn(0);

            var crossvalidation = new CrossValidation(size: data.Rows.Count, folds: 7);
            //var crossvalidation = new CrossValidation<DecisionTree>(size: data.Rows.Count, folds: 7);

            crossvalidation.Fitting = delegate (int k, int[] indicesTrain, int[] indicesValidation)
            {
                var trainingInputs = inputs.Submatrix(indicesTrain);
                var trainingOutputs = outputs.Submatrix(indicesTrain);
                var validationInputs = inputs.Submatrix(indicesValidation);
                var validationOutputs = outputs.Submatrix(indicesValidation);
                int classCount = 2; // 2 possible output values for playing tennis: yes or no
                DecisionTree tree = new DecisionTree(attributes, classCount);

                var id3learning = new ID3Learning(tree);
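                // Train on this fold's training subset only; fitting on the full
                // inputs/outputs would leak the validation rows into the model.
                double delD = id3learning.Run(trainingInputs, trainingOutputs);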
                double trainingError = id3learning.ComputeError(trainingInputs, trainingOutputs);
                double validationError = id3learning.ComputeError(validationInputs, validationOutputs);
                return new CrossValidationValues<object>(tree, trainingError, validationError);
                //return new CrossValidationValues<DecisionTree>(tree, trainingError, validationError);


            };
            var result = crossvalidation.Compute();
            result.Save(@"file.txt");

            // Finally, access the measured performance.
            double trainingErrors = result.Training.Mean;
            double validationErrors = result.Validation.Mean;
            Console.ReadKey();

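Everything seems to work, except that I don't know how to use the cross-validation results. Is there a way to view the results, or to use them to generate a better training set? (Assume I will apply the same approach to more data.)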
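For context, here is a minimal sketch of what a k-fold split does with the 14 rows and 7 folds used above. It is plain C# and does not call Accord.NET. `KFoldSketch` and `Split` are hypothetical names introduced only for illustration. The folds here are contiguous and unshuffled, which is an assumption of this sketch and may differ from how the library assigns rows to folds.

```csharp
using System;
using System.Linq;

static class KFoldSketch
{
    // Hypothetical helper (not part of Accord.NET): partitions the sample
    // indices 0..size-1 into `folds` contiguous validation blocks. For each
    // fold, the training indices are all indices outside that block.
    public static (int[] Train, int[] Validation)[] Split(int size, int folds)
    {
        var result = new (int[] Train, int[] Validation)[folds];
        for (int k = 0; k < folds; k++)
        {
            int start = k * size / folds;       // first held-out index
            int end = (k + 1) * size / folds;   // one past the last held-out index
            int[] validation = Enumerable.Range(start, end - start).ToArray();
            int[] train = Enumerable.Range(0, size)
                                    .Where(i => i < start || i >= end)
                                    .ToArray();
            result[k] = (train, validation);
        }
        return result;
    }

    static void Main()
    {
        // Same shape as the question: 14 samples, 7 folds.
        foreach (var (train, validation) in Split(size: 14, folds: 7))
        {
            Console.WriteLine(
                $"train: {train.Length} rows, validation: [{string.Join(", ", validation)}]");
        }
    }
}
```

With 14 rows and 7 folds, each fold holds out 2 rows and trains on the other 12, and every row is held out exactly once. The mean validation error over the folds is therefore an estimate of how well the learning procedure generalizes. It does not by itself yield a different or better training set. A common approach, assuming the estimate is acceptable, is to retrain a final model on all available rows.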

0 answers:

No answers yet.