Question

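I'm trying to build a custom machine learning library in C#, and I have researched the topic fairly thoroughly. My first example (an XOR estimator) was a success: I was able to reduce the mean loss to almost zero. Then I tried to build a model to classify handwritten digits (using the MNIST database in text form). The problem is that no matter how I configure the model, I always get stuck at a certain mean loss over the dataset. The second problem is that, because the MNIST dataset is quite large, the model takes a long time to compute, so I could use some advice on how to speed up the slowest parts of the algorithm (I'm using stochastic gradient descent). Below I show the main method, which does most of the work.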

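I tried the MSE and CrossEntropy loss functions, as well as the tanh, sigmoid, ReLU and softplus activation functions. The model I'm trying to build has 4 layers: an input layer of 784 neurons; a second layer of 16 sigmoid neurons; a third layer of 16 sigmoid neurons; and an output layer of 10 sigmoid neurons (one-hot encoded digits). I know the code below may not be a minimal reproducible example, but it represents the algorithm I'm trying to use. I have also uploaded the solution to GitHub, in case someone can help me find the problem. Here is the link: https://github.com/juan-carvajal/MachineLearningFramework Just run the application's Main method: it first runs the XOR classifier, which works well, and then the MNIST classifier.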
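To make the label format concrete, here is a minimal sketch of what "one-hot encoded digits" means for a 10-neuron output layer. The names `OneHot`, `Encode` and `Decode` are hypothetical and used only for illustration; they are not taken from the repository linked above.

```csharp
using System;

public static class OneHot
{
    // Hypothetical helper (not part of the linked framework):
    // encodes a digit 0-9 as a vector with a single 1.0 at the digit's index.
    public static double[] Encode(int digit, int classes = 10)
    {
        if (digit < 0 || digit >= classes)
            throw new ArgumentOutOfRangeException(nameof(digit));

        double[] labels = new double[classes]; // C# zero-initializes the array
        labels[digit] = 1.0;
        return labels;
    }

    // Turns a network output back into a digit by taking the index
    // of the largest activation (ties resolve to the lowest index).
    public static int Decode(double[] output)
    {
        int best = 0;
        for (int i = 1; i < output.Length; i++)
        {
            if (output[i] > output[best]) best = i;
        }
        return best;
    }
}
```

With this encoding, the digit 3 becomes a 10-element vector whose only non-zero entry is at index 3, and accuracy can be measured by comparing `Decode` of the network output with `Decode` of the label vector.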

The model is best represented here:

    DataSet dataSet = new DataSet("mnist2.txt", ' ', 10, false);
    // This creates a model with batching=128, learningRate=0.5
    // and the CrossEntropy loss function
    var p = new Perceptron(128, 0.5, ErrorFunction.CrossEntropy())
        .Layer(784, ActivationFunction.Sigmoid())
        .Layer(16, ActivationFunction.Sigmoid())
        .Layer(16, ActivationFunction.Sigmoid())
        .Layer(10, ActivationFunction.Sigmoid());
    // 1000 is the number of epochs
    p.Train2(dataSet, 1000);

The actual algorithm (stochastic gradient descent):

    Console.WriteLine("Initial Loss:" + CalculateMeanErrorOverDataSet(dataSet));
    for (int i = 0; i < epochs; i++)
    {
        // Shuffle the data in every step, then get a random batch from the dataSet
        dataSet.Shuffle();
        List<DataRow> batch = dataSet.NextBatch(this.Batching);
        int count = 0;
        foreach (DataRow example in batch)
        {
            count++;

            double[] result = this.FeedForward(example.GetFeatures());
            double[] labels = example.GetLabels();
            if (result.Length != labels.Length)
            {
                throw new Exception("Inconsistent array size, Incorrect implementation.");
            }
            else
            {
                // What follows is the calculation of the gradient for this example.
                // Every example affects the current gradient; then all those changes
                // are averaged and every parameter is updated.
                double error = CalculateExampleLost(example);

                for (int l = this.Layers.Count - 1; l > 0; l--)
                {
                    if (l == this.Layers.Count - 1)
                    {
                        // Output layer: derivative of the loss with respect to each activation
                        for (int j = 0; j < this.Layers[l].CostDerivatives.Length; j++)
                        {
                            this.Layers[l].CostDerivatives[j] = ErrorFunction.GetDerivativeValue(labels[j], this.Layers[l].Activations[j]);
                        }
                    }
                    else
                    {
                        // Hidden layer: propagate the derivatives back from layer l + 1
                        for (int j = 0; j < this.Layers[l].CostDerivatives.Length; j++)
                        {
                            double acum = 0;
                            for (int j2 = 0; j2 < Layers[l + 1].Size; j2++)
                            {
                                acum += Layers[l + 1].WeightMatrix[j2, j]
                                    * this.Layers[l + 1].ActivationFunction.GetDerivativeValue(Layers[l + 1].WeightedSum[j2])
                                    * Layers[l + 1].CostDerivatives[j2];
                            }
                            this.Layers[l].CostDerivatives[j] = acum;
                        }
                    }

                    // Accumulate the bias and weight gradients of layer l for this example
                    for (int j = 0; j < this.Layers[l].Activations.Length; j++)
                    {
                        this.Layers[l].BiasVectorChangeRecord[j] += this.Layers[l].ActivationFunction.GetDerivativeValue(Layers[l].WeightedSum[j])
                            * Layers[l].CostDerivatives[j];
                        for (int k = 0; k < Layers[l].WeightMatrix.GetLength(1); k++)
                        {
                            this.Layers[l].WeightMatrixChangeRecord[j, k] += Layers[l - 1].Activations[k]
                                * this.Layers[l].ActivationFunction.GetDerivativeValue(Layers[l].WeightedSum[j])
                                * Layers[l].CostDerivatives[j];
                        }
                    }
                }
            }
        }
        TakeGradientDescentStep(batch.Count);

        if ((i + 1) % (epochs / 10) == 0)
        {
            Console.WriteLine("Epoch " + (i + 1) + ", Avg.Loss:" + CalculateMeanErrorOverDataSet(dataSet));
        }
    }
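Read literally, the loop above accumulates the following per-example quantities for each layer. The notation is introduced here for illustration and does not appear in the code: $C$ is the loss, $y_j$ is `labels[j]`, $a^{l}_{j}$ is `Layers[l].Activations[j]`, $z^{l}_{j}$ is `Layers[l].WeightedSum[j]`, $W^{l}_{jk}$ is `Layers[l].WeightMatrix[j, k]`, $\sigma_l$ is the activation function of layer $l$, $L$ is the index of the output layer, and $C'(y, a)$ stands for whatever `ErrorFunction.GetDerivativeValue` returns.

```latex
\begin{align*}
% Output layer: stored in Layers[L].CostDerivatives[j]
\frac{\partial C}{\partial a^{L}_{j}} &= C'\!\left(y_j,\, a^{L}_{j}\right) \\
% Hidden layers (l < L): stored in Layers[l].CostDerivatives[j]
\frac{\partial C}{\partial a^{l}_{j}} &= \sum_{i} W^{l+1}_{ij}\,
    \sigma_{l+1}'\!\left(z^{l+1}_{i}\right)\,
    \frac{\partial C}{\partial a^{l+1}_{i}} \\
% Added to BiasVectorChangeRecord[j]
\frac{\partial C}{\partial b^{l}_{j}} &= \sigma_{l}'\!\left(z^{l}_{j}\right)\,
    \frac{\partial C}{\partial a^{l}_{j}} \\
% Added to WeightMatrixChangeRecord[j, k]
\frac{\partial C}{\partial W^{l}_{jk}} &= a^{l-1}_{k}\,
    \sigma_{l}'\!\left(z^{l}_{j}\right)\,
    \frac{\partial C}{\partial a^{l}_{j}}
\end{align*}
```

These sums are accumulated over the batch and then handed to `TakeGradientDescentStep(batch.Count)`, whose body is not shown here.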

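Here is an example of the local minimum the current model gets stuck in. In my research I found that similar models can reach an accuracy of up to 90%. My model only reaches 10%.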

Average loss stuck at a local minimum (Perceptron developed in C#)

0 answers: