Question

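I wrote a vanilla neural network for a project and trained it on the well-known MNIST dataset. It reaches about 85% accuracy, and I would like to know why that is so much lower than the 95.3% success rate reported on the MNIST website for a neural network that should be similar to mine.

More details about the neural network:

- The inputs are the 784 grayscale values (0-255), one for each pixel of the 28x28 image.

- The layer sizes are 784-256-10, with a bias node fixed at a value of 1 that is connected to every hidden node and every output node.

- It uses the sigmoid activation function.

- The error of an output node is defined as the squared difference between its value and the expected value.

- It is trained by gradient descent; each iteration of the training algorithm goes through every one of the 60,000 training cases.

- It usually converges after about 15 iterations, correctly classifying about 84%-88% of the test cases.

- Even though all the literature I have read says a bias is necessary, the neural network performs worse with the bias node than when I comment it out (I might be doing it wrong, let me know).

- The test and training cases are stored as objects, each holding its values and its expected output, in an array of all cases.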
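
To make the input and target encoding in the list above concrete, here is a minimal sketch. The listing feeds raw 0-255 values into sigmoid units; large inputs tend to push a sigmoid into its flat region, so scaling to the range 0 to 1 is a common precaution. Whether that accounts for the accuracy gap here is an assumption, not something the code shown confirms. The class and method names (`MnistPreprocessing`, `scalePixels`, `oneHot`) are hypothetical and not part of the original program.

```java
// Hypothetical helper, not part of the original program.
final class MnistPreprocessing {

    /** Scales raw grayscale values in 0..255 to doubles in 0.0..1.0. */
    static double[] scalePixels(int[] rawPixels) {
        double[] scaled = new double[rawPixels.length];
        for (int i = 0; i < rawPixels.length; i++) {
            scaled[i] = rawPixels[i] / 255.0;
        }
        return scaled;
    }

    /** Builds the expected output vector: 1.0 at the label index, 0.0 elsewhere. */
    static double[] oneHot(int label, int numClasses) {
        double[] target = new double[numClasses];
        target[label] = 1.0;
        return target;
    }

    /** Logistic sigmoid, the activation function named in the list above. */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }
}
```

For a sense of scale: the sigmoid's derivative s(1 - s) at a weighted sum of z = 10 is already below 1e-4, so a node fed large unscaled sums receives almost no gradient.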

Here is the code for the main operations of the program:

Initialization of the structure

   /**
   * A method to initialize the structure of the neural net
   * It adds a node for every input location needed, then it adds the hidden layer
   * connects the hidden layer to the input layer, makes the output layer, and connects the output layer 
   * to the hidden layer 
   */
   public void initializeStructure()
   {
      for(int ii = 0; ii < numInputs; ii++)
      {
         inputLayer.add(new Node());
      }   
      // add an extra node to the input layer. This node has a default value of one, so 
      // it acts like a bias to every node it is connected to (the bias has the magnitude of the weight)
      // this means that the only thing we need to edit are weights(including these ones), not values
      Node bias = new Node();
      bias.value = 1.0;
      inputLayer.add(bias);   
      for(int ii = 0; ii < layerSize; ii++)
      {
         Node currentNode = new Node();
         for(Node n : inputLayer)
         {                              
            currentNode.connections.add(new Connection(numInputs,n,currentNode,false));
         }         
         hiddenLayer.add(currentNode);
      }  
      for(int ii = 0; ii < 10; ii++)
      {
         Node output = new Node();
         for(Node n : hiddenLayer)
         {
            output.connections.add(new Connection(numInputs,n,output,true)); 
         }     
         output.connections.add(new Connection(numInputs,bias,output,true));     
         outputLayer.add(output);
      }      
   }

The training algorithm

   public void train()
   {
      for(InputCase current : trainCases)
      {
         int[] expected  = new int[10];
         for(int jj = 0; jj < expected.length; jj++)
         {
            if(jj == current.expectedOutput)
               expected[jj] = 1;
            else
               expected[jj] = 0;
         }
         for(int jj = 0; jj < current.values.length; jj++)
         {
            inputLayer.get(jj).value = current.values[jj];
         }
         run();
         double sum = 0;
         for(int ii = 0; ii < outputLayer.size(); ii++)
         {
            Node n = outputLayer.get(ii);
            for(Connection c : n.connections)
            {
               c.weight -= learningRate * c.origin.value * c.destination.value * (1-c.destination.value) * (c.destination.value - expected[ii]);
               sum += c.destination.value * (1-c.destination.value) * (c.destination.value - expected[ii]) * c.weight;
            }
         }
         for(Node n : hiddenLayer)
         {
            for(Connection c : n.connections)
            {
               c.weight -= learningRate * c.origin.value * c.destination.value * (1-c.destination.value) * sum;
            }
         }
      }
   }
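
One point worth checking in `train()` above: `sum` is accumulated once over every output connection, using weights that have already been updated, and the same scalar is then applied to every hidden node. In the usual backpropagation rule each hidden node gets its own error term, computed from the output deltas and the weights as they were before the update. The sketch below shows that rule on plain arrays. It is not the original program: the `Node` and `Connection` classes are not shown in full, so `BackpropSketch` and its weight layout are hypothetical stand-ins.

```java
import java.util.Random;

// Hypothetical array-based stand-in for the Node/Connection network above.
final class BackpropSketch {
    final double[][] wHidden; // [hidden][inputs + 1], last column is the bias weight
    final double[][] wOut;    // [outputs][hidden + 1], last column is the bias weight

    BackpropSketch(int numInputs, int numHidden, int numOutputs, long seed) {
        Random rng = new Random(seed);
        wHidden = randomMatrix(numHidden, numInputs + 1, rng);
        wOut = randomMatrix(numOutputs, numHidden + 1, rng);
    }

    static double[][] randomMatrix(int rows, int cols, Random rng) {
        double[][] m = new double[rows][cols];
        for (double[] row : m)
            for (int c = 0; c < cols; c++)
                row[c] = (rng.nextDouble() - 0.5) * 0.2; // small weights in [-0.1, 0.1)
        return m;
    }

    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    /** Forward pass through one layer; the bias input is a constant 1. */
    static double[] forward(double[][] w, double[] in) {
        double[] out = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            double z = w[j][in.length]; // bias weight times 1
            for (int i = 0; i < in.length; i++) z += w[j][i] * in[i];
            out[j] = sigmoid(z);
        }
        return out;
    }

    /** One gradient descent step; returns the squared error measured before the update. */
    double trainStep(double[] input, double[] target, double learningRate) {
        double[] hidden = forward(wHidden, input);
        double[] output = forward(wOut, hidden);

        // Output deltas: (value - expected) * value * (1 - value), as in the original listing.
        double[] deltaOut = new double[output.length];
        double error = 0.0;
        for (int k = 0; k < output.length; k++) {
            double diff = output[k] - target[k];
            error += 0.5 * diff * diff;
            deltaOut[k] = diff * output[k] * (1 - output[k]);
        }

        // Hidden deltas: one per hidden node, computed before any weight changes.
        double[] deltaHidden = new double[hidden.length];
        for (int h = 0; h < hidden.length; h++) {
            double back = 0.0;
            for (int k = 0; k < output.length; k++) back += wOut[k][h] * deltaOut[k];
            deltaHidden[h] = back * hidden[h] * (1 - hidden[h]);
        }

        update(wOut, deltaOut, hidden, learningRate);
        update(wHidden, deltaHidden, input, learningRate);
        return error;
    }

    static void update(double[][] w, double[] delta, double[] in, double lr) {
        for (int j = 0; j < w.length; j++) {
            for (int i = 0; i < in.length; i++) w[j][i] -= lr * delta[j] * in[i];
            w[j][in.length] -= lr * delta[j]; // bias input is 1
        }
    }
}
```

Calling `trainStep` repeatedly on the same example should drive the returned error down, which is a quick sanity check before running on all 60,000 cases.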

If you have any other questions, or want or need to see more code, please say so!

How can I improve a neural network on MNIST?
