
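Time: 2017-08-10 17:04:54

Tags: python neural-network backpropagation

My data consists of 4123 rows of inputs and outputs of an XOR gate. I want to write a neural network with three input-layer neurons (the third one being the bias), one hidden layer, and one output layer.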


import numpy as np

class TwoLayerNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        """
        input_size:  the number of neurons in the input layer
        hidden_size: the number of neurons in the hidden layer
        output_size: the number of neurons in the output layer
        """
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size

        self.params = {}
        self.params['W1'] = 0.01 * np.random.randn(input_size, hidden_size)  # FxH
        self.params['b1'] = np.zeros((hidden_size, 1)) # Hx1
        self.params['W2'] = 0.01 * np.random.randn(hidden_size, output_size)  # HxO
        self.params['b2'] = np.zeros((output_size, 1))  # Ox1

        self.optimal_weights = []
        self.errors = {}

    def train(self, X, y, epochs):
        """
        X: input data matrix, NxF
        y: output vector, Nx1

        Goal: the optimal set of parameters that best minimize the loss function
        """

        W1, b1 = self.params['W1'], self.params['b1']
        W2, b2 = self.params['W2'], self.params['b2']

        for iteration in range(epochs):

            forward_to_hidden = X.dot(W1)  # NxH
            activate_hidden = sigmoid(forward_to_hidden)  # NxH
            forward_to_output = activate_hidden.dot(W2)  # NxO
            output = sigmoid(forward_to_output)  # NxO

            self.errors[iteration] = np.mean(0.5 * (y**2 - output**2))

            output_error = y - output  # NxO

            output_layer_delta = output_error * sigmoidPrime(output)  # NxO
            hidden_layer_error = output_layer_delta.dot(W2.T)  # NxO . OxH = NxH
            hidden_layer_delta = hidden_layer_error * sigmoidPrime(activate_hidden) # NxH

            W1_update = X.T.dot(hidden_layer_delta)  # FxN . NxH = FxH
            W2_update = activate_hidden.T.dot(output_layer_delta)  # HxN . NxO = HxO

            W1 += W1_update
            W2 += W2_update

        # keep the trained weights so that predict() can read them
        self.optimal_weights = [W1, W2]

    def predict(self, X):
        W1, W2 = self.optimal_weights[0], self.optimal_weights[1]

        forward = sigmoid(X.dot(W1))  # NxH
        forward = forward.dot(W2)  # NxO
        forward = sigmoid(forward) # NxO

        return forward

def sigmoid(x):
    return 1 / (1 + np.exp(-x))
def sigmoidPrime(x):
    return sigmoid(x) * (1 - sigmoid(x))







import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv('xor.csv').sample(frac=1)
X = data.iloc[:, [0, 1]]  # 1st and 2nd cols are the input
X = np.hstack((X, np.ones((data.shape[0], 1))))  # adding the bias 1's
y = data.iloc[:, 2].to_numpy()[:, np.newaxis]  # 3rd col is the output, as an Nx1 array

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

# nn is a TwoLayerNetwork instance; the line that creates it is not shown in the question
nn.train(X_train, y_train, 100)

plt.plot(range(100), list(nn.errors.values()))
plt.show()

The link for the dataset

1 Answer:

Answer 0 (score: 0)


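Interestingly, your error specification is odd. I used self.errors[iteration] = np.mean(0.5 * (y - output)**2) for the visualization instead. The x-axis shows the epoch and the y-axis shows the error.

[Figure: error (y-axis) per epoch (x-axis)]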
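A small sketch of why the original expression is a poor error measure. The arrays below are made-up values for illustration, not rows from the dataset. Because the terms of y**2 - output**2 are signed, they can cancel each other, so the result can be zero even when predictions are wrong; the squared difference cannot do that.

```python
import numpy as np

# Illustrative targets and predictions (made-up values, not from xor.csv)
y = np.array([[0.0], [1.0], [1.0], [0.0]])
output = np.array([[1.0], [0.0], [1.0], [0.0]])  # wrong on the first two rows

# Expression from the question: signed terms can cancel each other out
question_error = np.mean(0.5 * (y**2 - output**2))

# Squared error: every term is non-negative, so mistakes cannot cancel
squared_error = np.mean(0.5 * (y - output)**2)

print(question_error)  # 0.0 even though half of the predictions are wrong
print(squared_error)   # 0.25
```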

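So what happens is that backpropagation reaches a plateau and then the weights quickly blow up. To slow down the weight explosion and give the network some time to re-evaluate its error, you can add a so-called "learning rate" != 1. That takes care of one of the pitfalls.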
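The effect of a learning rate can be seen on a toy problem. The sketch below is not the network from the question: it assumes a one-dimensional loss f(w) = 2 * w**2, and the helper name `descend` is made up for illustration. With an unscaled step the iterate grows at every update, while a smaller step shrinks it toward the minimum.

```python
def descend(learning_rate, steps=20, w=1.0):
    """Gradient descent on the toy loss f(w) = 2 * w**2 (gradient 4 * w)."""
    for _ in range(steps):
        gradient = 4.0 * w
        w -= learning_rate * gradient  # step scaled by the learning rate
    return w

unscaled = descend(learning_rate=1.0)  # each step multiplies w by -3: blows up
scaled = descend(learning_rate=0.1)    # each step multiplies w by 0.6: shrinks

print(abs(unscaled) > 1e6)  # True
print(abs(scaled) < 1e-3)   # True
```

The same idea applies to the weight matrices: multiplying W1_update and W2_update by a small factor keeps a single update from overshooting.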

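The other pitfall is what the second figure shows: the updates oscillate and the program never reaches the optimum. To deal with that, you can deliberately step into imperfect states by adding "momentum".

[Figure: oscillating updates]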
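For reference, a minimal sketch of momentum on a toy loss. Note the swap: this is the textbook velocity form, used here for illustration, whereas the answer's own code adds half of the previous raw update instead. The loss, the starting point, and the values 0.07 and 0.5 are assumptions, not tuned for the XOR network.

```python
import numpy as np

def loss(w):
    # Toy loss: steep along w[1], shallow along w[0]
    return 0.5 * (w[0]**2 + 25.0 * w[1]**2)

def gradient(w):
    return np.array([w[0], 25.0 * w[1]])

w = np.array([1.0, 1.0])
velocity = np.zeros_like(w)
learning_rate, momentum = 0.07, 0.5  # illustrative values, not tuned

initial_loss = loss(w)
for _ in range(100):
    # the velocity is a decaying sum of past steps, so steps that
    # flip sign from one iteration to the next partly cancel
    velocity = momentum * velocity - learning_rate * gradient(w)
    w = w + velocity
final_loss = loss(w)

print(initial_loss, final_loss)
```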





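        # before the training loop: buffers that hold the previous updates
        last_update = np.zeros((X.shape[1], W1.shape[1]))
        last_update2 = np.zeros((W1.shape[1], W2.shape[1]))

            # inside the loop: sigmoidPrime applies sigmoid itself, so pass it the
            # pre-activations (hidden_layer_error is computed between these two lines)
            output_layer_delta = output_error * sigmoidPrime(forward_to_output)  # NxO
            hidden_layer_delta = hidden_layer_error * sigmoidPrime(forward_to_hidden)  # NxH

            # learning rate 0.001, plus half of the previous update as momentum
            W1 += 0.001*(W1_update + last_update * 0.5)
            W2 += 0.001*(W2_update + last_update2 * 0.5)
#            W1 = 0.001*W1_update
#            W2 = 0.001*W2_update
            last_update = W1_update.copy()
            last_update2 = W2_update.copy()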

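The changes above did the trick for me in the end. Now please verify it and appease this whining man, who spent a lot of time figuring it out. ;)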
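For anyone who wants to try the changes in one piece, here is a rough, self-contained sketch that folds them into the question's training loop. Several things are assumptions made for illustration: the helper name `train_sketch`, `hidden_size=4`, the random seeds, and the synthetic XOR rows standing in for xor.csv. Nothing here is tuned, so how low the error gets will depend on those choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    # derivative of sigmoid with respect to the pre-activation x
    s = sigmoid(x)
    return s * (1.0 - s)

def train_sketch(X, y, hidden_size=4, epochs=100, learning_rate=0.001,
                 momentum=0.5, seed=0):
    """Hypothetical helper: the question's loop with the answer's changes."""
    rng = np.random.default_rng(seed)
    W1 = 0.01 * rng.standard_normal((X.shape[1], hidden_size))  # FxH
    W2 = 0.01 * rng.standard_normal((hidden_size, y.shape[1]))  # HxO
    last_update1 = np.zeros_like(W1)
    last_update2 = np.zeros_like(W2)
    errors = []

    for _ in range(epochs):
        z1 = X.dot(W1)        # NxH pre-activations
        a1 = sigmoid(z1)      # NxH
        z2 = a1.dot(W2)       # NxO pre-activations
        output = sigmoid(z2)  # NxO

        errors.append(np.mean(0.5 * (y - output) ** 2))

        # derivatives are evaluated at the pre-activations z2 and z1
        output_delta = (y - output) * sigmoid_prime(z2)            # NxO
        hidden_delta = output_delta.dot(W2.T) * sigmoid_prime(z1)  # NxH

        W1_update = X.T.dot(hidden_delta)   # FxH
        W2_update = a1.T.dot(output_delta)  # HxO

        # scaled step plus a fraction of the previous raw update
        W1 += learning_rate * (W1_update + momentum * last_update1)
        W2 += learning_rate * (W2_update + momentum * last_update2)
        last_update1, last_update2 = W1_update, W2_update

    return W1, W2, errors

# Synthetic XOR rows with a bias column of ones (a stand-in for xor.csv)
rng = np.random.default_rng(1)
bits = rng.integers(0, 2, size=(200, 2))
X = np.hstack((bits, np.ones((200, 1))))
y = (bits[:, [0]] ^ bits[:, [1]]).astype(float)

W1, W2, errors = train_sketch(X, y)
predictions = sigmoid(sigmoid(X.dot(W1)).dot(W2))
print(errors[0], errors[-1])
```

Plotting `errors` against the epoch index gives the same kind of curve as the figures in this answer.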