神经网络MNIST

时间:2017-10-09 18:48:12

标签: python numpy machine-learning neural-network mnist

我一直在研究神经网络一段时间,并用python和numpy实现了一个实现。我用XOR做了一个非常简单的例子,效果很好。所以我想我会更进一步尝试MNIST数据库。

有我的问题。我使用的NN有784个输入,30个隐藏和10个输出神经元。 隐藏层的激活功能只吐出一个,因此网络基本上停止学习。我正在做的数学运算是正确的,并且相同的实现与XOR示例一起运行良好,我正在正确地读取MNIST集。所以我不知道问题的来源。

import pickle
import gzip

import numpy as np

def load_data():
    f = gzip.open('mnist.pkl.gz', 'rb')
    training_data, validation_data, test_data = pickle.load(f, encoding="latin1")
    f.close()
    return (training_data, validation_data, test_data)

def transform_output(num):
    arr = np.zeros(10)
    arr[num] = 1.0
    return arr

def out2(arr):
    return arr.argmax()


data = load_data()
training_data = data[0]
training_input = np.array(training_data[0])
training_output = [transform_output(y) for y in training_data[1]]

batch_size = 10

batch_count = int(np.ceil(len(training_input) / batch_size))

input_batches = np.array_split(training_input, batch_count)
output_batches = np.array_split(training_output, batch_count)

#Sigmoid Function
def sigmoid (x):
    return 1.0/(1.0 + np.exp(-x))

#Derivative of Sigmoid Function
def derivatives_sigmoid(x):
    return x * (1.0 - x)

#Variable initialization
epoch=1 #Setting training iterations
lr=2.0 #Setting learning rate
inputlayer_neurons = len(training_input[0]) #number of features in data set
hiddenlayer_neurons = 30 #number of hidden layers neurons

output_neurons = len(training_output[0]) #number of neurons at output layer

#weight and bias initialization
wh=np.random.uniform(size=(inputlayer_neurons,hiddenlayer_neurons))
bh=np.random.uniform(size=(1,hiddenlayer_neurons))
wout=np.random.uniform(size=(hiddenlayer_neurons,output_neurons))
bout=np.random.uniform(size=(1,output_neurons))

for i in range(epoch):
    for batch in range(batch_count):

        X = input_batches[batch]
        y = output_batches[batch]

        zh1 = np.dot(X, wh)
        zh = zh1 + bh

        # data -> hidden neurons -> activations
        ah = sigmoid(zh)

        zo1 = np.dot(ah, wout)
        zo = zo1 + bout

        output = sigmoid(zo)

        # data -> output neurons -> error
        E = y - output

        print("debugging")
        print("X")
        print(X)
        print("WH")
        print(wh)
        print("zh1")
        print(zh1)
        print("bh")
        print(bh)
        print("zh")
        print(zh)
        print("ah")
        print(ah)
        print("wout")
        print(wout)
        print("zo1")
        print(zo1)
        print("bout")
        print(bout)
        print("zo")
        print(zo)
        print("out")
        print(output)
        print("y")
        print(y)
        print("error")
        print(E)
        # data -> output neurons -> slope
        slope_out = derivatives_sigmoid(output)

        # data -> output neurons -> change of error
        d_out = E * slope_out

        # data -> hidden neurons -> error = data -> output neurons -> change of error DOT output neurons -> output inputs (equal to hidden neurons) -> weights
        error_hidden = d_out.dot(wout.T)

        # data -> hidden neurons -> slope
        slope_h = derivatives_sigmoid(ah)

        # data -> hidden neurons -> change of error
        d_hidden = error_hidden * slope_h

        # hidden neurons -> output neurons -> weights = "" + hidden neurons -> data -> activations DOT data -> output neurons -> change of error
        wout = wout + ah.T.dot(d_out) * lr
        bout = bout + np.sum(d_out, axis=0, keepdims=True) * lr

        wh = wh + X.T.dot(d_hidden) * lr
        bh = bh + np.sum(d_hidden, axis=0, keepdims=True) * lr
    # testing results
    X = np.array(data[1][0][0:10])
    zh1 = np.dot(X, wh)
    zh = zh1 + bh

    # data -> hidden neurons -> activations
    ah = sigmoid(zh)

    zo1 = np.dot(ah, wout)
    zo = zo1 + bout

    output = sigmoid(zo)
    print([out2(y) for y in output])
    print(data[1][1][0:10])

总体而言,神经网络的输出对于每个输入都是相同的,并且使用不同的批量大小训练它,学习率和100个时期没有帮助。

1 个答案:

答案 0 :(得分:2)

XOR和MNIST问题之间的区别在于类的数量:XOR是二进制分类,在MNIST中有10个类。

您计算的错误E适用于XOR,因为sigmoid函数可用于二进制情况。如果有两个以上的类,则必须使用softmax function,这是sigmoid的扩展版本,cross entropy loss。看看this question,看看差异。您已将y正确翻译为单热编码,但output不包含预测的概率分布,实际上包含10个值的向量,每个值非常接近1.0。这就是网络不学习的原因。