Three-layer neural network for MNIST

Date: 2015-09-21 07:56:27

Tags: python neural-network

I am writing my own code to implement a single-hidden-layer neural network and testing the model on the MNIST dataset, but I am getting weird results (an unacceptably high NLL), even though I have checked my code for more than two days without finding anything wrong.

Here are the global parameters:

import math
import numpy as np

layers = np.array([784, 300, 10])  # input, hidden and output layer sizes
learningRate = 0.01
momentum = 0.01
batch_size = 10000
num_of_batch = len(train_label) // batch_size  # integer division (same as Python 2's / on ints)
nepoch = 30

Definition of the softmax function:

def softmax(x):
    x = np.exp(x)
    x_sum = np.sum(x,axis=1) #shape = (nsamples,)
    for row_idx in range(len(x)):
        x[row_idx,:] /= x_sum[row_idx]
    return x
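
As an aside, the row loop can be vectorized, and subtracting the per-row maximum before exponentiating is a standard trick that prevents overflow for large logits. A minimal sketch of an equivalent, numerically stable version (softmax_stable is just an illustrative name):

def softmax_stable(x):
    # subtract the per-row max so np.exp cannot overflow
    x = x - np.max(x, axis=1, keepdims=True)
    e = np.exp(x)
    # keepdims=True makes the division broadcast row-wise
    return e / np.sum(e, axis=1, keepdims=True)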

Definition of the sigmoid function:

def f(x):
    return 1.0/(1+np.exp(-x))
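
Similarly, np.exp(-x) triggers an overflow warning for large negative x. Assuming SciPy is available, scipy.special.expit is a numerically robust drop-in replacement:

from scipy.special import expit

def f_stable(x):
    # computes 1/(1 + exp(-x)) without overflow warnings
    return expit(x)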

Initialization of w and b:

k = np.vectorize(math.sqrt)(layers[0:-2]*layers[1:])  # computed but not used below
w1 = np.random.uniform(-0.5, 0.5, layers[0:2][::-1])  # shape (300, 784)
b1 = np.random.uniform(-0.5, 0.5, (1,layers[1]))      # shape (1, 300)
w2 = np.random.uniform(-0.5, 0.5, layers[1:3][::-1])  # shape (10, 300)
b2 = np.random.uniform(-0.5, 0.5, (1,layers[2]))      # shape (1, 10)
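
Note that k is computed but never used in the code below. If it was meant to scale the initial weights, a common heuristic for sigmoid units is uniform initialization in ±1/sqrt(fan_in); a hypothetical sketch of that idea:

# hypothetical scaled initialization: uniform in +/- 1/sqrt(fan_in)
w1 = np.random.uniform(-1, 1, (layers[1], layers[0])) / math.sqrt(layers[0])
b1 = np.zeros((1, layers[1]))
w2 = np.random.uniform(-1, 1, (layers[2], layers[1])) / math.sqrt(layers[1])
b2 = np.zeros((1, layers[2]))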

Here is the core part for each mini-batch:

for idx in range(num_of_batch):

    # vectorized forward pass: np.insert prepends a column of ones to the
    # activations and folds the bias into the first column of the weights
    x = train_set[idx*batch_size:(idx+1)*batch_size,:]
    y = Y[idx*batch_size:(idx+1)*batch_size,:]

    a1 = x
    a2 = f(np.dot(np.insert(a1,0,1,axis=1), np.insert(w1,0,b1,axis=1).T))
    a3 = softmax(np.dot(np.insert(a2,0,1,axis=1), np.insert(w2,0,b2,axis=1).T))

    # backpropagate the error: d3 is the softmax/cross-entropy delta,
    # d2 multiplies by the sigmoid derivative a2*(1-a2)
    d3 = a3-y
    d2 = np.dot(d3,w2)*a2*(1.0-a2)

    # compute grad
    D2 = np.dot(d3.T,a2)
    D1 = np.dot(d2.T,a1)

    # update parameters
    w1 = w1 - learningRate*(D1/batch_size + momentum*w1)
    b1 = b1 - learningRate*(np.sum(d2,axis=0)/batch_size)
    w2 = w2 - learningRate*(D2/batch_size + momentum*w2)
    b2 = b2 - learningRate*(np.sum(d3,axis=0)/batch_size)

    e = -np.sum(y*np.log(a3))/batch_size
    err.append(e)
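
One detail worth flagging: the momentum*w term added inside the gradient acts as L2 weight decay rather than momentum. Classical momentum keeps a separate velocity buffer; a minimal sketch of that update (v_w1/v_w2 are hypothetical names, initialized to zero before the loop):

# before the loop: velocity buffers for classical momentum
v_w1 = np.zeros_like(w1)
v_w2 = np.zeros_like(w2)

# inside the loop, replacing the w1/w2 updates above
v_w1 = momentum*v_w1 - learningRate*D1/batch_size
v_w2 = momentum*v_w2 - learningRate*D2/batch_size
w1 = w1 + v_w1
w2 = w2 + v_w2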

After one epoch (50,000 samples) I got the following sequence of e, which seems far too large:

Out[1]:
    10000/50000     4.033538
    20000/50000     3.924567
    30000/50000     3.761105
    40000/50000     3.632708
    50000/50000     3.549212
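
For comparison, a model that always predicts the uniform distribution over the 10 classes would score:

import math
print(-math.log(1.0/10))  # 2.302585..., so the values above are still worse than chance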

I think the backprop code should be correct, but I cannot find where it goes wrong. It has been torturing me for more than two days.

0 Answers:
