I am writing my own code to implement a single-hidden-layer neural network, and I am testing the model on the MNIST dataset. But I am getting weird results (an unacceptably high NLL), even though I have checked my code for more than 2 days without finding anything wrong.
Here are the global parameters:
layers = np.array([784, 300, 10])
learningRate = 0.01
momentum = 0.01
batch_size = 10000
num_of_batch = len(train_label)/batch_size  # under Python 3 this should be // so range() gets an int
nepoch = 30
Definition of the softmax function:
def softmax(x):
    x = np.exp(x)
    x_sum = np.sum(x, axis=1)  # shape = (nsamples,)
    for row_idx in range(len(x)):
        x[row_idx, :] /= x_sum[row_idx]
    return x
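(Side note: as written, this softmax can overflow in np.exp for large activations. A minimal, numerically equivalent sketch that subtracts the per-row maximum first would look like the following; softmax_stable is just an illustrative name, not part of my code.)

import numpy as np

def softmax_stable(x):
    # subtracting the row-wise max does not change the result, but keeps np.exp from overflowing
    x = x - np.max(x, axis=1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=1, keepdims=True)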
Definition of the sigmoid function:
def f(x):
    return 1.0/(1+np.exp(-x))
Initialization of w and b:
k = np.vectorize(math.sqrt)(layers[0:-2]*layers[1:])  # note: k is never used below
w1 = np.random.uniform(-0.5, 0.5, layers[0:2][::-1])  # shape (300, 784)
b1 = np.random.uniform(-0.5, 0.5, (1, layers[1]))     # shape (1, 300)
w2 = np.random.uniform(-0.5, 0.5, layers[1:3][::-1])  # shape (10, 300)
b2 = np.random.uniform(-0.5, 0.5, (1, layers[2]))     # shape (1, 10)
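(If k was meant as a 1/sqrt(fan_in) scaling factor for the weight ranges, a common heuristic, a sketch of that initialization might look like the following. This is only my guess at the intent, not what the code above actually does.)

import math
import numpy as np

# hypothetical scaled initialization, assuming layers = np.array([784, 300, 10]) as above
w1 = np.random.uniform(-1.0, 1.0, (layers[1], layers[0])) / math.sqrt(layers[0])
b1 = np.zeros((1, layers[1]))
w2 = np.random.uniform(-1.0, 1.0, (layers[2], layers[1])) / math.sqrt(layers[1])
b2 = np.zeros((1, layers[2]))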
Here is the core part for each mini-batch:
for idx in range(num_of_batch):
    # forward_vectorized
    x = train_set[idx*batch_size:(idx+1)*batch_size, :]
    y = Y[idx*batch_size:(idx+1)*batch_size, :]
    a1 = x
    # prepend a column of ones to the activations and fold each bias into its weight matrix
    a2 = f(np.dot(np.insert(a1, 0, 1, axis=1), np.insert(w1, 0, b1, axis=1).T))
    a3 = softmax(np.dot(np.insert(a2, 0, 1, axis=1), np.insert(w2, 0, b2, axis=1).T))
    # compute delta
    d3 = a3 - y
    d2 = np.dot(d3, w2)*a2*(1.0-a2)
    # compute grad
    D2 = np.dot(d3.T, a2)
    D1 = np.dot(d2.T, a1)
    # update_parameters (note: the momentum*w terms act as weight decay here, not classical momentum)
    w1 = w1 - learningRate*(D1/batch_size + momentum*w1)
    b1 = b1 - learningRate*(np.sum(d2, axis=0)/batch_size)
    w2 = w2 - learningRate*(D2/batch_size + momentum*w2)
    b2 = b2 - learningRate*(np.sum(d3, axis=0)/batch_size)
    # mean negative log-likelihood on this mini-batch
    e = -np.sum(y*np.log(a3))/batch_size
    err.append(e)
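For reference, dropping the following assertions at the end of the loop body confirms that the dimensions line up (derived from the slicing and matrix shapes above):

    # shape check, to be placed at the end of the loop body
    assert x.shape == (batch_size, 784) and a2.shape == (batch_size, 300)
    assert a3.shape == (batch_size, 10)
    assert D1.shape == w1.shape == (300, 784)
    assert D2.shape == w2.shape == (10, 300)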
After one epoch (50,000 samples), I got the following sequence of e, which seems way too large:
Out[1]:
10000/50000 4.033538
20000/50000 3.924567
30000/50000 3.761105
40000/50000 3.632708
50000/50000 3.549212
I think the back_prop code should be correct, but I cannot find where it goes wrong. This has been tormenting me for more than 2 days.
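For what it is worth, one sanity check that does not depend on the training loop is a numerical gradient check: perturb a single entry of w2, estimate the loss gradient by finite differences, and compare it with the analytic gradient D2/batch_size. Here is a sketch (it reuses f, softmax, train_set, Y and the parameters defined above; the entry (3, 7), eps, and the 5-sample batch are arbitrary choices for illustration):

def nll_loss(xb, yb, w1_, b1_, w2_, b2_):
    # same forward pass as in the loop, returning the mean NLL on the batch
    a2_ = f(np.dot(np.insert(xb, 0, 1, axis=1), np.insert(w1_, 0, b1_, axis=1).T))
    a3_ = softmax(np.dot(np.insert(a2_, 0, 1, axis=1), np.insert(w2_, 0, b2_, axis=1).T))
    return -np.sum(yb*np.log(a3_))/len(xb)

xb, yb = train_set[:5, :], Y[:5, :]
i, j, eps = 3, 7, 1e-5
w2_plus, w2_minus = w2.copy(), w2.copy()
w2_plus[i, j] += eps
w2_minus[i, j] -= eps
numeric = (nll_loss(xb, yb, w1, b1, w2_plus, b2)
           - nll_loss(xb, yb, w1, b1, w2_minus, b2)) / (2*eps)

a2 = f(np.dot(np.insert(xb, 0, 1, axis=1), np.insert(w1, 0, b1, axis=1).T))
a3 = softmax(np.dot(np.insert(a2, 0, 1, axis=1), np.insert(w2, 0, b2, axis=1).T))
analytic = np.dot((a3 - yb).T, a2)[i, j] / len(xb)
print(numeric, analytic)  # the two numbers should agree to several decimal places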