I am trying to build a 3-layer neural network to classify digits, using the MNIST dataset. The problem is that the loss is frozen at 1.000047619047619.
I have run the program for more than 7000 iterations. I originally used 40 hidden units, and also tried 100 and 300 with no success. The same code solves the XOR problem without trouble.
Here is the code where I create the NN and train it:
import numpy as np
import pandas as pd

data = pd.read_csv("train.csv").values
xtrain = data[0:21000, 1:]  # pixel values
ytrain = data[0:21000, 0]   # digit labels
trainy = invec(ytrain).reshape(10, 21000).T  # one-hot targets, shape (21000, 10)
textrec = NN(xtrain, trainy, 21000, 1, [784, 300, 10],
             np.array([relu, softmax]), np.array([drelu]))
textrec.train()
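Here, invec one-hot encodes the labels, and the reshape and transpose put the targets into shape (21000, 10). The activation functions are the usual ones; roughly (a minimal sketch, not my exact definitions):

def relu(z):
    return np.maximum(z, 0)

def drelu(a):
    # ReLU derivative, evaluated on the activation itself
    return (a > 0).astype(float)

def softmax(z):
    # row-wise softmax, shifted by the row max for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)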
The NN class takes 7 inputs: the inputs X (the pixels, in this case), the outputs y, the number of training examples n, the learning rate a, the layer sizes Lsize, the activation functions act, and their derivatives dact.
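The constructor is roughly the following (a minimal sketch; the real initialization may differ, but the weights start random and the gradient attributes start non-zero so that the loop in train() runs):

class NN:
    def __init__(self, X, y, n, a, Lsize, act, dact):
        self.X, self.y, self.n, self.a = X, y, n, a
        self.act, self.dact = act, dact
        # weights sized from the layer list, e.g. [784, 300, 10]
        self.w0_1 = np.random.randn(Lsize[0], Lsize[1]) * 0.01
        self.w1_2 = np.random.randn(Lsize[1], Lsize[2]) * 0.01
        self.b1 = np.zeros((1, Lsize[1]))
        self.b2 = np.zeros((1, Lsize[2]))
        # gradients start as ones so the stopping test in train() passes
        self.dw0_1 = np.ones_like(self.w0_1)
        self.dw1_2 = np.ones_like(self.w1_2)
        self.db1 = np.ones_like(self.b1)
        self.db2 = np.ones_like(self.b2)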
Here is the forward-propagation code:
def L1(self, x):
    # hidden layer: relu(x . w0_1 + b1)
    return self.act[0](np.dot(x, self.w0_1) + self.b1)

def L2(self, x):
    # output layer: softmax(a1 . w1_2 + b2)
    return self.act[1](np.dot(x, self.w1_2) + self.b2)

def h(self, x):
    # full forward pass
    return self.L2(self.L1(x))
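The forward pass itself runs and produces sensible shapes; for example (a quick check, assuming the sketch definitions above):

probs = textrec.h(xtrain[:5])
print(probs.shape)        # (5, 10)
print(probs.sum(axis=1))  # each softmax row sums to 1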
Here is the backpropagation code:
def train(self):
    t = 0
    # keep training until the gradients go to zero
    while self.dw0_1.all() != 0 or self.dw1_2.all() != 0 or self.db1.all() != 0 or self.db2.all() != 0:
        X = self.X
        L2 = self.h(X)    # output-layer activations
        y = self.y
        L1 = self.L1(X)   # hidden-layer activations
        self.dz2 = L2 - y  # output-layer error
        self.dw1_2 = np.matmul(self.dz2.T, L1)
        self.db2 = np.sum(self.dz2, axis=1, keepdims=True)
        self.dz1 = np.matmul(self.w1_2, self.dz2.T) * self.dact[0](L1).T
        self.dw0_1 = np.matmul(self.dz1, X).T
        self.db1 = np.sum(self.dz1, axis=1, keepdims=True).T
        # average over the training set and scale by the learning rate
        self.dw0_1 = self.dw0_1.sum(axis=1, keepdims=True) / self.n * self.a
        self.dw1_2 = self.dw1_2.sum(axis=1, keepdims=True) / self.n * self.a
        self.db2 = self.db2 / self.n * self.a
        self.db1 = self.db1 / self.n * self.a
        # gradient-descent step
        self.w0_1 = self.w0_1 - self.dw0_1
        self.w1_2 = self.w1_2 - self.dw1_2.T
        self.b2 -= self.db2
        self.b1 -= self.db1
        # print(str([self.w0_1, self.w1_2, self.b1, self.b2]))
        # print(str([self.dw0_1, self.dw1_2, self.db1, self.db2]))
        print("Cost: " + str(((L2 - y) ** 2).sum() / self.n))
        t += 1
    print("finished running: " + str(t) + " times")
I get this every time:
Cost: 1.000047619047619
Cost: 0.9999523809523809
Cost: 1.000047619047619
Cost: 1.000047619047619
Cost: 1.000047619047619
Cost: 1.000047619047619
Cost: 1.000047619047619
Cost: 1.000047619047619
It prints the same output for any learning rate from 0.1 to 100.