I'm trying to teach my multilayer neural network the XOR function. The network has the architecture [2, 2, 1]. I define the loss as the sum of squared errors (I know it's not ideal, but I need it that way). If I set the activation function of all layers to sigmoid, I always get stuck in a local optimum (loss around 0.25, all outputs around 0.5). If I change the hidden layer's activation to ReLU, I sometimes get stuck in the same optimum, but sometimes I escape it. Could this be because I'm using mean squared error instead of cross-entropy? Just in case, here is my neural network code:
import tensorflow as tf

def weight_variable(shape):
    # Small random initial weights.
    initial = tf.truncated_normal(shape, stddev=0.5)
    return tf.Variable(initial)

def bias_variable(shape):
    # Slightly positive initial bias.
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)

class FCLayer:
    """A single fully connected layer with a fixed activation."""
    def __init__(self, inputs, outputs, activation):
        self.W = weight_variable([inputs, outputs])
        self.b = bias_variable([outputs])
        self.activation = activation

    def forward(self, X):
        s = tf.matmul(X, self.W) + self.b
        return self.activation(s)

class Network:
    def __init__(self, architecture, activations=None):
        # architecture is a list of layer sizes, e.g. [2, 2, 1];
        # activations optionally gives one activation per layer (default: sigmoid).
        self.layers = []
        for i in range(len(architecture) - 1):
            self.layers.append(FCLayer(architecture[i], architecture[i + 1],
                                       tf.nn.sigmoid if activations is None else activations[i]))
        self.x = tf.placeholder(tf.float32, shape=[None, architecture[0]])
        self.out = self.x
        for l in self.layers:
            self.out = l.forward(self.out)
        self.session = tf.Session()
        self.session.run(tf.initialize_all_variables())

    def train(self, X, Y_, lr, niter):
        y = tf.placeholder(tf.float32, shape=[None, Y_.shape[1]])
        loss = tf.reduce_mean((self.out - y) ** 2)
        #loss = tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(self.out, y))
        train_step = tf.train.GradientDescentOptimizer(lr).minimize(loss)
        errs = []
        for i in range(niter):
            train_step.run(feed_dict={self.x: X, y: Y_}, session=self.session)
            errs.append(loss.eval(feed_dict={self.x: X, y: Y_}, session=self.session))
        return errs

    def predict(self, X):
        return self.out.eval(feed_dict={self.x: X}, session=self.session)
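For completeness, this is roughly how I use it on the four XOR patterns (a minimal sketch assuming the Network class above and TensorFlow 1.x; the learning rate and iteration count are just example values, not tuned):

import numpy as np

# The four XOR input/target pairs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
Y = np.array([[0], [1], [1], [0]], dtype=np.float32)

# [2, 2, 1] network: ReLU hidden layer, sigmoid output.
net = Network([2, 2, 1], activations=[tf.nn.relu, tf.nn.sigmoid])
errs = net.train(X, Y, lr=0.1, niter=5000)
print(errs[-1])        # final loss
print(net.predict(X))  # ideally close to [0, 1, 1, 0]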
Update: I tried a more complex architecture ([2, 2, 2, 1]), but still no success.
Answer 0 (score: 0)
Solved it: for some reason, a learning rate of 0.1 was simply too small. I'd consider the problem closed; all I had to do was increase the learning rate.
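A minimal sketch of the change, assuming the Network class and the X, Y arrays from the question (the exact learning rate here is only an illustration, not a stated or tuned value, and whether training escapes the plateau can still depend on the random initialization):

# Same all-sigmoid [2, 2, 1] network as before, but with a larger step size.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
Y = np.array([[0], [1], [1], [0]], dtype=np.float32)

net = Network([2, 2, 1])                 # sigmoid everywhere
errs = net.train(X, Y, lr=2.0, niter=10000)  # lr noticeably larger than 0.1
print(errs[-1])        # ideally drops well below the 0.25 plateau
print(net.predict(X))  # ideally close to [0, 1, 1, 0]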