I am building an MLP for a classification problem. There are 8 output classes, and the input data consists of 81 binary feature columns and 2416 data points in total.
So far my network has 3 hidden layers with 400, 300, and 200 nodes respectively, and the learning rate is set to 0.0001.
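For reference, the graph itself is built along these lines (a minimal sketch, not my exact code; the names X, y, and logits are just placeholders for what my script defines):

import tensorflow as tf

n_inputs = 81   # binary feature columns
n_outputs = 8   # target classes

# Placeholders for a batch of features and its integer class labels
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int64, shape=(None,), name="y")

# Three fully connected hidden layers with ReLU activations
hidden1 = tf.layers.dense(X, 400, activation=tf.nn.relu, name="hidden1")
hidden2 = tf.layers.dense(hidden1, 300, activation=tf.nn.relu, name="hidden2")
hidden3 = tf.layers.dense(hidden2, 200, activation=tf.nn.relu, name="hidden3")
# Raw logits; the softmax is applied inside the loss (see the end of the post)
logits = tf.layers.dense(hidden3, n_outputs, name="logits")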
For training, I want to implement mini-batching with the following code:
import numpy as np
import tensorflow as tf

# X, y, init, training_op, accuracy, loss, and the two summary ops are
# defined earlier in the script (graph construction not shown)
number_of_examples = X_train.shape[0]
batch_size = 60
n_epochs = 10000
number_of_batches = X_train.shape[0] // batch_size

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for i in range(number_of_batches):
            # The batch window rotates by one batch each epoch; the modulo
            # wraps both endpoints back into [0, number_of_examples)
            batch_start = (i - epoch) * batch_size % number_of_examples
            batch_stop = (i - epoch + 1) * batch_size % number_of_examples
            if batch_stop < batch_start:
                # Wrapped past the end of the data: use the remaining tail
                # as this batch, then reshuffle features and labels together
                X_batch = X_train[batch_start:number_of_examples, :]
                y_batch = y_train[batch_start:number_of_examples]
                data = np.column_stack((X_train, y_train))
                np.random.shuffle(data)
                X_train = data[:, :81]
                y_train = data[:, 81]
            else:
                X_batch = X_train[batch_start:batch_stop, :]
                y_batch = y_train[batch_start:batch_stop]
            # Train on the current mini-batch
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        # Log accuracy/loss on the last batch of the epoch
        accuracy_val, loss_val, accuracy_summary_str, loss_summary_str = sess.run(
            [accuracy, loss, accuracy_summary, loss_summary],
            feed_dict={X: X_batch, y: y_batch})
    acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})
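For comparison, a simpler batching pattern shuffles an index array once per epoch and takes contiguous slices of it, which avoids the wrap-around bookkeeping above (a sketch, not the code I am debugging):

import numpy as np

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        # Fresh random order of all examples each epoch
        perm = np.random.permutation(number_of_examples)
        for i in range(number_of_batches):
            idx = perm[i * batch_size:(i + 1) * batch_size]
            sess.run(training_op,
                     feed_dict={X: X_train[idx, :], y: y_train[idx]})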
When I run the code, I see the training accuracy going up and down. At first I thought the learning rate might be too large for gradient descent to converge to a minimum, so I considered lowering it. But the accuracy keeps oscillating, and learning is very slow: after 200 epochs the accuracy is still between 30% and 35%.
I then removed the batching entirely and trained on the whole dataset, hoping to isolate the problem, but it persisted. The code without batching is:
number_of_examples = X_train.shape[0]
n_epochs = 10000

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        # One full-batch gradient step on the entire training set
        sess.run(training_op, feed_dict={X: X_train, y: y_train})
        accuracy_val, loss_val, accuracy_summary_str, loss_summary_str = sess.run(
            [accuracy, loss, accuracy_summary, loss_summary],
            feed_dict={X: X_train, y: y_train})
    acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})
At this point I am quite confused about what is going wrong. What bothers me is not the poor performance but the erratic behavior of the accuracy. For completeness: I use ReLU as the activation function on the hidden layers and softmax on the output, and the loss function is cross-entropy.
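For reference, the loss and accuracy ops look roughly like this (again a sketch; plain gradient descent is an assumption, and my script may use a different optimizer):

# sparse_softmax_cross_entropy_with_logits applies the softmax itself,
# so it must be given the raw logits, not softmax outputs
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
loss = tf.reduce_mean(xentropy, name="loss")
# Assumption: plain gradient descent with the learning rate from above
training_op = tf.train.GradientDescentOptimizer(learning_rate=0.0001).minimize(loss)

correct = tf.nn.in_top_k(logits, y, 1)  # top-1 prediction matches the label
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32), name="accuracy")
init = tf.global_variables_initializer()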