Cost function value increases after n epochs with gradient descent optimization

Asked: 2017-05-14 23:56:06

Tags: python tensorflow neural-network gradient-descent multilabel-classification

I am working on building a neural network for multi-label text document classification. I have a vocabulary of 3750 words stored in a vector (V).

For each input document, I create a vector (I) of size 3750. If the term at index x of the vocabulary vector (V) appears in the input document, then index x of the vector is set to 1, otherwise 0. For example: [1, 1, 0, 0, 0, 1, ..., 0].

For the labels, I have a vocabulary of 1500 labels stored in a vector (L). As above, I create a vector (LB) for each document, setting index i to 1 if the document has the i-th label.

The label data is thus also represented as a vector with 1500 elements, e.g. [0, 0, 1, 0, 1, ..., 0]; the i-th element indicates whether the i-th label is a positive label for the text. The number of labels varies from text to text.
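For reference, here is a minimal sketch of this multi-hot encoding, which applies to both the input vectors (I) and the label vectors (LB). The names multi_hot, doc_terms, and doc_labels are illustrative and do not appear in the original code:

import numpy as np

def multi_hot(items, vocabulary):
    # Map each vocabulary entry to its index, then set those positions to 1.
    index = {term: i for i, term in enumerate(vocabulary)}
    vec = np.zeros(len(vocabulary), dtype=np.float32)
    for item in items:
        if item in index:
            vec[index[item]] = 1.0
    return vec

# Hypothetical usage: V is the 3750-word vocabulary, L the 1500-label vocabulary.
# doc_vector = multi_hot(doc_terms, V)      # input vector (I)
# label_vector = multi_hot(doc_labels, L)   # label vector (LB)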

Here is my code:

from __future__ import division
import tensorflow as tf
import numpy as np
import time

def csv_to_numpy_array(filePath, delimiter):
    # Load a numeric CSV into a NumPy array, inferring the dtype per column.
    return np.genfromtxt(filePath, delimiter=delimiter, dtype=None)


def import_data():
    print("Load training data")
    trainX = csv_to_numpy_array("/home/shahzeb/temp/train_data/trainX.csv", delimiter=",")
    trainY = csv_to_numpy_array("/home/shahzeb/temp/train_data/trainY.csv", delimiter=",")
    return trainX, trainY


startTime = time.time()
trainX, trainY = import_data()

learning_rate = 0.001
training_epochs = 500

# Network Parameters
n_hidden_1 = 3560 # number of units in the 1st hidden layer
n_hidden_2 = 3560 # number of units in the 2nd hidden layer
n_input = trainX.shape[1]
n_classes = trainY.shape[1]

# tf Graph input
input_neurons = tf.placeholder("float", [None, n_input],name="input")
known_outputs = tf.placeholder("float", [None, n_classes],name="labels")


def model(x):

    with tf.name_scope("Relu_activation"):
        # Hidden layer with RELU activation
        w1 = tf.Variable(tf.random_normal([n_input, n_hidden_1]), name="w")
        b1 = tf.Variable(tf.random_normal([n_hidden_1]), name="b")
        layer_1 = tf.add(tf.matmul(x, w1), b1)
        layer_1 = tf.nn.relu(layer_1)
        # Hidden layer with sigmoid activation
    with tf.name_scope("Sigmoid"):
        w2 = tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2]), name="w")
        b2 = tf.Variable(tf.random_normal([n_hidden_2]), name="b")
        layer_2 = tf.add(tf.matmul(layer_1, w2), b2)
        layer_2 = tf.nn.sigmoid(layer_2)
        # Output layer with linear activation
    with tf.name_scope("output"):
        w3 = tf.Variable(tf.random_normal([n_hidden_2, n_classes]), name="w")
        b3 = tf.Variable(tf.random_normal([n_classes]), name="b")
        out_layer = tf.matmul(layer_2, w3) + b3
        return out_layer,w1,w2,w3

model_output_OP, w_1,w_2,w_3 = model(input_neurons)

with tf.name_scope("cost"):
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=model_output_OP, labels=known_outputs))

with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

with tf.name_scope("accuracy"):
    correct_predictions_OP = tf.equal(tf.argmax(model_output_OP, 1), tf.argmax(known_outputs, 1))
    accuracy_OP = tf.reduce_mean(tf.cast(correct_predictions_OP, "float"), name="Accuracy_op")

with tf.name_scope("summary"):
    model_output_OP_summary = tf.summary.histogram("output", model_output_OP)
    accuracy_OP_summary = tf.summary.scalar("accuracy",accuracy_OP)
    cost_summary = tf.summary.scalar("cost",cost)
    summary_op = tf.summary.merge_all()
    writer = tf.summary.FileWriter("/home/shahzeb/temp/summarylogs/", graph=tf.get_default_graph())

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        _, c, summary,train_accuracy,iw1,iw2,iw3 = sess.run([optimizer, cost,summary_op,accuracy_OP,
                                                 w_1,w_2,w_3
                                                 ],
                                                feed_dict={input_neurons: trainX, known_outputs: trainY})

        print("Epoch:", '%04d' % (epoch + 1), "cost=", "{:.9f}".format(c), "Accuracy =",train_accuracy)
        #np.set_printoptions(threshold=np.nan)
        #print(iw1)
        #print(iw2)
        #print(iw3)
        #print("--------------")
        writer.add_summary(summary, epoch + 1)

    saver = tf.train.Saver()
    saver.save(sess, "/home/shahzeb/temp/trained_model/hidden_layer_nn.ckpt")
print("Done")

The cost plotted on TensorBoard looks as follows:

[TensorBoard plot of the cost per epoch]

Why does the cost function value increase after a certain number of epochs? What is the problem, and how can I fix it?

1 Answer:

Answer 0 (score: 0)

The crux of the problem is the number of epochs: you are probably using too many for the given training dataset. A likely explanation is that you start overfitting somewhere around epoch 150.

There is a good discussion on forums.fast.ai about how to handle this kind of problem. A simple solution is to implement an "early stopping" mechanism, which can be done with TensorFlow's validation monitor, as described in this tutorial in the Tensorflow documentation. A rough sketch is shown below.
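As a minimal hand-rolled sketch of early stopping in the style of your training loop (not the ValidationMonitor API itself): it assumes a held-out validation split valX/valY, which is not in the original code, and the patience of 10 epochs is an arbitrary choice:

best_val_cost = float("inf")
patience, bad_epochs = 10, 0
saver = tf.train.Saver()

for epoch in range(training_epochs):
    # One gradient-descent step on the full training set.
    sess.run(optimizer, feed_dict={input_neurons: trainX, known_outputs: trainY})
    # Track the cost on the held-out split instead of the training data.
    val_cost = sess.run(cost, feed_dict={input_neurons: valX, known_outputs: valY})
    if val_cost < best_val_cost:
        best_val_cost, bad_epochs = val_cost, 0
        saver.save(sess, "/home/shahzeb/temp/trained_model/best_model.ckpt")
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print("Validation cost stopped improving; stopping at epoch", epoch + 1)
            break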