NaN in summary histogram: model/Training/dense/kernel/gradients

Asked: 2018-02-18 14:44:27

Tags: python tensorflow tensorboard

I am building a dataset with 45 features, all numeric and normalized so that the values lie between -1 and 1.

Here is the normalization:


def normalize(train, test, cv):
    normalized_train = (train - train.mean()) / train.std()
    normalized_test = (test - test.mean()) / test.std()
    normalized_cv = (cv - cv.mean()) / cv.std()
    return normalized_train, normalized_test, normalized_cv

X_train, X_test, X_cv = normalize(X_train, X_test, X_cv)

I then build the TensorFlow dataset and iterator and pass them to my model. Here is the model:

with tf.name_scope('model'):
    regularizer = tf.contrib.layers.l2_regularizer(scale=0.1)
    net = tf.layers.dense(features, 40, activation=tf.nn.relu, kernel_regularizer=regularizer,
                          kernel_initializer=tf.contrib.layers.xavier_initializer())
    net = tf.layers.dense(net, 60, activation=tf.nn.relu, kernel_regularizer=regularizer,
                          kernel_initializer=tf.contrib.layers.xavier_initializer())
    net = tf.layers.dense(net, 30, activation=tf.nn.relu, kernel_regularizer=regularizer,
                          kernel_initializer=tf.contrib.layers.xavier_initializer())
    net = tf.layers.dense(net, 12, activation=tf.nn.relu, kernel_regularizer=regularizer,
                          kernel_initializer=tf.contrib.layers.xavier_initializer())
    prediction = tf.layers.dense(net, 2, activation=tf.nn.sigmoid)

Finally, here are my loss function, my optimizer, and the gradient computation with the histogram summaries:

with tf.name_scope('Loss'):
    loss = tf.losses.softmax_cross_entropy(onehot_labels=labels, logits=prediction) 
    tf.summary.scalar('Loss', loss)
with tf.name_scope('Training'):
    opt = tf.train.AdamOptimizer(learning_rate = learning_rate)
    grads = opt.compute_gradients(loss)
    for grad, var in grads:
        if grad is not None:
            tf.summary.histogram(var.op.name + '/gradients', grad)
    train_op = opt.apply_gradients(grads)

When I run this, I get the following error:

Caused by op 'model/Training/dense/kernel/gradients', defined at:
  File "c:\Users\123456\Google Drive\Projects\GIT\Churn_TF\churn_1.2_local_dataset.py", line 103, in <module>
    tf.summary.histogram(var.op.name + '/gradients', grad)
  File "C:\Users\123456\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\summary\summary.py", line 193, in histogram
    tag=tag, values=values, name=scope)
  File "C:\Users\123456\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\gen_logging_ops.py", line 215, in _histogram_summary
    "HistogramSummary", tag=tag, values=values, name=name)
  File "C:\Users\123456\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\123456\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 3160, in create_op
    op_def=op_def)
  File "C:\Users\123456\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1625, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Nan in summary histogram for: model/Training/dense/kernel/gradients
  [[Node: model/Training/dense/kernel/gradients = HistogramSummary[T=DT_DOUBLE, _device="/job:localhost/replica:0/task:0/device:CPU:0"](model/Training/dense/kernel/gradients/tag, model/Training/gradients/model/dense/MatMul_grad/tuple/control_dependency_1/_101)]]

From what I have read, this could be an exploding gradient problem, but how can I debug it, given that the whole point of creating these histograms was to see what is happening to my gradients?
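One idea I am considering (just an untested sketch, reusing the grads loop from my training block above) is to pass each gradient through tf.check_numerics before feeding it to the histogram summary, so that the runtime error names the tensor that first produces the NaN rather than the summary op:

for grad, var in grads:
    if grad is not None:
        # check_numerics passes the tensor through unchanged, but raises an
        # InvalidArgumentError with this message as soon as it contains NaN/Inf
        checked = tf.check_numerics(grad, 'NaN/Inf in gradient of ' + var.op.name)
        tf.summary.histogram(var.op.name + '/gradients', checked)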

Also, given my normalization and regularization, I am surprised to get this error.... Could it be that my gradients are instead becoming too small?

I have already tried switching the activation to tf.nn.relu, but then I got a float64 to float32 conversion error that I could not resolve.... Any ideas on how I can fix this?

1 Answer:

Answer 0 (score: 1)

From your code

prediction = tf.layers.dense(net, 2, activation=tf.nn.sigmoid)

I infer that you have a two-class classification problem and that the activation function of your output layer is the sigmoid function.

However, as your loss function you are using the

tf.losses.softmax_cross_entropy

function. First of all, I suggest using the

tf.losses.sigmoid_cross_entropy

function instead. Note that this function (as well as the tf.losses.softmax_cross_entropy function) expects (unscaled) logits as input. In your case, however, you already apply a sigmoid nonlinearity to the output of the last layer. I therefore suggest changing the line

prediction = tf.layers.dense(net, 2, activation=tf.nn.sigmoid)

to

logits = tf.layers.dense(net, 2)
prediction = tf.nn.sigmoid(logits)  # this line is only needed if you want to use the predictions somewhere else

and then

loss = tf.losses.sigmoid_cross_entropy(multi_class_labels=labels, logits=logits)  # note: sigmoid_cross_entropy takes multi_class_labels, not onehot_labels

Maybe this already solves your problem. If not, what learning rate are you using? I usually get this error when the learning rate is chosen too large.
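For example (only a sketch; the value 1e-4 and the gradient clipping are assumptions on my side, not something your code already contains), you could lower the learning rate and optionally clip the gradients before applying them:

opt = tf.train.AdamOptimizer(learning_rate=1e-4)  # try something smaller than your current rate
grads_and_vars = opt.compute_gradients(loss)
gradients, variables = zip(*grads_and_vars)
# clip_by_global_norm ignores None entries and rescales the remaining gradients
# so that their global norm does not exceed 5.0, guarding against exploding gradients
clipped, _ = tf.clip_by_global_norm(gradients, 5.0)
train_op = opt.apply_gradients(list(zip(clipped, variables)))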