Difference between two versions of the cross_entropy calculation in TensorFlow

Date: 2017-01-25 11:26:37

Tags: python tensorflow neural-network deep-learning entropy

I am using TensorFlow to train a neural network for binary classification.

To build the network, I followed the tutorial on the TensorFlow website about half a year ago - Deep MNIST for Experts.

Today, when comparing the two pieces of code (the one from the tutorial and the one I wrote), I noticed a difference in how the cross entropy is computed, and I don't understand why it exists.

In the tutorial, the cross entropy is computed as follows:

cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_conv, y_))

In my code, it is computed as follows:

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))

I am new to TensorFlow and I feel like I am missing something. Maybe the difference comes from two different versions of the TensorFlow tutorial? What is the actual difference between these two lines?

Any help would be greatly appreciated. Thanks!

Relevant code from the tutorial:

    y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
...
    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y_conv, y_))
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
    correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    sess.run(tf.global_variables_initializer())

My code:

# load data
folds = build_database_tuple.load_data(data_home_dir=data_home_dir,validation_ratio=validation_ratio,patch_size=patch_size)

# starting the session. using the InteractiveSession we avoid building the entire computational graph before starting the session
sess = tf.InteractiveSession()

# start building the computational graph
# the 'None' indicates the batch size - a value that we want to leave open for now
x = tf.placeholder(tf.float32, shape=[None, patch_size**2]) #input images, flattened to patch_size**2 pixels
y_ = tf.placeholder(tf.float32, shape=[None, 2]) #output classes (using one-hot vectors)

# the variables for the linear layer
W = tf.Variable(tf.zeros([(patch_size**2),2])) #weights - patch_size**2 input features and 2 outputs
b = tf.Variable(tf.zeros([2])) #biases - 2 classes

# initialize all the variables using the session, in order they could be used in it
sess.run(tf.initialize_all_variables())

# implementation of the regression model
y = tf.nn.softmax(tf.matmul(x,W) + b)

# Done!

# FIRST LAYER:
# build the first layer
W_conv1 = weight_variable([first_conv_kernel_size, first_conv_kernel_size, 1, first_conv_output_channels]) # 5x5 patch, 1 input channel, 32 output channels (features)
b_conv1 = bias_variable([first_conv_output_channels])

x_image = tf.reshape(x, [-1,patch_size,patch_size,1]) # reshape x to a 4d tensor. dims 2,3 are the image dimensions, dim 4 is one color channel

# apply the layers
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# SECOND LAYER:
# 64 features for each 5x5 patch; input channels must match the first layer's output channels
W_conv2 = weight_variable([sec_conv_kernel_size, sec_conv_kernel_size, first_conv_output_channels, sec_conv_output_channels])
b_conv2 = bias_variable([sec_conv_output_channels])

h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# FULLY CONNECTED LAYER:
# 1024 neurons, 8x8 - new size after 2 pooling layers
W_fc1 = weight_variable([(patch_size//4) * (patch_size//4) * sec_conv_output_channels, fc_vec_size])
b_fc1 = bias_variable([fc_vec_size])

h_pool2_flat = tf.reshape(h_pool2, [-1, (patch_size//4) * (patch_size//4) * sec_conv_output_channels])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# dropout layer - meant to reduce over-fitting
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# READOUT LAYER:
# softmax regression
W_fc2 = weight_variable([fc_vec_size, 2])
b_fc2 = bias_variable([2])

y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)

# TRAIN AND EVALUATION:
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.initialize_all_variables())

1 Answer:

Answer 0 (score: 3)

The difference is small but significant.

softmax_cross_entropy_with_logits takes logits (real numbers without any range restriction), passes them through the softmax function, and then computes the cross entropy. Combining the two steps into a single function allows optimizations that improve numerical precision.
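
A minimal sketch of how the two formulations relate (assuming TensorFlow 1.x; the logits and labels values here are made up for illustration). For moderate logits they produce nearly the same number, but only the fused op is guaranteed to stay numerically stable when the logits become large:

    import tensorflow as tf

    logits = tf.constant([[2.0, -1.0], [0.5, 3.0]])   # unbounded real-valued scores
    labels = tf.constant([[1.0, 0.0], [0.0, 1.0]])    # one-hot targets

    # fused version: softmax + cross entropy computed in one numerically stable op
    fused = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))

    # manual version: explicit softmax followed by -sum(labels * log(probs))
    probs = tf.nn.softmax(logits)
    manual = tf.reduce_mean(-tf.reduce_sum(labels * tf.log(probs), reduction_indices=[1]))

    with tf.Session() as sess:
        print(sess.run([fused, manual]))  # nearly identical values for these moderate logits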

Your second snippet applies the cross-entropy formula directly to y_conv, which is already the output of a softmax. That is also correct, and the two should give similar (though not identical) results; softmax_cross_entropy_with_logits is preferable because of its numerical stability. Just remember to feed it logits rather than the output of softmax.
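
For reference, a sketch of how the readout and loss part of the question's code could be rearranged to use the fused op. It reuses the names h_fc1_drop, W_fc2, b_fc2 and y_ from the snippet above, so it is not self-contained:

    # keep the raw, unbounded scores as logits - no softmax applied here
    logits = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

    # the fused op applies softmax internally, in a numerically stable way
    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

    # apply softmax separately only where actual probabilities are needed
    y_conv = tf.nn.softmax(logits)

    # argmax is unaffected by the monotonic softmax, so logits can be used directly
    correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))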