There is a simple and instructive toy classifier (2 fully connected layers) available as a JavaScript demo: http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html
Here, the input is a list of 2D points with {0,1} labels. As you can see there, the architecture is defined as follows.
layer_defs = [];
layer_defs.push({type:'input', out_sx:1, out_sy:1, out_depth:2});
layer_defs.push({type:'fc', num_neurons:6, activation: 'tanh'});
layer_defs.push({type:'fc', num_neurons:2, activation: 'tanh'});
layer_defs.push({type:'softmax', num_classes:2});
I am trying to reproduce this with TensorFlow, as follows.
pts = tf.placeholder(tf.float32, [None,2], name="p")
label = tf.placeholder(tf.int32, [None], name="labels")
with tf.variable_scope("layers") as scope:
    fc1 = fc_layer(pts, [2, 6], "fc1")
    fc1 = tf.nn.tanh(fc1)
    fc2 = fc_layer(fc1, [6, 2], "fc2")
    fc2 = tf.nn.tanh(fc2)
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(fc2, label, name='cross_entropy_per_example')
cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
optimizer = tf.train.MomentumOptimizer(learning_rate, 0.9)
train_op = optimizer.minimize(cross_entropy_mean, global_step=global_step)
and the function fc_layer is simply
def fc_layer(bottom, weight_shape, name):
    W = tf.get_variable(name+'W', shape=weight_shape, dtype=tf.float32, initializer=tf.random_normal_initializer(mean=0.01, stddev=0.01))
    b = tf.get_variable(name+'b', shape=[weight_shape[1]], dtype=tf.float32, initializer=tf.random_normal_initializer(mean=0.01, stddev=0.01))
    fc = tf.nn.bias_add(tf.matmul(bottom, W), b)
    return fc
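(Side note, not part of the original question: on TensorFlow 1.0 and later the cross-entropy op only accepts keyword arguments, so the loss line above would have to be written roughly as in the sketch below, which reuses fc2 and label from the code above.)

# On TensorFlow >= 1.0 the positional call above raises an error;
# labels and logits must be passed by keyword.
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=label, logits=fc2, name='cross_entropy_per_example')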
However, the loss does not seem to decrease. Is there something wrong with the loss definition (the cross entropy)?
Can anyone help?
Answer 0 (score: 2)
After taking a closer look, it seems to me that there is nothing wrong with the loss definition.
I did find a few parameters that are defined differently from the original ConvNetJS demo. However, choosing the same parameters did not change the behavior.
Then I realized that the ConvNetJS page does not explain how the weights are initialized (I could not find it in the source code after a quick search, and the code sample there is hidden in a textarea :-P). This is what really changes the behavior.
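To make that difference concrete, here is the initializer from the question next to the fan-in-scaled one used in the full code below; whether the latter matches what ConvNetJS itself does is an assumption on my part, since I could not find its initialization documented:

import tensorflow as tf

# Initializer from the question: small positive mean and spread.
orig_init = tf.random_normal_initializer(mean=0.01, stddev=0.01)

# Initializer used in the code below: zero mean, spread scaled by the
# layer's fan-in (2 for the first layer, 6 for the second). Whether this
# matches ConvNetJS's own scheme is an assumption.
def scaled_init(fan_in):
    return tf.random_normal_initializer(mean=0.0, stddev=1.0 / fan_in)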
Another parameter that affects the result is the batch size.
Here is the code that produced the second image (swap the weight initialization back to the original values to get the first image), which learns to recognize when the two input numbers have the same sign:
import tensorflow as tf
import random
# Training data
points = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(1000000)]
labels = [1 if x * y > 0.0 else 0 for (x, y) in points]
batch_size = 100 # a divisor of len(points) to keep things simple
momentum = 0.9
global_step=tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(0.01, global_step, 10, 0.99, staircase=True)
###
### The original code, where `momentum` is now a variable,
### and the weights are initialized differently.
###
def fc_layer(bottom, weight_shape, name):
    # Use a float literal so Python 2 integer division cannot zero out the stddev.
    W = tf.get_variable(name+'W', shape=weight_shape, dtype=tf.float32, initializer=tf.random_normal_initializer(mean=0., stddev=(1.0 / weight_shape[0])))
    b = tf.get_variable(name+'b', shape=[weight_shape[1]], dtype=tf.float32, initializer=tf.random_normal_initializer(mean=0., stddev=(1.0 / weight_shape[0])))
    fc = tf.nn.bias_add(tf.matmul(bottom, W), b)
    return fc
pts = tf.placeholder(tf.float32, [None,2], name="p")
label = tf.placeholder(tf.int32, [None], name="labels")
with tf.variable_scope("layers") as scope:
    fc1 = fc_layer(pts, [2, 6], "fc1")
    fc1 = tf.nn.tanh(fc1)
    fc2 = fc_layer(fc1, [6, 2], "fc2")
    fc2 = tf.nn.tanh(fc2)
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(fc2, label, name='cross_entropy_per_example')
cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
optimizer = tf.train.MomentumOptimizer(learning_rate, momentum)
train_op = optimizer.minimize(cross_entropy_mean, global_step=global_step)
###
ce_summary = tf.scalar_summary('ce', cross_entropy_mean)
with tf.Session() as session:
    all_summaries = tf.merge_all_summaries()
    summarizer = tf.train.SummaryWriter('./log', session.graph)
    tf.initialize_all_variables().run()
    for i in range(len(points) // batch_size):
        # Feed consecutive, non-overlapping batches of batch_size points.
        start = i * batch_size
        _, ce, cs = session.run([
                train_op,
                cross_entropy_mean,
                ce_summary
            ],
            {
                pts: points[start:(start + batch_size)],
                label: labels[start:(start + batch_size)]
            })
        summarizer.add_summary(cs, global_step=tf.train.global_step(session, global_step))
        print(ce)
The network doesn't seem to be the best possible one, but the cross entropy does decrease!
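The answer above only tracks the loss; as a quick sanity check, one could also measure accuracy on freshly generated points. A minimal sketch, assuming it is appended inside the same with tf.Session() as session: block right after the training loop (random and tf are already imported at the top of the script):

    # Hypothetical evaluation step, not part of the original answer:
    # classify fresh points with the trained network and compare against
    # the same sign-based labeling rule used for the training data.
    test_points = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(1000)]
    test_labels = [1 if x * y > 0.0 else 0 for (x, y) in test_points]
    predictions = session.run(tf.argmax(fc2, 1), {pts: test_points})
    accuracy = sum(int(p == l) for p, l in zip(predictions, test_labels)) / float(len(test_labels))
    print('test accuracy:', accuracy)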