My task is to build a deep learning model that, given an input image, returns the coordinates of a certain feature. Visually, the feature looks like this: _ | |. I define the ground truth as the point where the lower line segment intersects the center line segment.
Detecting it with classical image processing and computer vision algorithms works well. I then started studying deep learning and tried it, but I can't make sense of the results, and it performs worse than I expected.
The rough model structure: the input image is [140, 240]; 2x2 max pooling; dropout with probability 0.5; convolutional layers with 3x3 weights producing [8, 12, 20, 32, 48] channels; fully connected layers of 64 -> 2 (output); random normal initialization; Adam optimizer with a learning rate of 0.001. I have 3500 samples in total, of which only 10% are used as the test set.
My first question concerns the coordinate outputs: I normalize them to the 0~1 range. Is there anything wrong with that? The image size is [140, 240].
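For clarity, this is roughly how I normalize the labels (a minimal sketch; the helper names x_px / y_px are just for illustration, and I use the 143x240 size that appears in the code below):

import numpy as np

IMG_H, IMG_W = 143, 240  # input height and width, as in the code below

def normalize_label(x_px, y_px):
    # map ground-truth pixel coordinates into [0, 1] for training
    return np.array([x_px / float(IMG_W), y_px / float(IMG_H)], dtype=np.float32)

def denormalize_pred(pred):
    # map a network output in [0, 1] back to pixel coordinates
    return pred[0] * IMG_W, pred[1] * IMG_H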
Second, generally speaking, if the error diverges during training, what is the most likely problem with the model structure? I set the number of epochs to 30 and ran it; the cost reaches a minimum of about 0.3 around the tenth epoch and then diverges badly...
Thanks.
import croping
import tensorflow as tf
tf.set_random_seed(777) # reproducibility
# hyper parameters
learning_rate = 0.001
training_epochs = 30
batch_size = 100
data = croping.getData()
class Model:
    def __init__(self, sess, name):
        self.sess = sess
        self.name = name
        self._build_net()

    def _build_net(self):
        with tf.variable_scope(self.name):
            # dropout (keep_prob) rate 0.7~0.5 on training, but should be 1
            # for testing
            self.keep_prob = tf.placeholder(tf.float32)

            # input place holders
            self.X = tf.placeholder(tf.float32, [None, 143, 240])  # check x,y direction
            # img 143x240x1 (grayscale)
            X_img = tf.reshape(self.X, [-1, 143, 240, 1])
            self.Y = tf.placeholder(tf.float32, [None, 2])

            # L1 ImgIn shape=(?, 143, 240, 1)
            W1 = tf.Variable(tf.random_normal([3, 3, 1, 8], stddev=0.01))
            # Conv -> (?, 143, 240, 8)
            # Pool -> (?, 72, 120, 8)
            L1 = tf.nn.conv2d(X_img, W1, strides=[1, 1, 1, 1], padding='SAME')
            L1 = tf.nn.relu(L1)
            L1 = tf.nn.max_pool(L1, ksize=[1, 2, 2, 1],
                                strides=[1, 2, 2, 1], padding='SAME')
            L1 = tf.nn.dropout(L1, keep_prob=self.keep_prob)

            # L2 ImgIn shape=(?, 72, 120, 8)
            W2 = tf.Variable(tf.random_normal([3, 3, 8, 12], stddev=0.01))
            # Conv ->(?, 72, 120, 12)
            # Pool ->(?, 36, 60, 12)
            L2 = tf.nn.conv2d(L1, W2, strides=[1, 1, 1, 1], padding='SAME')
            L2 = tf.nn.relu(L2)
            L2 = tf.nn.max_pool(L2, ksize=[1, 2, 2, 1],
                                strides=[1, 2, 2, 1], padding='SAME')
            L2 = tf.nn.dropout(L2, keep_prob=self.keep_prob)

            # L3 ImgIn shape=(?, 36, 60, 12)
            W3 = tf.Variable(tf.random_normal([3, 3, 12, 20], stddev=0.01))
            # Conv ->(?, 36, 60, 20)
            # Pool ->(?, 18, 30, 20)
            L3 = tf.nn.conv2d(L2, W3, strides=[1, 1, 1, 1], padding='SAME')
            L3 = tf.nn.relu(L3)
            L3 = tf.nn.max_pool(L3, ksize=[1, 2, 2, 1],
                                strides=[1, 2, 2, 1], padding='SAME')
            L3 = tf.nn.dropout(L3, keep_prob=self.keep_prob)

            # L4 ImgIn shape=(?, 18, 30, 20)
            W4 = tf.Variable(tf.random_normal([3, 3, 20, 32], stddev=0.01))
            # Conv ->(?, 18, 30, 32)
            # Pool ->(?, 9, 15, 32)
            L4 = tf.nn.conv2d(L3, W4, strides=[1, 1, 1, 1], padding='SAME')
            L4 = tf.nn.relu(L4)
            L4 = tf.nn.max_pool(L4, ksize=[1, 2, 2, 1],
                                strides=[1, 2, 2, 1], padding='SAME')
            L4 = tf.nn.dropout(L4, keep_prob=self.keep_prob)

            # L5 ImgIn shape=(?, 9, 15, 32)
            W5 = tf.Variable(tf.random_normal([3, 3, 32, 48], stddev=0.01))
            # Conv ->(?, 9, 15, 48)
            # Pool ->(?, 5, 8, 48)
            L5 = tf.nn.conv2d(L4, W5, strides=[1, 1, 1, 1], padding='SAME')
            L5 = tf.nn.relu(L5)
            L5 = tf.nn.max_pool(L5, ksize=[1, 2, 2, 1],
                                strides=[1, 2, 2, 1], padding='SAME')
            L5 = tf.nn.dropout(L5, keep_prob=self.keep_prob)

            # FC: 5x8x48 inputs -> 64 -> 2 outputs
            L5_flat = tf.reshape(L5, [-1, 5 * 8 * 48])
            W6 = tf.get_variable("W6", shape=[5 * 8 * 48, 64],
                                 initializer=tf.contrib.layers.xavier_initializer())
            b6 = tf.Variable(tf.random_normal([64]))
            L6 = tf.nn.relu(tf.matmul(L5_flat, W6) + b6)
            L6 = tf.nn.dropout(L6, keep_prob=self.keep_prob)

            W7 = tf.get_variable("W7", shape=[64, 2],
                                 initializer=tf.contrib.layers.xavier_initializer())
            b7 = tf.Variable(tf.random_normal([2]))
            self.logits = tf.matmul(L6, W7) + b7

        # define cost/loss & optimizer
        self.cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
            logits=self.logits, labels=self.Y))
        self.optimizer = tf.train.AdamOptimizer(
            learning_rate=learning_rate).minimize(self.cost)

        correct_prediction = tf.equal(
            tf.argmax(self.logits, 1), tf.argmax(self.Y, 1))
        self.accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    def predict(self, x_test, keep_prop=1.0):
        return self.sess.run(self.logits,
                             feed_dict={self.X: x_test, self.keep_prob: keep_prop})

    def get_accuracy(self, x_test, y_test, keep_prop=1.0):
        return self.sess.run(self.accuracy,
                             feed_dict={self.X: x_test, self.Y: y_test,
                                        self.keep_prob: keep_prop})

    def train(self, x_data, y_data, keep_prop=0.7):
        return self.sess.run([self.cost, self.optimizer], feed_dict={
            self.X: x_data, self.Y: y_data, self.keep_prob: keep_prop})
# initialize
sess = tf.Session()
m1 = Model(sess, "m1")
sess.run(tf.global_variables_initializer())
print('Learning Started!')
# train my model
for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = int(data.num_train / batch_size)

    for i in range(total_batch):
        batch_xs, batch_ys = data.next_batch(batch_size)
        c, _ = m1.train(batch_xs, batch_ys)
        avg_cost += c / total_batch

    print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))
print('Learning Finished!')
# Test model and check accuracy
print('Accuracy:', m1.get_accuracy(data.x_label, data.y_label))
Answer (score: 1)
Normalizing the coordinates is not the problem.
First, I would not use max pooling in a network that has to infer coordinates, because max pooling destroys most of the positional information; use strided convolutions instead.
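As a sketch of what that change looks like against the question's first layer (same shapes and variable names as in the posted code; this is one possible replacement, not the only one):

W1 = tf.Variable(tf.random_normal([3, 3, 1, 8], stddev=0.01))
# a single 3x3 convolution with stride 2 downsamples to (?, 72, 120, 8),
# the same shape as conv + 2x2 max pool, but keeps location information learnable
L1 = tf.nn.conv2d(X_img, W1, strides=[1, 2, 2, 1], padding='SAME')
L1 = tf.nn.relu(L1)

The same substitution applies to each of the five conv + pool pairs.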
Second, you are using a softmax cross-entropy loss, which is suited to classification; here you are doing regression, not classification, so you should use a more appropriate loss such as mean squared error.
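Concretely, replacing the cost in the question's graph could look like this (a sketch reusing self.logits and self.Y as already defined there):

# mean squared error between predicted and ground-truth coordinates
self.cost = tf.reduce_mean(tf.square(self.logits - self.Y))
self.optimizer = tf.train.AdamOptimizer(
    learning_rate=learning_rate).minimize(self.cost)
# argmax "accuracy" is meaningless for regression; track mean distance instead
self.mean_dist = tf.reduce_mean(
    tf.sqrt(tf.reduce_sum(tf.square(self.logits - self.Y), axis=1)))

With normalized targets in [0, 1], the mean distance above is directly interpretable as a fraction of the image size.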