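I use a vgg.h5 model with Keras (TensorFlow backend, on a GPU) for real-time object classification. It works well.
Then I tried using a pure TensorFlow graph with the weights taken from vgg.h5:
So, does anyone have experience building VGG16 from scratch in TensorFlow who could help? Why does TensorFlow work fine as the Keras backend, while pure TensorFlow (with the same weights) cannot compute the fully connected output? Does Keras apply any extra optimization when implementing fully connected (dense) layers?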
Answer 0 (score: 1)
Here is a test variant of your code, with statements inserted to print the tensor shapes at several points:
import time

import numpy as np
import tensorflow as tf

with tf.Session() as sess:
    # mock the previous layer's output with a placeholder
    pool5_input = tf.placeholder(dtype = tf.float32, shape = (None,7,7,512))
    # insert a print operation to print the shape
    pool5 = tf.Print(pool5_input, [ tf.shape(pool5_input) ], "pool5 shape is ", summarize = 4)
    layer_name = 'fc1'
    # dummy weights and biases (all ones) with the shapes of VGG16's first dense layer
    wd = tf.Variable(np.ones((25088, 4096), dtype='float32'), trainable=False, name=layer_name+'_wd')
    bd = tf.Variable(np.ones((4096,), dtype='float32'), trainable=False, name=layer_name+'_bd')
    # flatten to (batch, 25088); -1 lets TensorFlow infer the batch dimension
    layer_shape = [-1, wd.get_shape().as_list()[0]]
    print('layer_shape:', layer_shape)
    fc1_flat = tf.reshape(pool5, shape=layer_shape)
    fc1_flat = tf.Print(fc1_flat, [ tf.shape(fc1_flat) ], "fc1_flat shape is ")
    fc1 = tf.nn.relu( tf.nn.bias_add( tf.matmul(fc1_flat, wd, name=layer_name), bd ) )
    fc1 = tf.Print(fc1, [ tf.shape(fc1) ], "fc1 shape is ")
    sess.run(tf.global_variables_initializer())
    # evaluate the network for an input of (minibatch_size, 7, 7, 512)
    minibatch_size = 32
    start = time.time()
    output = sess.run(fc1, feed_dict = { pool5_input: np.ones((minibatch_size, 7, 7, 512), dtype = 'float32')})
    elapsed = time.time() - start
    print("time to evaluate fully connected layer for minibatch size %d: %.3f seconds" % (minibatch_size, elapsed))
    print("output shape is",output.shape)
I get the following output:
layer_shape: [-1, 25088]
...: I tensorflow/core/kernels/logging_ops.cc:79] pool5 shape is [32 7 7 512]
...: I tensorflow/core/kernels/logging_ops.cc:79] fc1_flat shape is [32 25088]
...: I tensorflow/core/kernels/logging_ops.cc:79] fc1 shape is [32 4096]
time to evaluate fully connected layer for minibatch size 32: 0.329 seconds
output shape is (32, 4096)
So for me, with a minibatch size of 32, it takes well under a second (on a GPU).
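The figure above comes from a single timed sess.run call, so it may include one-off start-up costs in addition to the matrix multiplication itself. A minimal sketch of a timing helper that separates warm-up calls from the timed ones follows; the name time_call and its parameters are hypothetical, and the workload shown is only a stand-in for the real sess.run call.

```python
import time

def time_call(fn, warmup=1, repeats=10):
    """Call fn() `warmup` times untimed, then return the mean
    wall-clock seconds per call over `repeats` timed calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

# Stand-in workload (an assumption, used only so the sketch runs on its own).
mean_seconds = time_call(lambda: sum(range(100000)))
print("mean time per call: %.6f seconds" % mean_seconds)
```

In the test script, the stand-in lambda would be replaced by one that wraps the sess.run(fc1, feed_dict=...) call.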
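You can insert similar tf.Print() statements into your own code and check that you get the same (or similar) dimensions. By multiplying the sizes of the dimensions together (and by the size of one element), you can see how much memory each stage uses.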
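As a rough worked example of that multiplication, the sketch below estimates the memory for the shapes printed above. It assumes float32 tensors (4 bytes per element) stored densely; the helper name tensor_bytes is hypothetical, and the estimate ignores any extra workspace TensorFlow may allocate.

```python
from functools import reduce
from operator import mul

BYTES_PER_FLOAT32 = 4  # assumption: every tensor is float32

def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the bytes needed for a dense tensor of the given shape."""
    return reduce(mul, shape, 1) * bytes_per_element

minibatch_size = 32
shapes = {
    "pool5": (minibatch_size, 7, 7, 512),
    "fc1_flat": (minibatch_size, 25088),
    "fc1_wd (weights)": (25088, 4096),
    "fc1": (minibatch_size, 4096),
}

for name, shape in shapes.items():
    mib = tensor_bytes(shape) / 2**20
    print("%-18s %-20s %8.2f MiB" % (name, shape, mib))
```

With these shapes, the fc1 weight matrix needs 392 MiB, while the activations for a minibatch of 32 need only about 3.06 MiB (pool5 and fc1_flat) and 0.5 MiB (fc1), so the weights dominate the memory footprint of this layer.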