TensorFlow vgg16 prediction slows down dramatically

Date: 2018-01-21 15:05:48

Tags: tensorflow keras

I am using the vgg.h5 model + Keras (TensorFlow backend on GPU) for real-time object classification. It works well.

Then I tried to build a pure TensorFlow graph using the weights from vgg.h5:

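  1. I parsed vgg.h5 with h5py and got the weights of all layers as numpy arrays.
  2. I built a graph (storing the kernels and biases in tf.Variable).
  3. But I cannot get the output prediction vector. After investigating, I found that all the convolutional layers work, but computing the output of the first fully connected layer in vgg16 (fc1, with a 25088 x 4096 weight matrix) takes about 5 minutes. That is unusable for real-time classification.
  4. So, does anyone have experience building vgg16 from scratch in TensorFlow and can help? Why does TensorFlow work fine as the Keras backend, while pure TensorFlow (with the same weights) cannot compute the fully connected output? Does Keras apply any extra optimization when implementing fully connected (Dense) layers?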
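To put the 5-minute figure in perspective, a back-of-the-envelope estimate of what fc1 actually costs can help. The sketch below is plain Python; the helper `dense_cost` and its FLOP-counting convention (one multiply plus one add per weight per sample, plus one add per bias) are illustrative assumptions, not part of the original post or of any library.

```python
# Rough cost of VGG16's fc1 layer (25088 x 4096 weights), as described in the question.
IN_FEATURES = 7 * 7 * 512   # flattened pool5 output: 25088 features
OUT_FEATURES = 4096
BYTES_PER_FLOAT32 = 4

def dense_cost(in_features, out_features, batch_size=1):
    """Return (weight_bytes, flops) for y = x @ W + b with float32 parameters.

    Hypothetical helper: counts 2 * in * out FLOPs per sample for the matmul
    (one multiply and one add per weight) plus one add per output for the bias.
    """
    weight_bytes = (in_features * out_features + out_features) * BYTES_PER_FLOAT32
    flops = batch_size * (2 * in_features * out_features + out_features)
    return weight_bytes, flops

weight_bytes, flops = dense_cost(IN_FEATURES, OUT_FEATURES)
print("fc1 parameters: %.1f MiB" % (weight_bytes / 2**20))
print("fc1 cost per image: %.2f GFLOP" % (flops / 1e9))
```

Roughly 392 MiB of weights and about 0.2 GFLOP per image is a modest workload for a GPU, which is consistent with the answer below finding that the layer itself evaluates quickly.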

1 answer:

Answer 0 (score: 1)

Here is a tested variant of your code, with statements that print the tensor shapes at several points:

import tensorflow as tf
import numpy as np

with tf.Session() as sess:

    # mock the previous layer's output with a placeholder
    pool5_input = tf.placeholder(dtype = tf.float32, shape = (None,7,7,512))

    # insert a print operation to print the shape
    pool5 = tf.Print(pool5_input, [ tf.shape(pool5_input) ], "pool5 shape is ", summarize = 4)

    layer_name = 'fc1'
    # dummy fc1 weights and biases (all ones) standing in for the values loaded from vgg.h5
    wd = tf.Variable(np.ones((25088, 4096), dtype='float32'), trainable=False, name=layer_name+'_wd')
    bd = tf.Variable(np.ones((4096,), dtype='float32'), trainable=False, name=layer_name+'_bd')
    layer_shape = [-1, wd.get_shape().as_list()[0]]
    print('layer_shape:', layer_shape)

    fc1_flat = tf.reshape(pool5, shape=layer_shape)
    fc1_flat = tf.Print(fc1_flat, [ tf.shape(fc1_flat) ], "fc1_flat shape is ")

    fc1 = tf.nn.relu( tf.nn.bias_add( tf.matmul(fc1_flat, wd, name=layer_name), bd ) )
    fc1 = tf.Print(fc1, [ tf.shape(fc1) ], "fc1 shape is ")

    import time
    sess.run(tf.global_variables_initializer())

    # evaluate the network for an input of shape (minibatch_size, 7, 7, 512)
    minibatch_size = 32

    start = time.time()
    output = sess.run(fc1, feed_dict = { pool5_input: np.ones((minibatch_size, 7, 7, 512), dtype = 'float32')})

    elapsed = time.time() - start
    print("time to evaluate fully connected layer for minibatch size %d: %.3f seconds" % (minibatch_size, elapsed))
    print("output shape is",output.shape)

I get the following output:

layer_shape: [-1, 25088]
...: I tensorflow/core/kernels/logging_ops.cc:79] pool5 shape is [32 7 7 512]
...: I tensorflow/core/kernels/logging_ops.cc:79] fc1_flat shape is [32 25088]
...: I tensorflow/core/kernels/logging_ops.cc:79] fc1 shape is [32 4096]
time to evaluate fully connected layer for minibatch size 32: 0.329 seconds
output shape is (32, 4096)

So for me, with a minibatch size of 32, it takes less than a second (on a GPU).
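When reading a single timing like this, keep in mind that the first sess.run call can include one-time setup work (for example GPU memory allocation and library initialization), so timing the same call several times after a warm-up run gives a fairer per-prediction number for real-time use. Below is a minimal, framework-agnostic sketch; `time_call` and the dummy workload are hypothetical names used only for illustration.

```python
import time

def time_call(fn, warmup=1, repeats=5):
    """Time fn() separately for warm-up calls and steady-state calls.

    Hypothetical helper. Returns (total_warmup_seconds, list_of_per_call_seconds).
    """
    start = time.perf_counter()
    for _ in range(warmup):
        fn()
    warmup_seconds = time.perf_counter() - start

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return warmup_seconds, timings

if __name__ == "__main__":
    # Stand-in workload; with the TensorFlow code above this would be something like
    # lambda: sess.run(fc1, feed_dict={pool5_input: batch})
    def workload():
        return sum(i * i for i in range(100_000))

    warm, steady = time_call(workload, warmup=1, repeats=5)
    print("warm-up: %.4f s, best steady-state: %.4f s" % (warm, min(steady)))
```

Reporting the minimum (or median) of the steady-state runs separates the per-frame cost from any startup overhead.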

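You can insert similar tf.Print() statements into your code and check that you get the same (or similar) dimensions. By multiplying the sizes of the dimensions, you can see how much memory is used at each stage.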
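The memory arithmetic can be sketched as follows, assuming float32 tensors (4 bytes per element) and the shapes printed in the output above for a minibatch of 32. `tensor_bytes` and the `stages` table are illustrative names, not part of TensorFlow.

```python
from math import prod

BYTES_PER_FLOAT32 = 4

def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Memory of a dense tensor: product of its dimensions times the element size."""
    return prod(shape) * bytes_per_element

batch = 32  # minibatch size used in the test above
stages = {
    "pool5": (batch, 7, 7, 512),
    "fc1_flat": (batch, 25088),
    "fc1 weights (wd)": (25088, 4096),
    "fc1": (batch, 4096),
}

for name, shape in stages.items():
    print("%-18s %-20s %10.2f MiB" % (name, shape, tensor_bytes(shape) / 2**20))
```

With these shapes the activations take only a few MiB, while the fc1 weight matrix (about 392 MiB) dominates; if the flattened tensor at this point is not [N, 25088], the reshape before fc1 is the first place to look.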