Question

I use a vgg.h5 model with Keras (TensorFlow backend on a GPU) for real-time object classification. It works well.

Then I tried to use a pure TensorFlow graph with the weights taken from vgg.h5:

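I parsed vgg.h5 with h5py and obtained the weights of all layers as numpy.array objects
I built a graph (I store the kernels and biases in tf.Variable)
But I cannot get the output prediction vector. After investigating, I found that all the convolutional layers work, but the output of the first fully connected layer in vgg16 (fc1, with a 25088 x 4096 weight matrix) takes about 5 minutes to compute. That is not suitable for real-time classification.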

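So, does anyone have experience building vgg16 from scratch in TensorFlow and can help? Why does TensorFlow work fine as a Keras backend, while pure TensorFlow (with the same weights) cannot compute the fully connected output? Does Keras apply any additional optimizations when implementing fully connected (dense) layers?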

Answer 1

Here is a test variant of your code, with statements added to print the tensor shapes at several points:

import tensorflow as tf
import numpy as np

with tf.Session() as sess:

    # mock the previous layer's output with a placeholder
    pool5_input = tf.placeholder(dtype = tf.float32, shape = (None,7,7,512))

    # insert a print operation to print the shape
    pool5 = tf.Print(pool5_input, [ tf.shape(pool5_input) ], "pool5 shape is ", summarize = 4)

    layer_name = 'fc1'
    wd = tf.Variable(np.ones((25088, 4096), dtype='float32'), trainable=False, name=layer_name+'_wd')
    bd = tf.Variable(np.ones((4096,), dtype='float32'), trainable=False, name=layer_name+'_bd')
    layer_shape = [-1, wd.get_shape().as_list()[0]]
    print('layer_shape:', layer_shape)

    fc1_flat = tf.reshape(pool5, shape=layer_shape)
    fc1_flat = tf.Print(fc1_flat, [ tf.shape(fc1_flat) ], "fc1_flat shape is ")

    fc1 = tf.nn.relu( tf.nn.bias_add( tf.matmul(fc1_flat, wd, name=layer_name), bd ) )
    fc1 = tf.Print(fc1, [ tf.shape(fc1) ], "fc1 shape is ")

    import time
    sess.run(tf.global_variables_initializer())

    # evaluate the network for an input of shape (minibatch_size, 7, 7, 512)
    minibatch_size = 32

    start = time.time()
    output = sess.run(fc1, feed_dict = { pool5_input: np.ones((minibatch_size, 7, 7, 512), dtype = 'float32')})

    elapsed = time.time() - start
    print("time to evaluate fully connected layer for minibatch size %d: %.3f seconds" % (minibatch_size, elapsed))
    print("output shape is",output.shape)

I get the following output:

layer_shape: [-1, 25088]
...: I tensorflow/core/kernels/logging_ops.cc:79] pool5 shape is [32 7 7 512]
...: I tensorflow/core/kernels/logging_ops.cc:79] fc1_flat shape is [32 25088]
...: I tensorflow/core/kernels/logging_ops.cc:79] fc1 shape is [32 4096]
time to evaluate fully connected layer for minibatch size 32: 0.329 seconds
output shape is (32, 4096)

So for me, with a minibatch size of 32, it takes less than a second (on a GPU).

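You can insert similar tf.Print() statements into your code and verify that you get the same (or similar) dimensions. By multiplying the sizes of the dimensions, you can see how much memory each stage uses.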
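As a rough illustration of that multiplication, here is a minimal sketch that estimates the size of each tensor from the shapes printed above. It assumes float32 values (4 bytes per element), as in the test code; the helper name tensor_bytes is made up for this example and is not part of TensorFlow.

```python
from functools import reduce
from operator import mul

BYTES_PER_FLOAT32 = 4  # assumption: every tensor is float32, as in the test code


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the size in bytes of a dense tensor with the given shape."""
    return reduce(mul, shape, 1) * bytes_per_element


minibatch_size = 32

# Shapes taken from the tf.Print() output above, plus the fc1 weight matrix.
stages = [
    ("pool5", (minibatch_size, 7, 7, 512)),
    ("fc1_flat", (minibatch_size, 25088)),
    ("fc1 weights", (25088, 4096)),
    ("fc1", (minibatch_size, 4096)),
]

for name, shape in stages:
    size_mib = tensor_bytes(shape) / 2.0 ** 20
    print("%-12s %-20s %8.2f MiB" % (name, shape, size_mib))
```

With these shapes the fc1 weight matrix is by far the largest item (25088 * 4096 * 4 bytes, i.e. 392 MiB), while the activations for a minibatch of 32 are only a few MiB. If the shapes printed from your own graph give much larger numbers than these, that may be a hint that a dimension is not what you expect.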
