Why does my TensorFlow network get slower on a GeForce GTX 1080 after applying the quantization tool?

Time: 2016-08-04 19:19:46

Tags: gpu tensorflow quantization

I tried the quantization tool on a toy TensorFlow model. It does shrink the model to about 25% of its original size; however, it increases the execution time many times over.

The GPU is fully utilized while running both models, so I am wondering what went wrong. I can think of two possibilities:

  1. The TensorFlow quantization tool does not use the floating-point compute cores on the GPU (i.e., the quantized ops may be falling back to the CPU; see the placement check sketched after this list).
  2. Something is wrong with my deployment.

Any suggestions are welcome! Thanks!
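
One way to distinguish these two cases is to let TensorFlow report where each op is actually placed. Below is a minimal sketch under the TF 1.x-era API used in the question; the output tensor name `softmax:0` comes from the model code, but the input name `x:0` is an assumption. If the `Quantized*` ops are listed on `/cpu:0`, possibility 1 is confirmed.

    import numpy as np
    import tensorflow as tf

    # Load the quantized frozen graph produced by the quantization tool.
    graph_def = tf.GraphDef()
    with open('quantified_const_kb.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())

    graph = tf.Graph()
    with graph.as_default():
        tf.import_graph_def(graph_def, name='')

    # log_device_placement prints the device each node is assigned to the
    # first time the graph runs; ops without a GPU kernel fall back to CPU.
    config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
    with tf.Session(graph=graph, config=config) as sess:
        y = graph.get_tensor_by_name('softmax:0')  # output named in the model
        x = graph.get_tensor_by_name('x:0')        # assumed input placeholder name
        sess.run(y, feed_dict={x: np.zeros((1, 784), np.float32)})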

The model I use is:

    import tensorflow as tf

    # `x` and `keep_prob` are defined at module level; the 784 input size
    # follows from the 28x28 reshape below (MNIST).
    x = tf.placeholder(tf.float32, [None, 784])
    keep_prob = tf.placeholder(tf.float32)

    def dense_cnn_model(weights):
        def conv2d(x, W):
            return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

        def max_pool_2x2(x):
            return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                                  strides=[1, 2, 2, 1], padding='SAME')

        # Two conv/pool blocks, then two fully connected layers.
        x_image = tf.reshape(x, [-1, 28, 28, 1])
        h_conv1 = tf.nn.relu(conv2d(x_image, weights["w_conv1"]) + weights["b_conv1"])
        h_pool1 = max_pool_2x2(h_conv1)
        h_conv2 = tf.nn.relu(conv2d(h_pool1, weights["w_conv2"]) + weights["b_conv2"])
        h_pool2 = max_pool_2x2(h_conv2)
        h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
        h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, weights["w_fc1"]) + weights["b_fc1"])
        h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
        y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, weights["w_fc2"]) + weights["b_fc2"],
                               name='softmax')
        return y_conv
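
For completeness, the `weights` dictionary is built along these lines. This is a hypothetical reconstruction: the `7*7*64` flatten above fixes the second conv layer at 64 filters, but the 5x5 kernels, 32 first-layer filters, and 1024 hidden units are assumptions taken from the standard TensorFlow MNIST tutorial.

    # Hypothetical weights; shapes marked (*) are assumptions, not from the question.
    weights = {
        "w_conv1": tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1)),   # (*)
        "b_conv1": tf.Variable(tf.constant(0.1, shape=[32])),                     # (*)
        "w_conv2": tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1)),  # (*)
        "b_conv2": tf.Variable(tf.constant(0.1, shape=[64])),
        "w_fc1":   tf.Variable(tf.truncated_normal([7*7*64, 1024], stddev=0.1)),  # (*)
        "b_fc1":   tf.Variable(tf.constant(0.1, shape=[1024])),                   # (*)
        "w_fc2":   tf.Variable(tf.truncated_normal([1024, 10], stddev=0.1)),
        "b_fc2":   tf.Variable(tf.constant(0.1, shape=[10])),
    }
    y_conv = dense_cnn_model(weights)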
    

With the quantization tool, the frozen graph shrinks from 13M to 3.2M:

    -rw-rw-r-- 1 yonghu yonghu 3.2M Aug  3 22:27 quantified_const_kb.pb
    -rw-rw-r-- 1 yonghu yonghu  13M Aug  3 22:22 unified_const_kb.pb
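
The roughly 4x shrink matches what 8-bit weight storage predicts: the file is dominated by weights, and 13 MB × (8 bits / 32 bits) ≈ 3.25 MB ≈ 3.2M. To see what the rewrite actually inserted, one can histogram the op types in both GraphDefs; a minimal sketch using the file names from the listing above:

    from collections import Counter

    import tensorflow as tf

    def op_histogram(path):
        # Parse a frozen GraphDef and count how often each op type occurs.
        graph_def = tf.GraphDef()
        with open(path, 'rb') as f:
            graph_def.ParseFromString(f.read())
        return Counter(node.op for node in graph_def.node)

    # The quantized graph should show QuantizeV2 / QuantizedConv2D / Dequantize
    # (and friends) wrapped around every Conv2D and MatMul of the float graph.
    print(op_histogram('unified_const_kb.pb').most_common())
    print(op_histogram('quantified_const_kb.pb').most_common())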
    

However, it gets slower on the GeForce GTX 1080. The benchmark of the original model is as follows:

    I tensorflow/core/util/stat_summarizer.cc:218] 50 runs, avg 15.25 ms, 47 nodes defined 67 nodes observed
    ============ By run order =================
      [start]  [first]    [avg]      [%]      [cdf%]          [Op]  [Name]
        0.000    0.013    0.063   0.413%      0.413%                _SOURCE
        2.177    0.012    0.010   0.063%      0.476%                w_conv1/read/_7__cf__7
        2.193    0.009    0.007   0.046%      0.522%                b_conv1/read/_6__cf__6
        2.205    0.007    0.007   0.046%      0.567%                w_conv2/read/_5__cf__5
        2.215    0.007    0.007   0.044%      0.611%                b_conv2/read/_4__cf__4
        2.225    0.010    0.007   0.043%      0.654%                b_fc1/read/_3__cf__3
        2.237    0.008    0.006   0.042%      0.696%                dropout/random_uniform/sub/_2__cf__2
        2.247    0.008    0.006   0.040%      0.736%                w_fc2/read/_1__cf__1
        2.257    0.006    0.006   0.039%      0.774%                b_fc2/read/_0__cf__0
        2.266    0.010    0.006   0.042%      0.816%         Const  w_fc1
        2.279    0.008    0.006   0.040%      0.857%         Const  Reshape/shape
        2.288    0.001    0.002   0.014%      0.870%                edge_67__recv_x_0:MEMCPYHtoD
        2.290    0.007    0.006   0.038%      0.908%         Const  Reshape_1/shape
        2.299    0.007    0.006   0.041%      0.949%         Const  dropout/random_uniform/min
        2.308    0.008    0.006   0.042%      0.992%      Identity  w_fc1/read
        2.434    0.013    0.012   0.082%      1.074%       Reshape  Reshape
        2.452  523.172   10.644  69.788%     70.862%        Conv2D  Conv2D
      310.531    0.005    0.047   0.308%     71.170%                Conv2D:Conv2D
      524.645    0.001    0.000   0.000%     71.170%                Conv2D:Conv2D:MEMCPYHtoD
      525.636    0.035    0.039   0.253%     71.424%           Add  add
      525.667    0.004    0.005   0.032%     71.455%                add:Add
      525.677    0.020    0.031   0.206%     71.661%          Relu  Relu
      525.694    0.003    0.004   0.024%     71.686%                Relu:Relu
      525.701    0.031    0.043   0.285%     71.971%       MaxPool  MaxPool
      525.726    0.007    0.008   0.054%     72.025%                MaxPool:MaxPool
      525.735    0.962    0.192   1.258%     73.283%        Conv2D  Conv2D_1
      525.751    0.005    0.133   0.874%     74.157%                Conv2D_1:Conv2D
      526.705    0.019    0.034   0.225%     74.382%           Add  add_1
      526.730    0.015    0.029   0.190%     74.572%          Relu  Relu_1
      526.749    0.005    0.005   0.036%     74.608%                add_1:Add
      526.750    0.021    0.039   0.253%     74.861%       MaxPool  MaxPool_1
      526.756    0.003    0.004   0.024%     74.885%                Relu_1:Relu
      526.766    0.006    0.006   0.040%     74.925%                MaxPool_1:MaxPool
      526.775    0.006    0.008   0.055%     74.980%       Reshape  Reshape_1
      526.784  144.271    2.923  19.166%     94.146%        MatMul  MatMul
      670.941    0.001    0.000   0.000%     94.146%                MatMul:MatMul:MEMCPYHtoD
      671.038    0.070    0.089   0.585%     94.731%                MatMul:MatMul
      671.063    0.037    0.031   0.203%     94.935%           Add  add_2
      671.104    0.019    0.027   0.178%     95.113%          Relu  Relu_2
      671.110    0.005    0.006   0.041%     95.154%                add_2:Add
      671.121    0.003    0.003   0.023%     95.176%                Relu_2:Relu
      671.126    0.008    0.010   0.066%     95.242%         Shape  dropout/Shape
      671.136    0.029    0.030   0.196%     95.438%           Div  dropout/Div
      671.162    0.004    0.006   0.041%     95.479%                dropout/Div:Div
      671.167    0.021    0.029   0.193%     95.672%    RandomUniform   dropout/random_uniform/RandomUniform
      671.185    0.005    0.008   0.051%     95.723%                dropout/random_uniform/RandomUniform:RandomUniform
      671.191    0.027    0.029   0.187%     95.910%           Mul  dropout/random_uniform/mul
      671.215    0.003    0.004   0.023%     95.933%                dropout/random_uniform/mul:Mul
      671.221    0.018    0.027   0.176%     96.109%           Add  dropout/random_uniform
      671.237    0.003    0.004   0.024%     96.133%                dropout/random_uniform:Add
      671.242    0.016    0.027   0.178%     96.311%           Add  dropout/add
      671.256    0.003    0.003   0.022%     96.333%                dropout/add:Add
      671.261    0.024    0.026   0.169%     96.502%         Floor  dropout/Floor
      671.283    0.004    0.004   0.028%     96.530%                dropout/Floor:Floor
      671.288    0.017    0.027   0.180%     96.710%           Mul  dropout/mul
      671.303    0.003    0.004   0.023%     96.733%                dropout/mul:Mul
      671.308    0.019    0.034   0.223%     96.956%        MatMul  MatMul_1
      671.325    0.017    0.023   0.149%     97.106%                MatMul_1:MatMul
      671.330    0.016    0.030   0.195%     97.300%           Add  add_3
      671.345    0.007    0.009   0.060%     97.360%                add_3:Add
      671.349    0.177    0.125   0.822%     98.183%       Softmax  softmax
      671.366    0.003    0.027   0.177%     98.360%                softmax:Softmax
      671.621    0.001    0.001   0.009%     98.368%                edge_13_softmax:MEMCPYDtoH
      671.732    0.004    0.057   0.375%     98.743%                _SINK
    18446744074384.223    0.001    0.001      0.006%     98.749%                unknown:MEMCPYHtoD
    18446744074384.363    0.004    0.190      1.246%     99.996%                unknown
    18446744074385.602    0.001    0.001      0.004%    100.000%                unknown:MEMCPYDtoH
    

After quantization:

    I tensorflow/core/util/stat_summarizer.cc:218] 50 runs, avg 99.44 ms, 114 nodes defined 83 nodes observed
    ============ By run order =================
      [start]  [first]    [avg]      [%]      [cdf%]          [Op]  [Name]
        0.000    0.039    0.158   0.159%      0.159%                _SOURCE
        0.111    0.018    0.010   0.010%      0.169%                dropout/keep_prob/_3__cf__3
        0.138    0.010    0.011   0.011%      0.180%                dropout/random_uniform/min/_1__cf__1
        0.154    0.009    0.009   0.009%      0.189%                b_fc2/_0__cf__0
        0.169    0.010    0.009   0.009%      0.198%         Const  w_conv1_quint8_const
        0.184    0.008    0.008   0.008%      0.206%         Const  w_conv1_min
        0.195    0.009    0.008   0.008%      0.214%         Const  w_conv1_max
        0.208    0.056    0.009   0.009%      0.224%         Const  w_conv2_quint8_const
        0.269    0.010    0.007   0.007%      0.231%         Const  w_conv2_min
        0.283    0.009    0.007   0.007%      0.238%         Const  w_conv2_max
        0.295    0.008    0.008   0.009%      0.247%         Const  w_fc1_quint8_const
        0.307    0.013    0.007   0.007%      0.254%         Const  w_fc1_min
        0.324    0.010    0.007   0.007%      0.261%         Const  w_fc1_max
        0.338    0.010    0.007   0.007%      0.268%         Const  w_fc2_quint8_const
        0.350    0.007    0.007   0.007%      0.275%         Const  w_fc2_min
        0.360    0.007    0.007   0.007%      0.282%         Const  w_fc2_max
        0.370    0.010    0.007   0.007%      0.289%                b_conv1/_6__cf__6
        0.392    0.009    0.008   0.008%      0.297%                b_conv2/_5__cf__5
        0.411    0.008    0.008   0.008%      0.305%                b_fc1/_4__cf__4
        3.380    0.017    0.013   0.013%      0.317%         Const  Reshape/shape
        3.402    0.014    0.011   0.011%      0.328%         Const  Conv2D_eightbit_reshape_dims
        3.419    0.010    0.012   0.013%      0.341%         Const  Conv2D_eightbit_reduction_dims
        3.431    0.007    0.010   0.010%      0.351%         Const  Reshape_1/shape
       34.110    0.020    0.016   0.016%      0.368%       Reshape  Reshape
       34.159  352.617    7.132   7.172%      7.540%           Sub  dropout/random_uniform/sub
       34.234    0.010    0.011   0.011%      7.551%       Reshape  Conv2D_eightbit_reshape_Reshape
       34.249  352.581    7.113   7.153%     14.704%           Min  Conv2D_eightbit_min_Reshape
      386.852    0.063    0.043   0.043%     14.747%           Max  Conv2D_eightbit_max_Reshape
      387.104    0.070    0.057   0.058%     14.804%    QuantizeV2  Conv2D_eightbit_quantize_Reshape
      387.181    3.764    2.210   2.222%     17.027%    QuantizedConv2D Conv2D_eightbit_quantized_conv
      390.964    0.771    0.674   0.677%     17.704%    QuantizeDownAndShrinkRange  Conv2D_eightbit_quantize_down
      391.742    0.681    0.583   0.586%     18.290%    Dequantize  Conv2D
      392.608    0.086    0.064   0.064%     18.354%           Add  add
      392.781    0.012    0.011   0.011%     18.365%       Reshape  Relu_eightbit_reshape_add
      392.798    0.055    0.048   0.048%     18.413%           Min  Relu_eightbit_min_add
      392.858    0.041    0.038   0.039%     18.452%           Max  Relu_eightbit_max_add
      393.035    0.266    0.274   0.276%     18.728%    QuantizeV2  Relu_eightbit_quantize_add
      393.306    0.052    0.110   0.111%     18.838%    QuantizedRelu   Relu_eightbit_quantized
      393.362    0.201    0.152   0.153%     18.991%    QuantizedMaxPool    MaxPool_eightbit_quantized
      393.567   22.550   23.069  23.199%     42.190%    QuantizedConv2D Conv2D_1_eightbit_quantized_conv
      416.126    0.211    0.354   0.356%     42.546%    QuantizeDownAndShrinkRange  Conv2D_1_eightbit_quantize_down
      416.343    0.127    0.266   0.268%     42.814%    Dequantize  Conv2D_1
      416.577    0.035    0.058   0.058%     42.871%           Add  add_1
      416.654    0.007    0.011   0.011%     42.882%       Reshape  Relu_1_eightbit_reshape_add_1
      416.664    0.023    0.043   0.044%     42.926%           Min  Relu_1_eightbit_min_add_1
      416.690    0.018    0.033   0.033%     42.959%           Max  Relu_1_eightbit_max_add_1
      416.779    0.158    0.179   0.180%     43.140%    QuantizeV2  Relu_1_eightbit_quantize_add_1
      416.940    0.029    0.057   0.058%     43.197%    QuantizedRelu   Relu_1_eightbit_quantized
      416.973    0.089    0.082   0.082%     43.279%    QuantizedMaxPool    MaxPool_1_eightbit_quantized
      417.065    0.037    0.072   0.072%     43.352%    Dequantize  MaxPool_1
      417.175    0.008    0.011   0.011%     43.363%       Reshape  Reshape_1
      417.226    0.007    0.008   0.008%     43.371%       Reshape  MatMul_eightbit_reshape_Reshape_1
      417.237    0.028    0.047   0.048%     43.419%           Min  MatMul_eightbit_min_Reshape_1
      417.269    0.017    0.034   0.034%     43.453%           Max  MatMul_eightbit_max_Reshape_1
      417.360    0.076    0.109   0.109%     43.562%    QuantizeV2  MatMul_eightbit_quantize_Reshape_1
      417.440   31.302   54.697  55.005%     98.567%    QuantizedMatMul MatMul_eightbit_quantized_bias_add
      448.748    0.022    0.033   0.033%     98.601%    QuantizeDownAndShrinkRange  MatMul_eightbit_quantize_down
      448.773    0.016    0.024   0.024%     98.625%    Dequantize  MatMul
      448.908    0.034    0.052   0.052%     98.677%           Add  add_2
      448.980    0.006    0.008   0.009%     98.685%       Reshape  Relu_2_eightbit_reshape_add_2
      448.990    0.022    0.036   0.036%     98.721%           Min  Relu_2_eightbit_min_add_2
      449.015    0.017    0.027   0.027%     98.748%           Max  Relu_2_eightbit_max_add_2
      449.103    0.032    0.038   0.038%     98.786%    QuantizeV2  Relu_2_eightbit_quantize_add_2
      449.139    0.013    0.014   0.014%     98.801%    QuantizedRelu   Relu_2_eightbit_quantized
      449.156    0.016    0.023   0.023%     98.824%    Dequantize  Relu_2
      449.180    0.007    0.008   0.008%     98.832%         Shape  dropout/Shape
      449.215    0.092    0.086   0.086%     98.918%    RandomUniform   dropout/random_uniform/RandomUniform
      449.292    0.090    0.080   0.080%     98.999%           Div  dropout/Div
      449.314    0.105    0.053   0.053%     99.052%           Mul  dropout/random_uniform/mul
      449.425    0.039    0.053   0.053%     99.105%           Add  dropout/random_uniform
      449.469    0.054    0.046   0.046%     99.151%           Add  dropout/add
      449.528    0.043    0.032   0.032%     99.183%         Floor  dropout/Floor
      449.575    0.033    0.029   0.029%     99.212%           Mul  dropout/mul
      449.724    0.011    0.010   0.010%     99.222%       Reshape  MatMul_1_eightbit_reshape_dropout/mul
      449.740    0.050    0.042   0.042%     99.264%           Min  MatMul_1_eightbit_min_dropout/mul
      449.795    0.037    0.031   0.032%     99.296%           Max  MatMul_1_eightbit_max_dropout/mul
      449.986    0.085    0.070   0.070%     99.366%    QuantizeV2  MatMul_1_eightbit_quantize_dropout/mul
      450.077    0.525    0.367   0.369%     99.736%    QuantizedMatMul MatMul_1_eightbit_quantized_bias_add
      450.608    0.015    0.012   0.012%     99.747%    QuantizeDownAndShrinkRange  MatMul_1_eightbit_quantize_down
      450.627    0.013    0.010   0.010%     99.757%    Dequantize  MatMul_1
      450.765    0.055    0.051   0.051%     99.808%           Add  add_3
      450.825    0.254    0.133   0.134%     99.942%       Softmax  softmax
      451.200    0.006    0.058   0.058%    100.000%                _SINK
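
Comparing the two summaries: the average run time goes from 15.25 ms to 99.44 ms, roughly a 6.5x regression. In the quantized profile, MatMul_eightbit_quantized_bias_add alone averages 54.7 ms (55% of the total) and Conv2D_1_eightbit_quantized_conv another 23.1 ms (23%), versus 2.9 ms and 0.19 ms for the corresponding float MatMul and Conv2D_1, which is consistent with possibility 1 above.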
    

0 Answers:

No answers yet.