TensorFlow GPU: no performance improvement over CPU with HelloWorld code

Time: 2018-11-18 14:48:29

Tags: python python-3.x tensorflow

Background

I'm a Python developer working with TensorFlow.

System specs:

  • i5-7200U CPU @ 2.50GHz×4
  • GeForce 940MX 4GB
  • Ubuntu 18

I'm running TensorFlow in Docker (I found installing all the CUDA stuff too complicated and time-consuming, and I may have messed it up).


Basically, I'm running a kind of HelloWorld program on both the GPU and the CPU to compare them, and to my surprise there is hardly any difference!

docker-compose.yml

version: '2.3'

services:
  tensorflow:
    # image: tensorflow/tensorflow:latest-gpu-py3
    image: tensorflow/tensorflow:latest-py3
    runtime: nvidia
    volumes:
      - ./:/notebooks/TensorTest1
    ports:
      - 8888:8888

When I run with image: tensorflow/tensorflow:latest-py3, it takes about 5 seconds.

root@e7dc71acfa59:/notebooks/TensorTest1# python3 hello1.py 
2018-11-18 14:37:24.288321: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
TIME: 4.900559186935425
result:  [3. 3. 3. ... 3. 3. 3.]

When I run with image: tensorflow/tensorflow:latest-gpu-py3, it again takes about 5 seconds.

root@baf68fc71921:/notebooks/TensorTest1# python3 hello1.py 
2018-11-18 14:39:39.811575: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-11-18 14:39:39.877483: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-11-18 14:39:39.878122: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce 940MX major: 5 minor: 0 memoryClockRate(GHz): 1.189
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 3.56GiB
2018-11-18 14:39:39.878148: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-11-18 14:44:17.101263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-18 14:44:17.101303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2018-11-18 14:44:17.101313: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2018-11-18 14:44:17.101540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3259 MB memory) -> physical GPU (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0)
TIME: 5.82940673828125
result:  [3. 3. 3. ... 3. 3. 3.]

My code

import tensorflow as tf
import time

with tf.Session():
    start_time = time.time()

    # Two constant vectors of 4,000,000 elements each (4 * 100 * 100 * 100)
    input1 = tf.constant([1.0, 1.0, 1.0, 1.0] * 100 * 100 * 100)
    input2 = tf.constant([2.0, 2.0, 2.0, 2.0] * 100 * 100 * 100)
    output = tf.add(input1, input2)
    # Note: the timed region includes graph construction and the single eval
    result = output.eval()

    duration = time.time() - start_time
    print("TIME:", duration)

    print("result: ", result)

Am I doing something wrong here? According to the printed output, the GPU seems to be used correctly.


I followed the steps from Can I measure the execution time of individual operations with TensorFlow?, and this is what I got: [screenshot]
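For reference, per-op timing along the lines of that question looks roughly like this (a minimal sketch, assuming TensorFlow 1.x as in the logs above; it uses RunOptions with FULL_TRACE plus the timeline module to dump a trace viewable at chrome://tracing):

import tensorflow as tf
from tensorflow.python.client import timeline

# Same workload as the HelloWorld script above
input1 = tf.constant([1.0, 1.0, 1.0, 1.0] * 100 * 100 * 100)
input2 = tf.constant([2.0, 2.0, 2.0, 2.0] * 100 * 100 * 100)
output = tf.add(input1, input2)

with tf.Session() as sess:
    # Ask the runtime to record per-op timings for this run
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    sess.run(output, options=run_options, run_metadata=run_metadata)

    # step_stats holds the recorded timings; convert them to a Chrome trace
    tl = timeline.Timeline(run_metadata.step_stats)
    with open('timeline.json', 'w') as f:
        f.write(tl.generate_chrome_trace_format())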

1 Answer:

Answer 0 (score: 2)

A GPU is an "external" processor; there is a lot of overhead involved in compiling a program for it, running that program, sending it the data, and retrieving the results. GPUs also have different performance trade-offs from CPUs. While GPUs are frequently faster for large and complex number-crunching tasks, your "hello world" is far too simple. It doesn't do much with each data item between loading it and storing it (just a pairwise addition), and it doesn't do much at all; a million operations is nothing. That makes any setup/teardown overhead relatively much more noticeable. So while the GPU is slower for this program, it is still likely to be faster for more useful programs.
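The overhead argument also suggests how to benchmark more fairly: make the timed work heavy enough to dominate setup, and keep graph construction and the first warm-up run out of the timed region. A minimal sketch along those lines (assuming TensorFlow 1.x as above; the matrix size N and the repeat count are arbitrary choices):

import tensorflow as tf
import time

# A compute-heavy op (large matrix multiply) where GPU parallelism can pay off
N = 4000
a = tf.random_normal([N, N])
b = tf.random_normal([N, N])
c = tf.matmul(a, b)

with tf.Session() as sess:
    sess.run(c)  # warm-up run: graph setup, kernel selection, transfers

    start_time = time.time()
    for _ in range(10):
        sess.run(c)
    print("TIME:", time.time() - start_time)

Run under both images, a workload like this should show a clear gap between CPU and GPU, whereas the single 4-million-element addition in the original script is dwarfed by session and transfer overhead.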