TensorFlow GPU: no performance improvement over CPU with HelloWorld code

Time: 2018-11-18 14:48:29

Tags: python python-3.x tensorflow

Background

I'm a Python developer working with TensorFlow.

System specs:

  • i5-7200U CPU @ 2.50GHz×4
  • GeForce 940MX 4GB
  • Ubuntu 18

I'm running TensorFlow in Docker (I found installing all the CUDA stuff too complicated and time-consuming, and I may have messed it up).


Basically, I'm running a kind of HelloWorld program on both the GPU and the CPU to compare them, and to my surprise there is hardly any difference!

docker-compose.yml

version: '2.3'

services:
  tensorflow:
    # image: tensorflow/tensorflow:latest-gpu-py3
    image: tensorflow/tensorflow:latest-py3
    runtime: nvidia
    volumes:
      - ./:/notebooks/TensorTest1
    ports:
      - 8888:8888

When I run with image: tensorflow/tensorflow:latest-py3, it takes about 5 seconds.

root@e7dc71acfa59:/notebooks/TensorTest1# python3 hello1.py 
2018-11-18 14:37:24.288321: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
TIME: 4.900559186935425
result:  [3. 3. 3. ... 3. 3. 3.]

When I run with image: tensorflow/tensorflow:latest-gpu-py3, it again takes about 5 seconds.

root@baf68fc71921:/notebooks/TensorTest1# python3 hello1.py 
2018-11-18 14:39:39.811575: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-11-18 14:39:39.877483: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-11-18 14:39:39.878122: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce 940MX major: 5 minor: 0 memoryClockRate(GHz): 1.189
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 3.56GiB
2018-11-18 14:39:39.878148: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-11-18 14:44:17.101263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-18 14:44:17.101303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2018-11-18 14:44:17.101313: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2018-11-18 14:44:17.101540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3259 MB memory) -> physical GPU (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0)
TIME: 5.82940673828125
result:  [3. 3. 3. ... 3. 3. 3.]

My code

import tensorflow as tf
import time

with tf.Session():
    start_time = time.time()

    # Two constant vectors of 4,000,000 elements each (4 * 100 * 100 * 100)
    input1 = tf.constant([1.0, 1.0, 1.0, 1.0] * 100 * 100 * 100)
    input2 = tf.constant([2.0, 2.0, 2.0, 2.0] * 100 * 100 * 100)
    output = tf.add(input1, input2)
    # Note: the timed region includes graph construction and the single eval
    result = output.eval()

    duration = time.time() - start_time
    print("TIME:", duration)

    print("result: ", result)

Am I doing something wrong here? According to the printed output, the GPU seems to be used correctly.


I followed the steps from Can I measure the execution time of individual operations with TensorFlow?, and this is what I got: [screenshot]
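For reference, per-op timing along the lines of that question looks roughly like this (a minimal sketch, assuming TensorFlow 1.x as in the logs above; it uses RunOptions with FULL_TRACE plus the timeline module to dump a trace viewable at chrome://tracing):

import tensorflow as tf
from tensorflow.python.client import timeline

# Same workload as the HelloWorld script above
input1 = tf.constant([1.0, 1.0, 1.0, 1.0] * 100 * 100 * 100)
input2 = tf.constant([2.0, 2.0, 2.0, 2.0] * 100 * 100 * 100)
output = tf.add(input1, input2)

with tf.Session() as sess:
    # Ask the runtime to record per-op timings for this run
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    sess.run(output, options=run_options, run_metadata=run_metadata)

    # step_stats holds the recorded timings; convert them to a Chrome trace
    tl = timeline.Timeline(run_metadata.step_stats)
    with open('timeline.json', 'w') as f:
        f.write(tl.generate_chrome_trace_format())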

1 Answer:

Answer 0 (score: 2)

A GPU is an "external" processor; there is a lot of overhead involved in compiling a program for it, running that program, sending it the data, and retrieving the results. GPUs also have different performance trade-offs from CPUs. While GPUs are frequently faster for large and complex number-crunching tasks, your "hello world" is far too simple. It doesn't do much with each data item between loading it and storing it (just a pairwise addition), and it doesn't do much at all; a million operations is nothing. That makes any setup/teardown overhead relatively much more noticeable. So while the GPU is slower for this program, it is still likely to be faster for more useful programs.
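The overhead argument also suggests how to benchmark more fairly: make the timed work heavy enough to dominate setup, and keep graph construction and the first warm-up run out of the timed region. A minimal sketch along those lines (assuming TensorFlow 1.x as above; the matrix size N and the repeat count are arbitrary choices):

import tensorflow as tf
import time

# A compute-heavy op (large matrix multiply) where GPU parallelism can pay off
N = 4000
a = tf.random_normal([N, N])
b = tf.random_normal([N, N])
c = tf.matmul(a, b)

with tf.Session() as sess:
    sess.run(c)  # warm-up run: graph setup, kernel selection, transfers

    start_time = time.time()
    for _ in range(10):
        sess.run(c)
    print("TIME:", time.time() - start_time)

Run under both images, a workload like this should show a clear gap between CPU and GPU, whereas the single 4-million-element addition in the original script is dwarfed by session and transfer overhead.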