Background:
I am a Python developer working with TensorFlow.
System specs:
I run TensorFlow in Docker (I found installing the CUDA stack directly far too complicated and time-consuming, and I may well have messed something up).
Basically, I am running a kind of Hello World program on both the GPU and the CPU and checking what difference it makes, and to my surprise there is hardly any!
docker-compose.yml
version: '2.3'
services:
  tensorflow:
    # image: tensorflow/tensorflow:latest-gpu-py3
    image: tensorflow/tensorflow:latest-py3
    runtime: nvidia
    volumes:
      - ./:/notebooks/TensorTest1
    ports:
      - 8888:8888
When I run it with image: tensorflow/tensorflow:latest-py3, it takes about 5 seconds.
root@e7dc71acfa59:/notebooks/TensorTest1# python3 hello1.py
2018-11-18 14:37:24.288321: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
TIME: 4.900559186935425
result: [3. 3. 3. ... 3. 3. 3.]
When I run it with image: tensorflow/tensorflow:latest-gpu-py3, it again takes about 5 seconds.
root@baf68fc71921:/notebooks/TensorTest1# python3 hello1.py
2018-11-18 14:39:39.811575: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-11-18 14:39:39.877483: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-11-18 14:39:39.878122: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce 940MX major: 5 minor: 0 memoryClockRate(GHz): 1.189
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 3.56GiB
2018-11-18 14:39:39.878148: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-11-18 14:44:17.101263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-18 14:44:17.101303: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-11-18 14:44:17.101313: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-11-18 14:44:17.101540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3259 MB memory) -> physical GPU (device: 0, name: GeForce 940MX, pci bus id: 0000:01:00.0, compute capability: 5.0)
TIME: 5.82940673828125
result: [3. 3. 3. ... 3. 3. 3.]
My code:
import tensorflow as tf
import time

with tf.Session():
    start_time = time.time()
    input1 = tf.constant([1.0, 1.0, 1.0, 1.0] * 100 * 100 * 100)
    input2 = tf.constant([2.0, 2.0, 2.0, 2.0] * 100 * 100 * 100)
    output = tf.add(input1, input2)
    result = output.eval()
    duration = time.time() - start_time
    print("TIME:", duration)
    print("result: ", result)
Am I doing something wrong here? Judging by the printed output, the GPU seems to be used correctly.
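One way to double-check where the ops actually run is to create the session with device-placement logging turned on; a minimal sketch, assuming the same graph as in my code above:

import tensorflow as tf

# Print, for every op, which device (CPU:0 or GPU:0) it is actually placed on.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    input1 = tf.constant([1.0, 1.0, 1.0, 1.0] * 100 * 100 * 100)
    input2 = tf.constant([2.0, 2.0, 2.0, 2.0] * 100 * 100 * 100)
    output = tf.add(input1, input2)
    print(sess.run(output))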
I also followed the steps from Can I measure the execution time of individual operations with TensorFlow?, and then I got
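Roughly, the steps from that question boil down to the following sketch against the TF 1.x session API (timeline.json is an arbitrary output file name):

import tensorflow as tf
from tensorflow.python.client import timeline

with tf.Session() as sess:
    input1 = tf.constant([1.0, 1.0, 1.0, 1.0] * 100 * 100 * 100)
    input2 = tf.constant([2.0, 2.0, 2.0, 2.0] * 100 * 100 * 100)
    output = tf.add(input1, input2)

    # Record per-op execution metadata for this run.
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    result = sess.run(output, options=run_options, run_metadata=run_metadata)

    # Write a Chrome trace that can be opened at chrome://tracing.
    tl = timeline.Timeline(run_metadata.step_stats)
    with open("timeline.json", "w") as f:
        f.write(tl.generate_chrome_trace_format())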
Answer 0 (score: 2)
The GPU is an "external" processor: compiling a program for it, running that program, sending it the data, and retrieving the results all involve significant overhead. GPUs also have different performance trade-offs from CPUs. While a GPU is usually faster at large, complex number-crunching tasks, your "hello world" is far too simple. It does very little between loading and storing each data item (just a pairwise addition), and it does very little work overall; a million operations is nothing. That makes any setup/teardown overhead relatively much more noticeable. So while the GPU is slower for this program, it is still likely to be faster for more useful programs.
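As a rough illustration of this point, a heavier benchmark along the following lines is where the GPU would normally be expected to pull ahead (a sketch only; the matrix size, repeat count, and warm-up run are arbitrary choices, not from the question):

import time
import tensorflow as tf

# A workload heavy enough that compute dominates setup and transfer overhead:
# repeated 4000x4000 matrix multiplications.
a = tf.random_normal([4000, 4000])
b = tf.random_normal([4000, 4000])
product = tf.matmul(a, b)

with tf.Session() as sess:
    sess.run(product)  # warm-up run, excluded from the timing
    start_time = time.time()
    for _ in range(10):
        sess.run(product)
    print("TIME:", time.time() - start_time)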