I am using tensorflow-gpu version 2.0.0, and I have installed the GPU driver as well as CUDA and cuDNN (CUDA version 10.1.243_426 and cuDNN v7.6.5.32; I am on Windows!).
When I compile or run a model with:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
it prints:
2020-01-12 19:56:50.961755: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2020-01-12 19:56:50.974003: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2020-01-12 19:56:51.628299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce MX150 major: 6 minor: 1 memoryClockRate(GHz): 1.5315
pciBusID: 0000:01:00.0
2020-01-12 19:56:51.636256: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2020-01-12 19:56:51.642106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2020-01-12 19:56:52.386608: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-01-12 19:56:52.393162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2020-01-12 19:56:52.396516: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2020-01-12 19:56:52.400632: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 1356 MB memory) -> physical GPU (device: 0, name: GeForce MX150, pci bus id: 0000:01:00.0, compute capability: 6.1)
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 1008745203605650029
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 1422723891
locality {
bus_id: 1
links {
}
}
incarnation: 18036547379173389852
physical_device_desc: "device: 0, name: GeForce MX150, pci bus id: 0000:01:00.0, compute capability: 6.1"
]
So TensorFlow clearly sees the GPU device! But when I run my model, the GPU seems to be doing almost nothing!
That said, part of the GPU memory is in use, and I can even see some GPU activity attributed to my program!
What is going on? Am I doing something wrong? I have searched a lot and checked many questions on SO, but nobody seems to have asked this.
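(As a quick sanity check on what TensorFlow can actually see, independent of device_lib: in TF 2.0 the physical-device listing lives under tf.config.experimental; the assumption here is only that tensorflow 2.0.0 is importable.)

```python
import tensorflow as tf

# In TF 2.0 this call is under tf.config.experimental;
# from TF 2.1 onward it is also exposed as tf.config.list_physical_devices.
gpus = tf.config.experimental.list_physical_devices('GPU')
print("GPUs visible to TensorFlow:", gpus)
```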
Answer 0 (score: 0)
Taken from TensorFlow's official documentation:
import tensorflow as tf
tf.debugging.set_log_device_placement(True)
# Create some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)
print(c)
If you run the code above (it should run on your GPU if TensorFlow can see one), then your training will run on the GPU as well.
You should see output like this:
Executing op MatMul in device /job:localhost/replica:0/task:0/device:GPU:0
tf.Tensor([[22. 28.] [49. 64.]], shape=(2, 2), dtype=float32)
In addition, you can watch dedicated GPU memory usage spike in Task Manager -> it looks like your GPU is being used, but to be sure, run the code above.
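A variation on the snippet above: wrapping the ops in an explicit tf.device('/GPU:0') scope forces GPU placement and fails loudly if the GPU cannot actually be used, instead of silently falling back to the CPU. (This is a sketch, not from the answer itself; the try/except is only there so the script still reports something on a CPU-only machine.)

```python
import tensorflow as tf

tf.debugging.set_log_device_placement(True)

try:
    # Pinning the ops to the GPU makes a misconfigured setup fail
    # visibly rather than running quietly on the CPU.
    with tf.device('/GPU:0'):
        a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
        b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
        c = tf.matmul(a, b)
    print(c)
except RuntimeError as e:
    print("GPU placement failed:", e)
```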
Answer 1 (score: -3)
Also note that the Windows Task Manager is not very useful for monitoring GPU activity. Try installing TechPowerUp GPU-Z (I am running dual NVIDIA cards). It can monitor CPU and GPU activity, power, and temperature.
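Besides GPU-Z, the nvidia-smi command-line tool ships with the NVIDIA driver (on Windows it is typically under C:\Program Files\NVIDIA Corporation\NVSMI or on the PATH) and can poll utilization directly; a minimal invocation, assuming the driver is installed:

```shell
# Poll GPU utilization and memory use once per second.
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 1
```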