I set up an AWS Deep Learning machine from an AMI. Now I'm trying to run the simple getting-started example from TensorFlow:
import tensorflow as tf

# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
But it looks like my machine is not using the GPU at all:
MatMul_2: (MatMul): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830238: I tensorflow/core/common_runtime/simple_placer.cc:847] MatMul_2: (MatMul)/job:localhost/replica:0/task:0/cpu:0
MatMul_1: (MatMul): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830259: I tensorflow/core/common_runtime/simple_placer.cc:847] MatMul_1: (MatMul)/job:localhost/replica:0/task:0/cpu:0
MatMul: (MatMul): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830271: I tensorflow/core/common_runtime/simple_placer.cc:847] MatMul: (MatMul)/job:localhost/replica:0/task:0/cpu:0
b_2: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830283: I tensorflow/core/common_runtime/simple_placer.cc:847] b_2: (Const)/job:localhost/replica:0/task:0/cpu:0
a_2: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830312: I tensorflow/core/common_runtime/simple_placer.cc:847] a_2: (Const)/job:localhost/replica:0/task:0/cpu:0
b_1: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830324: I tensorflow/core/common_runtime/simple_placer.cc:847] b_1: (Const)/job:localhost/replica:0/task:0/cpu:0
a_1: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830337: I tensorflow/core/common_runtime/simple_placer.cc:847] a_1: (Const)/job:localhost/replica:0/task:0/cpu:0
b: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830348: I tensorflow/core/common_runtime/simple_placer.cc:847] b: (Const)/job:localhost/replica:0/task:0/cpu:0
a: (Const): /job:localhost/replica:0/task:0/cpu:0
2017-07-09 00:51:03.830358: I tensorflow/core/common_runtime/simple_placer.cc:847] a: (Const)/job:localhost/replica:0/task:0/cpu:0
If I try to pin the ops to the GPU manually with tf.device('/gpu:0'):, I get the following error:
InvalidArgumentError: Cannot assign a device for operation 'MatMul_3': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
	 [[Node: MatMul_3 = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/device:GPU:0"](a_3, b_3)]]
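For what it's worth, here is a minimal way to check which devices this TensorFlow build actually registers (just a diagnostic sketch, assuming the standard TF 1.x tf.test and device_lib utilities, not anything specific to the AMI):

import tensorflow as tf
from tensorflow.python.client import device_lib

# Should include an entry with device_type 'GPU' if the installed build can see the K80.
print(device_lib.list_local_devices())
# True only if this TensorFlow wheel was compiled with CUDA support.
print(tf.test.is_built_with_cuda())

If the list only contains the CPU device, the placement log and the InvalidArgumentError above are exactly what you would expect.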
The only change I made to the AMI was updating TensorFlow to the latest version.
This is what I see when I run watch nvidia-smi:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57 Driver Version: 367.57 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 On | 0000:00:1E.0 Off | 0 |
| N/A 44C P8 27W / 149W | 0MiB / 11439MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Answer (score: 1)
1. Check your instance: did you pick a GPU instance type?
使用"观看nvidia-smi"查看GPU信息。
2. Check your AMI and TensorFlow version; maybe the installed build does not support GPU, or something is misconfigured (see the quick check below).
I use this AMI: Deep Learning AMI Amazon Linux (ami-296e7850).
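A quick sanity check along those lines (just a sketch, assuming a stock TF 1.x install): if your upgrade pulled in the CPU-only "tensorflow" wheel rather than "tensorflow-gpu", the second print below will show False.

import tensorflow as tf

# Which TensorFlow the upgrade actually installed.
print(tf.__version__)
# False means a CPU-only build, so TensorFlow will never register the GPU.
print(tf.test.is_built_with_cuda())

If it prints False, reinstalling the GPU build (e.g. pip install --upgrade tensorflow-gpu) should make the /gpu:0 device appear again.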