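My code builds the graph and runs it in CPU mode on Azure ML without problems, but in GPU mode it raises a ResourceExhaustedError during the graph-building stage.
I switch between CPU and GPU mode simply by removing the device statement: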
with tf.device('/cpu:0'), tf.name_scope('embedding'):  # CPU mode runs fine
I tried loading less data, but that did not help either.
I suspect I missed some step when setting up the GPU. Any ideas?
Azure error message:
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[78298,300]
	 [[Node: embedding_matrix/Assign = Assign[T=DT_FLOAT, _class=["loc:@embedding_matrix"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](embedding_matrix, embedding_matrix/Initializer/Const)]]
Full error message:
Traceback (most recent call last):
  File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[78298,300]
	 [[Node: embedding_matrix/Assign = Assign[T=DT_FLOAT, _class=["loc:@embedding_matrix"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](embedding_matrix, embedding_matrix/Initializer/Const)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "NN.py", line 130, in <module>
    sess.run(INIT)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[78298,300]
	 [[Node: embedding_matrix/Assign = Assign[T=DT_FLOAT, _class=["loc:@embedding_matrix"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](embedding_matrix, embedding_matrix/Initializer/Const)]]

Caused by op 'embedding_matrix/Assign', defined at:
  File "NN.py", line 120, in <module>
    , trainable=False)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1203, in get_variable
    constraint=constraint)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 1092, in get_variable
    constraint=constraint)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 425, in get_variable
    constraint=constraint)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 394, in _true_getter
    use_resource=use_resource, constraint=constraint)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/variable_scope.py", line 805, in _get_single_variable
    constraint=constraint)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 213, in __init__
    constraint=constraint)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/variables.py", line 346, in _init_from_args
    validate_shape=validate_shape).op
  File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/state_ops.py", line 276, in assign
    validate_shape=validate_shape)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/gen_state_ops.py", line 57, in assign
    use_locking=use_locking, name=name)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[78298,300]
	 [[Node: embedding_matrix/Assign = Assign[T=DT_FLOAT, _class=["loc:@embedding_matrix"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](embedding_matrix, embedding_matrix/Initializer/Const)]]
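Answer 0 (score: 0)
Host memory is much larger than device memory on the N-series machines. Are you sure you are not simply exceeding the device's capacity?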
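One quick way to check is to work out how much memory the tensor named in the error needs. This is a minimal sketch: it assumes 4 bytes per element, since the error reports DT_FLOAT, and the helper name tensor_bytes is made up for illustration.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Return the bytes needed to store a dense tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


# Shape taken from the error message: [78298, 300], dtype DT_FLOAT (4 bytes).
size = tensor_bytes([78298, 300])
print("%d bytes (%.1f MiB)" % (size, size / 2**20))
# prints: 93957600 bytes (89.6 MiB)
```

That is roughly 90 MiB for this one variable, which is small on its own. If even this allocation fails, it may be that most of the device memory is already taken, for example by another process still holding the GPU. Running nvidia-smi on the VM shows the current memory usage per process.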