Question

I am trying to deploy a TensorFlow model to GCP's Cloud Machine Learning Engine (CMLE) for prediction, but I get the following error:

$> gcloud ml-engine versions create v1 --model $MODEL_NAME --origin $MODEL_BINARIES --runtime-version 1.9

Creating version (this might take a few minutes)......failed.
ERROR: (gcloud.ml-engine.versions.create) Bad model detected with error:  "Failed to load model: Loading servable: {name: default version: 1} failed: Invalid argument: Cannot assign a device for operation 'tartarus/dense_2/bias': Operation was explicitly assigned to /device:GPU:3 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0 ]. Make sure the device specification refers to a valid device.\n\t [[Node: tartarus/dense_2/bias = VariableV2[_class=[\"loc:@tartarus/dense_2/bias\"], _output_shapes=[[200]], container=\"\", dtype=DT_FLOAT, shape=[200], shared_name=\"\", _device=\"/device:GPU:3\"]()]]\n\n (Error code: 0)"

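My model was trained on multiple GPUs, and it appears that the default machines on CMLE do not support GPUs for prediction, hence the error. So I would like to know whether either of the following is possible:

1. Set the allow_soft_placement variable to True so that CMLE can use the CPU instead of the GPU for the given model.
2. Enable GPU prediction on CMLE for the given model.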

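If not, how can I deploy a TF model trained on GPUs to CMLE for prediction? This feels like it should be a straightforward feature, but I cannot find any documentation on it.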

Thanks!

Answer 1

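I have never used gcloud ml-engine versions create, but when you submit a training job with gcloud ml-engine jobs submit training, you can add a --config flag that points to a configuration file.

This file lets you specify the machines to train on, and you can use multiple CPUs and GPUs. The configuration file is described in the Cloud ML Engine documentation.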
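
As a rough sketch of what that could look like: the snippet below writes a small config.yaml and wraps the submit command in a function. The machine type (standard_gpu), the trainer.task module, the ./trainer package path, the region, and the JOB_NAME and BUCKET variables are all placeholder assumptions, not values from the question. Note that this only configures the machines used for training; it does not change the machine that serves predictions.

```shell
# Hypothetical training config: a custom tier with a single-GPU master.
# The tier and machine type are illustrative assumptions.
cat > config.yaml <<'EOF'
trainingInput:
  scaleTier: CUSTOM
  masterType: standard_gpu
EOF

# Wrapped in a function so that running this snippet does not submit a job.
# JOB_NAME and BUCKET are placeholders you would set yourself;
# trainer.task and ./trainer are assumed names for the training package.
submit_training() {
  gcloud ml-engine jobs submit training "$JOB_NAME" \
    --module-name trainer.task \
    --package-path ./trainer \
    --job-dir "gs://$BUCKET/$JOB_NAME" \
    --region us-central1 \
    --runtime-version 1.9 \
    --config config.yaml
}
```

With JOB_NAME and BUCKET exported, calling submit_training would send the job using the machines named in config.yaml.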
