ML Engine batch prediction running on the wrong Python version

Time: 2019-02-27 18:53:15

Tags: python tensorflow google-cloud-platform google-cloud-ml

I registered a TensorFlow model with ML Engine under Python 3.5, and I want to use it to run a batch prediction job. My API request body looks like this:

{
  "versionName": "XXXXX/v8_0QSZ",
  "dataFormat": "JSON",
  "inputPaths": [
    "XXXXX"
  ],
  "outputPath": "XXXXXX",
  "region": "us-east1",
  "runtimeVersion": "1.12",
  "accelerator": {
    "count": "1",
    "type": "NVIDIA_TESLA_P100"
  }
}
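For context, the fields above correspond to the `predictionInput` part of a job resource. The sketch below shows how such a body might be wrapped before being passed to `projects.jobs.create`. The helper name `build_batch_prediction_job`, the job ID, and `PROJECT_ID` are hypothetical placeholders, and the submission call is left commented out because it needs `google-api-python-client` and valid credentials.

```python
import json


def build_batch_prediction_job(job_id, prediction_input):
    """Wrap a predictionInput dict in a job resource (hypothetical helper)."""
    return {"jobId": job_id, "predictionInput": dict(prediction_input)}


# Same fields as the request body in the question; redacted values kept as-is.
prediction_input = {
    "versionName": "XXXXX/v8_0QSZ",
    "dataFormat": "JSON",
    "inputPaths": ["XXXXX"],
    "outputPath": "XXXXXX",
    "region": "us-east1",
    "runtimeVersion": "1.12",
    "accelerator": {"count": "1", "type": "NVIDIA_TESLA_P100"},
}

job_body = build_batch_prediction_job("batch_predict_example", prediction_input)
print(json.dumps(job_body, indent=2))

# Submitting the job would look roughly like this (not executed here):
# from googleapiclient import discovery
# ml = discovery.build("ml", "v1")
# ml.projects().jobs().create(parent="projects/PROJECT_ID", body=job_body).execute()
```

Note that nothing in this body selects a Python version, which matches the observation in the question.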

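The batch prediction job then runs and returns "Job completed successfully." However, it is not successful at all: it consistently raises the following error for every input: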

Exception during running the graph: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [[node convolution_layer/conv1d/conv1d/Conv2D (defined at /usr/local/lib/python2.7/dist-packages/google/cloud/ml/prediction/frameworks/tf_prediction_lib.py:210) = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](convolution_layer/conv1d/conv1d/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, convolution_layer/conv1d/conv1d/ExpandDims_1)]] [[{{node Cast_6/_495}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_789_Cast_6", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]] 
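Because the job status alone does not reveal per-instance failures like this one, it can help to inspect the job's output files directly. The sketch below assumes the output has been copied to a local directory and that error summaries are written to files named `prediction.errors_stats-*`. Both the local path and the helper name `collect_error_stats` are illustrative assumptions, not something taken from the post.

```python
import glob
import os


def collect_error_stats(output_dir):
    """Return {file name: contents} for every non-empty errors_stats file."""
    errors = {}
    pattern = os.path.join(output_dir, "prediction.errors_stats-*")
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            text = handle.read().strip()
        if text:  # an empty file means no errors were recorded in that shard
            errors[os.path.basename(path)] = text
    return errors


if __name__ == "__main__":
    # "./job_output" is a hypothetical local copy of the job's outputPath.
    for name, text in collect_error_stats("./job_output").items():
        print(name, "->", text[:200])
```

A non-empty result would indicate that the job "succeeded" only in the sense that it ran to completion.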

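My questions are:

  • Why does the batch job report success when it failed completely?
  • The exception above mentions Python 2.7, but the model was registered as Python 3.5, and the API offers no way to specify the Python version. Why is batch prediction using 2.7?
  • What can I do about this in general?
  • Is this related to my accelerator options?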

1 Answer:

Answer 0 (score: 1)

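Response from the batch prediction developers: "We do not officially support Python 3 yet. However, the problem you are running into is a known bug that affects our GPU runtimes for TF 1.11 and 1.12."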
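Since the bug is described as specific to the GPU runtimes, one thing that might be worth trying is resubmitting the same job without the `accelerator` block so that it runs on CPU. The answer does not confirm that this avoids the error, so treat the sketch below as an assumption. The helper name `without_accelerator` is hypothetical.

```python
import copy


def without_accelerator(prediction_input):
    """Return a copy of the request body with the GPU accelerator block removed."""
    cpu_input = copy.deepcopy(prediction_input)
    cpu_input.pop("accelerator", None)  # no-op if the key is already absent
    return cpu_input


# Request body from the question, with its redacted values kept as-is.
gpu_input = {
    "versionName": "XXXXX/v8_0QSZ",
    "dataFormat": "JSON",
    "inputPaths": ["XXXXX"],
    "outputPath": "XXXXXX",
    "region": "us-east1",
    "runtimeVersion": "1.12",
    "accelerator": {"count": "1", "type": "NVIDIA_TESLA_P100"},
}

cpu_input = without_accelerator(gpu_input)
print(sorted(cpu_input))
```

The original dictionary is left untouched, so the GPU variant can be resubmitted later without rebuilding it.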