Issue running a TensorFlow model on an Inf1 instance notebook

Date: 2021-03-08 05:40:42

Tags: amazon-sagemaker

I followed the notebook linked below to test hosting a TensorFlow model on a SageMaker Inf1 instance, but the following call raises an error at inference time:

    predict_response = optimized_predictor.predict(data)

Can anyone help resolve this? Thanks.

---------------------------------------------------------------------------
ModelError                                Traceback (most recent call last)
<ipython-input-12-a13c7ab7b16b> in <module>
     12     display.display(im)
     13     # Invoke endpoint with image
---> 14     predict_response = optimized_predictor.predict(data)
     15 
     16     print("========================================")

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/predictor.py in predict(self, data, initial_args, target_model, target_variant)
    111 
    112         request_args = self._create_request_args(data, initial_args, target_model, target_variant)
--> 113         response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
    114         return self._handle_response(response)
    115 

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    355                     "%s() only accepts keyword arguments." % py_operation_name)
    356             # The "self" in this scope is referring to the BaseClient.
--> 357             return self._make_api_call(operation_name, kwargs)
    358 
    359         _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    674             error_code = parsed_response.get("Error", {}).get("Code")
    675             error_class = self.exceptions.from_code(error_code)
--> 676             raise error_class(parsed_response, operation_name)
    677         else:
    678             return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from model with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/sagemaker-tensorflow-2021-03-08-04-58-16-313ml-inf1 in account 123456789012 for more information.

https://github.com/aws/amazon-sagemaker-examples/blob/master/aws_sagemaker_studio/sagemaker_neo_compilation_jobs/deploy_tensorflow_model_on_Inf1_instance/tensorflow_distributed_mnist_neo_inf1_studio.ipynb

0 answers