Azure 504 DeploymentTimedOut error - Service deployment polling reached non-successful terminal state, current service state

Time: 2020-02-25 09:23:43

Tags: azure azure-machine-learning-service

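I am trying to deploy my machine learning model as an AciWebservice on Azure, to expose an endpoint for further use. However, the deployment fails with a DeploymentTimedOut error, status 504. Locally, my model runs fine. Here is my prediction.py: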

%%writefile prediction.py
import json
import numpy as np
import os
import pickle
from sklearn.externals import joblib
from sklearn.linear_model import LogisticRegression
from azureml.core.model import InferenceConfig
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.model import Model
from azureml.core.environment import Environment
from azureml.core.webservice import LocalWebservice, Webservice

def init():
    global model
    # retrieve the path to the model file using the model name
    model_path = Model.get_model_path('prediction_model')
    model = joblib.load(model_path)

def run(raw_data):
    data = np.array(json.loads(raw_data)['data'])
    # make prediction
    y_hat = model.predict(data)
    return json.dumps(y_hat.tolist())

Here is the environment:

myenv = Environment(name="myenv")
myenv.docker.enabled = True
myenv.docker.base_image = "mcr.microsoft.com/azureml/o16n-sample-user-base/ubuntu-miniconda"


myenv.docker.base_image_registry.address = "shohozds.azurecr.io"
myenv.docker.base_image_registry.username = "farhad"
myenv.docker.base_image_registry.password = "*********************"

myenv.inferencing_stack_version = "latest" 


conda_dep = CondaDependencies()

conda_dep.add_pip_package("azureml-defaults")
myenv.python.conda_dependencies=conda_dep
myenv.register(workspace=ws)

The environment is used in the InferenceConfig:

inference_config = InferenceConfig(entry_script="prediction.py",
                                   environment=envs['myenv'])

The AciWebservice configuration:

deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)

And now the model deployment:

service = Model.deploy(ws, "myservice", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output = True)
print(service.state)

But I get this error:

"code": "DeploymentTimedOut",
"statusCode": 504,

Here is the full trace:

ERROR - Service deployment polling reached non-successful terminal state, current service state: Unhealthy
Operation ID: 0e37b930-2707-4d6b-92b0-2203d1c45978
More information can be found using '.get_logs()'
Error:
{
  "code": "DeploymentTimedOut",
  "statusCode": 504,
  "message": "The deployment operation polling has TimedOut. The service creation is taking longer than our normal time. We are still trying to achieve the desired state for the web service. Please check the webservice state for the current webservice health. You can run print(service.state) from the python SDK to retrieve the current state of the webservice."
}

1 answer:

Answer 0 (score: 0)

Something is probably failing during initialization, which leaves the service stuck in an unhealthy state.

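As the error message says, you can run service.get_logs() to pull the logs from the unhealthy service and see what caused it to fail. If I had to guess from your code, the problem may be with get_model_path, but the logs will tell you for sure.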
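As a rough sketch of that check: the helper below prints the container logs whenever the service is anything other than Healthy. The name dump_logs_if_unhealthy is made up for illustration. It only assumes that the object passed in exposes .state and .get_logs(), as the service returned by Model.deploy in the question does.

```python
def dump_logs_if_unhealthy(service):
    """Print and return the container logs unless the service is Healthy.

    Assumption: `service` exposes `.state` (a string such as "Healthy" or
    "Unhealthy") and `.get_logs()`, like the azureml-core Webservice object
    returned by Model.deploy. The helper name itself is hypothetical.
    """
    state = service.state
    print(f"service state: {state}")
    if state == "Healthy":
        return None
    # The logs normally include the Python traceback raised by init() or run().
    logs = service.get_logs()
    print(logs)
    return logs
```

In the notebook from the question this would be called as dump_logs_if_unhealthy(service) once wait_for_deployment has returned or raised. A traceback from init() in that output is the thing to look for.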

For more information on how to debug a failed service, see here
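On the get_model_path guess: one way to make a path problem obvious in the logs is to resolve the file yourself inside init() and fail with a message that lists what is actually in the model directory. This is only a sketch. It assumes the scoring container sets the AZUREML_MODEL_DIR environment variable to the folder holding a single registered model. The resolve_model_path helper and the model.pkl file name are hypothetical, so substitute the file name you registered.

```python
import os


def resolve_model_path(filename, env_var="AZUREML_MODEL_DIR"):
    """Return the full path of `filename` inside the model directory.

    Assumption: the scoring container exports AZUREML_MODEL_DIR. The helper
    name and the file name passed in are illustrative, not from the question.
    Raises FileNotFoundError with a directory listing so the cause shows up
    in service.get_logs().
    """
    model_dir = os.getenv(env_var)
    if not model_dir:
        raise FileNotFoundError(f"{env_var} is not set")
    path = os.path.join(model_dir, filename)
    if not os.path.exists(path):
        found = sorted(os.listdir(model_dir))
        raise FileNotFoundError(f"{path} not found; {model_dir} contains {found}")
    return path
```

Inside init() this would replace the Model.get_model_path call, for example model = joblib.load(resolve_model_path("model.pkl")). If the file is missing, the listing in the exception message shows which name to use instead.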