I have a StackEnsemble model trained with the AutoML feature of an Azure ML Workspace. When I try to deploy it as a web service I get the error below (CrashLoopBackOff). I strongly suspect it is related to the model itself / its required dependencies: when I swap the model name in score.py for a different model (a plain XGBoost instead of the StackEnsemble with its scaler), the service is created without any problem. (A simplified sketch of my score.py is included after the error output below.)
My questions:
- How can I find out which models/algorithms the StackEnsemble contains, so that I can build the container / dependency list correctly?
- Is there any way to find out where the actual error lies, other than building the container locally and debugging it there? I tried to fetch the logs with service.get_logs() as the documentation suggests, but there is nothing useful in them; the last 5 lines don't point to any problem.
Please advise.
WebserviceException: Service deployment polling reached non-successful terminal state, current service state: Failed
Error:
{
"code": "AciDeploymentFailed",
"message": "Aci Deployment failed with exception: Your container application crashed. This may be caused by errors in your scoring file's init() function.\nPlease check the logs for your container instance: classifier-bwp-ls5923-v1. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs. \nYou can also try to run image mlws219f9669.azurecr.io/classifier-bwp-ls5923-v1:4 locally. Please refer to http://aka.ms/debugimage#service-launch-fails for more information.",
"details": [
{
"code": "CrashLoopBackOff",
"message": "Your container application crashed. This may be caused by errors in your scoring file's init() function.\nPlease check the logs for your container instance: classifier-bwp-ls5923-v1. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs. \nYou can also try to run image mlws219f9669.azurecr.io/classifier-bwp-ls5923-v1:4 locally. Please refer to http://aka.ms/debugimage#service-launch-fails for more information."
}
]
}
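For context, my score.py follows the usual Azure ML scoring-script pattern; a simplified sketch (the model name and input handling below are placeholders, not my exact file) looks like this:

```python
import json
import joblib
import pandas as pd
from azureml.core.model import Model


def init():
    global model
    # If the environment is missing a package that the StackEnsemble (or its
    # scaler) needs at unpickling time, the crash happens right here.
    model_path = Model.get_model_path("classifier-bwp-ls5923")  # placeholder name
    model = joblib.load(model_path)


def run(raw_data):
    try:
        data = pd.DataFrame(json.loads(raw_data)["data"])
        result = model.predict(data)
        return result.tolist()
    except Exception as e:
        return {"error": str(e)}
```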
Answer (score: 0):
I'm not sure how to dig into the models used inside the Ensemble, but in the meantime there are a few other things you can try to rule out.
When your service is stuck in CrashLoopBackoff it keeps rebooting, which means the logs, which are stored in the container itself, keep getting wiped. A quick workaround is simply to run get_logs() a few times to catch whatever does get written.
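A minimal sketch of that polling approach, assuming `service` is the Webservice object you already have:

```python
import time

# Poll the container logs repeatedly while the service is crash-looping,
# since each restart wipes whatever was logged inside the container.
for attempt in range(10):
    logs = service.get_logs()  # azureml.core.webservice.Webservice.get_logs()
    if logs:
        print(f"--- attempt {attempt} ---")
        print(logs)
    time.sleep(15)  # give the container time to restart and log again
```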
To get historical information, make sure Application Insights is enabled for the deployment (the appInsightsEnabled setting, exposed as enable_app_insights in the Python SDK's deployment configuration) so that you can follow the logs in the App Insights instance attached to your workspace.
Aside from dependency mismatches, the most common cause of CrashLoopBackoff is that the service simply isn't given enough memory to load and score the model. Try increasing the memory reservation for the service.
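A sketch of both of those settings on an ACI deployment; `ws`, `model`, and `inference_config` are placeholders for your own workspace objects:

```python
from azureml.core.model import Model
from azureml.core.webservice import AciWebservice

# Enable App Insights so logs survive container restarts, and reserve more
# memory so the StackEnsemble can actually be loaded and scored.
deployment_config = AciWebservice.deploy_configuration(
    cpu_cores=1,
    memory_gb=4,                # bump this if the ensemble is large
    enable_app_insights=True,   # appInsightsEnabled in the deployment JSON
)

service = Model.deploy(
    workspace=ws,                       # placeholder: your Workspace
    name="classifier-bwp-ls5923-v1",
    models=[model],                     # placeholder: the registered AutoML model
    inference_config=inference_config,  # placeholder: your InferenceConfig
    deployment_config=deployment_config,
)
service.wait_for_deployment(show_output=True)
print(service.get_logs())
```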