Correct parameters for training AWS SageMaker with multiple classes per image

Time: 2019-02-10 21:43:14

Tags: amazon-web-services amazon-sagemaker

I keep finding that image classification jobs with "multi_label" set to "1" crash with the following error:

Algorithm Error: Internal Server Error
[15:56:08] /opt/brazil-pkg-cache/packages/MXNetECL/MXNetECL-master.657.0/AL2012/generic-flavor/src/src/operator/custom/custom.cc:418: Check failed: reinterpret_cast<CustomOpFBFunc>(params.info->callbacks[kCustomOpBackward])( ptrs.size(), const_cast<void**>(ptrs.data()), const_cast<int*>(tags.data()), reinterpret_cast<const int*>(req.data()), static_cast<int>(ctx.is_train), params.info->contexts[kC
15:56:08 Stack trace returned 7 entries:
15:56:08 [bt] (0) /opt/amazon/lib/libaialgsdataiter.so(dmlc::StackTrace()+0x3d) [0x7f85e19f179d]
15:56:08 [bt] (1) /opt/amazon/lib/libaialgsdataiter.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x1a) [0x7f85e19f1a3a] 
15:56:08 [bt] (2) /opt/amazon/lib/libmxnet.so(+0x26da8fd) [0x7f85d0edb8fd]
15:56:08 [bt] (3) /opt/amazon/lib/libmxnet.so(std::thread::_Impl<std::_Bind_simple<mxnet::op::custom::CustomOperator::CustomOperator()::{lambda()#1} ()> >::_M_run()+0x12f) [0x7f85d0ede0ef]
15:56:08 [bt] (4) /opt/amazon/lib/libstdc++.so.6(+0xce440) [0x7f85cc9ea440]
15:56:08 [bt] (5) /lib64/libpthread.so.0(+0x7dc5) [0x7f85e31e1dc5]
15:56:08 [bt] (6) /lib64/libc.so.6(clone+0x6d) [0x7f85e25de6ed]
15:56:08 Algorithm Error: Internal Server Error

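From my reading of the documentation, this parameter should let you assign multiple labels to each image. Is there a trick to getting it to work, or a way to debug these stack traces? (https://docs.aws.amazon.com/sagemaker/latest/dg/IC-Hyperparameter.html)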

2 answers:

Answer 0 (score: 1)

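Ouch, that's ugly... Could you share code that would let us reproduce the error? The full logs would be useful too. Happy to open a support ticket on your behalf.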

Julien (AWS)

Answer 1 (score: 0)

Could you check the record file used for training? Please follow this example to see how to prepare a dataset for multi-label training.
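As a rough illustration of what to check, here is a minimal Python sketch. It assumes the multi-label list (.lst) layout is tab-separated, with an index, then one 0/1 column per class (a multi-hot vector `num_classes` wide), then the image path. That layout is an assumption on my part, so verify it against the example and the algorithm documentation. The helper names `lst_line` and `check_lst_line` are hypothetical and not part of any SageMaker API.

```python
def multi_hot(labels, num_classes):
    """Turn a list of class ids into a 0/1 vector of length num_classes."""
    vec = [0] * num_classes
    for class_id in labels:
        if not 0 <= class_id < num_classes:
            raise ValueError(f"class id {class_id} outside [0, {num_classes})")
        vec[class_id] = 1
    return vec


def lst_line(index, labels, num_classes, path):
    """Build one .lst row (assumed layout: index, multi-hot labels, path)."""
    fields = [str(index)]
    fields += [str(v) for v in multi_hot(labels, num_classes)]
    fields.append(path)
    return "\t".join(fields)


def check_lst_line(line, num_classes):
    """Return True if a row has exactly num_classes 0/1 label columns."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != num_classes + 2:
        return False
    return all(p in ("0", "1") for p in parts[1:-1])


if __name__ == "__main__":
    row = lst_line(0, [1, 3], 4, "img/a.jpg")
    print(repr(row))
    print(check_lst_line(row, 4))
```

Under this assumption, a row written in the single-label layout (index, one class id, path) fails the check, which would be one thing to rule out before suspecting the algorithm itself.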