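I am trying to train an object detection model with gcloud ml-engine, following the official documentation at https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_on_cloud.md. I set runtime-version = 1.4 and modified setup.py as described in this issue: https://github.com/tensorflow/models/issues/2739. However, I get the following errors: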
worker-replica-3 2018-01-09 06:32:39.416080: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
worker-replica-3 grpc epoll fd: 3
{
insertId: "1fwigqcg5k37j2o"
jsonPayload: {
created: 1515479559.41658
levelname: "ERROR"
lineno: 1051
message: " grpc epoll fd: 3"
pathname: "ev_epoll1_linux.c"
thread: 917
}
The final error message is:
The replica master 0 ran out-of-memory and exited with a non-zero status of 247.
I start the training job on Cloud ML Engine with the following command:
gcloud ml-engine jobs submit training object_detection_training_`date +%s` \
--job-dir=gs://mybucket/train \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.train \
--region asia-east1 \
--config object_detection/samples/cloud/cloud.yml \
-- \
--train_dir=gs://mybucket/train \
--pipeline_config_path=gs://mybucket/data/ssd_mobilenet_v1_coco.config \
--runtime-version 1.4
Answer 0 (score: 1)
Currently only runtime version 1.2 is supported. We are working on support for other versions.
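As a rough sketch of how the runtime version could be passed, the snippet below rebuilds the command from the question with `--runtime-version` moved in front of the bare `--`. With gcloud, flags before `--` are parsed by gcloud itself, while everything after it is forwarded to the training module, so a `--runtime-version` placed after `--` would likely reach object_detection.train rather than gcloud. The value 1.2 is an assumption taken from this answer, and `print_submit_command` is a hypothetical helper name that only prints the command for review.

```shell
#!/bin/sh
# Sketch only: prints the gcloud command instead of running it.
# RUNTIME_VERSION is an assumption based on the answer above (1.2).
RUNTIME_VERSION="1.2"

# Hypothetical helper: assembles the submit command from the question.
print_submit_command() {
    job_name="object_detection_training_$(date +%s)"
    # Flags before the bare "--" are parsed by gcloud itself;
    # flags after it are forwarded to object_detection.train.
    echo gcloud ml-engine jobs submit training "$job_name" \
        --runtime-version "$RUNTIME_VERSION" \
        --job-dir=gs://mybucket/train \
        --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
        --module-name object_detection.train \
        --region asia-east1 \
        --config object_detection/samples/cloud/cloud.yml \
        -- \
        --train_dir=gs://mybucket/train \
        --pipeline_config_path=gs://mybucket/data/ssd_mobilenet_v1_coco.config
}

print_submit_command
```

Once the printed command looks right, removing the leading `echo` inside the function would submit the job instead of printing it.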
Answer 1 (score: 0)
FYI, that log message is not an error. It was downgraded to an INFO log in the grpc codebase last August.
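Since the "grpc epoll fd" entry is harmless, one option is to filter it out when reading saved logs so that the real failure (here, the out-of-memory exit) stands out. The following is a minimal sketch; `filter_benign_grpc` is a hypothetical helper name, and it assumes the logs are available as plain text lines.

```shell
#!/bin/sh
# Hypothetical helper: drops the benign "grpc epoll fd" lines from stdin
# and passes every other log line through unchanged.
filter_benign_grpc() {
    grep -v 'grpc epoll fd:'
}
```

For example, with the job logs saved to a file (the name job.log is an assumption), `filter_benign_grpc < job.log` would print everything except those entries.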