Unable to submit a training job with gcloud ml

Time: 2018-04-27 15:59:25

Tags: tensorflow gcloud google-cloud-ml

When I try to submit my training job, I get this error.

ERROR: (gcloud.ml-engine.jobs.submit.training) Could not copy [dist/object_detection-0.1.tar.gz] to [packages/10a409168355064d603079b7c34cdd7010a13b181a8f7776751e9110d66a5bdf/object_detection-0.1.tar.gz]. Please retry: HTTPError 404: Not Found

I am running the following command:

gcloud ml-engine jobs submit training ${train1} \
    --job-dir=gs://${object-detection-tutorial-bucket1/}/train \
    --packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
    --module-name object_detection.train1 \
    --region us-central1 \
    --config object_detection/samples/cloud/cloud.yml \
    --runtime-version=1.4 \
    -- \
    --train_dir=gs://${object-detection-tutorial-bucket1/}/train \
    --pipeline_config_path=gs://${object-detection-tutorial-bucket1/}/data/ssd_mobilenet_v1_coco.config

3 Answers:

Answer 0 (score: 0)

The syntax you are using looks incorrect.

If your bucket is named object-detection-tutorial-bucket1, you can specify it like this:

--job-dir=gs://object-detection-tutorial-bucket1/train

Or you can run:

export YOUR_GCS_BUCKET="gs://object-detection-tutorial-bucket1"

and then specify the bucket as:

--job-dir=${YOUR_GCS_BUCKET}/train

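The ${} syntax expands the value of a variable, but object-detection-tutorial-bucket1/ is not a valid variable name. The shell instead reads ${object-detection-tutorial-bucket1/} as the use-default form ${name-word}: the variable is object and, because it is unset, the expression expands to the default text detection-tutorial-bucket1/. The job directory therefore becomes gs://detection-tutorial-bucket1//train, which does not point at your bucket and is presumably why the copy fails with a 404.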
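A quick way to check what the shell does with the expression from the question. This is a minimal sketch, assuming bash or another POSIX shell with no variable named object set; the names expanded and job_dir are introduced only for illustration.

```shell
# Make sure no variable called "object" exists, as in a fresh shell.
unset object

# ${object-detection-tutorial-bucket1/} is parsed as ${name-word}:
# name is "object", and the default word is "detection-tutorial-bucket1/".
expanded="${object-detection-tutorial-bucket1/}"
job_dir="gs://${expanded}/train"

echo "expanded: ${expanded}"
echo "job dir:  ${job_dir}"
```

Running it shows that the leading "object-" is dropped from the bucket name and that the slash is doubled.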

Sources:

https://cloud.google.com/blog/big-data/2017/06/training-an-object-detector-using-cloud-machine-learning-engine

Difference between ${} and $() in Bash

Answer 1 (score: 0)

Remove the ${} from your script. Assuming your bucket is named object-detection-tutorial-bucket1, run the following script:

# train1 is the job name, taken from the question; substitute your own.
gcloud ml-engine jobs submit training train1 \
--job-dir=gs://object-detection-tutorial-bucket1/train \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz \
--module-name object_detection.train1 \
--region us-central1 \
--config object_detection/samples/cloud/cloud.yml \
--runtime-version=1.4 \
-- \
--train_dir=gs://object-detection-tutorial-bucket1/train \
--pipeline_config_path=gs://object-detection-tutorial-bucket1/data/ssd_mobilenet_v1_coco.config
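If you would rather not repeat the bucket name, one option is to define it once and sanity-check the derived paths before submitting. The sketch below is only an illustration for a POSIX shell: BUCKET, JOB_DIR, PIPELINE_CONFIG and the helper is_gcs_url are hypothetical names, not part of gcloud, and the check is a rough string test, not a lookup of whether the bucket exists.

```shell
# Hypothetical helper: accept only URLs shaped like gs://<bucket>/<path>.
is_gcs_url() {
  rest="${1#gs://}"                     # strip the scheme, if present
  [ "$rest" != "$1" ] || return 1       # reject anything not starting with gs://
  case "$rest" in
    ""|/*|*//*) return 1 ;;             # empty bucket name or doubled slash
    *'$'*|*'{'*|*'}'*) return 1 ;;      # leftover, unexpanded ${...}
  esac
  return 0
}

# Define the bucket once and build every path from it.
BUCKET="gs://object-detection-tutorial-bucket1"
JOB_DIR="${BUCKET}/train"
PIPELINE_CONFIG="${BUCKET}/data/ssd_mobilenet_v1_coco.config"

for url in "$JOB_DIR" "$PIPELINE_CONFIG"; do
  is_gcs_url "$url" || { echo "bad GCS URL: $url" >&2; exit 1; }
done

echo "job dir:         $JOB_DIR"
echo "pipeline config: $PIPELINE_CONFIG"
```

The checked values can then be passed as --job-dir="$JOB_DIR" and --pipeline_config_path="$PIPELINE_CONFIG".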

Answer 2 (score: 0)

A crude fix, but it worked for me: just remove the $variable syntax entirely.

Here is an example:

!gcloud ai-platform jobs submit training anurag_card_fraud \
    --scale-tier basic \
    --job-dir gs://anurag/credit_card_fraud/models/JOB_20210401_194058 \
    --master-image-uri gcr.io/anurag/xgboost_fraud_trainer:latest \
    --config trainer/hptuning_config.yaml \
    --region us-central1 \
    -- \
    --training_dataset_path=$TRAINING_DATASET_PATH \
    --validation_dataset_path=$EVAL_DATASET_PATH \
    --hptune
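The example above still passes $TRAINING_DATASET_PATH and $EVAL_DATASET_PATH, so it may help to confirm that both are set before submitting. The sketch below assumes a plain POSIX shell rather than a notebook cell. The helper check_paths is a hypothetical name and the gs://example-bucket paths are placeholders.

```shell
# Hypothetical helper: fails if either path variable is unset or empty.
# ${VAR:?message} aborts the enclosing (sub)shell when VAR is missing,
# so the function body runs in a subshell to keep the caller alive.
check_paths() (
  : "${TRAINING_DATASET_PATH:?TRAINING_DATASET_PATH must be set}"
  : "${EVAL_DATASET_PATH:?EVAL_DATASET_PATH must be set}"
)

# With both variables missing, the check fails.
unset TRAINING_DATASET_PATH EVAL_DATASET_PATH
if check_paths 2>/dev/null; then before="ok"; else before="missing"; fi

# Placeholder values for illustration only; use your own dataset locations.
TRAINING_DATASET_PATH="gs://example-bucket/train.csv"
EVAL_DATASET_PATH="gs://example-bucket/eval.csv"
if check_paths; then after="ok"; else after="missing"; fi

echo "before: $before, after: $after"
```

Calling check_paths right before the gcloud command stops the submission early instead of sending empty arguments to the trainer.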