Question

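I'm trying to submit a Google Cloud job to train a CNN model on the MNIST digits. Since I'm new to GCP, I wanted to practice this exercise on an f1-micro machine first, but it didn't work. I ran into two problems along the way.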

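Here is my system: Windows 10, Anaconda, Jupyter Notebook 6, Python 3.6, TF 1.13.0. At first, my model worked well without involving any GCP commands. Then I packaged the files into a module, as the GCP course recommends, and used the gcloud command for local training. The cell seemed stuck and did nothing until I closed and halted the ipynb file. Training started right after that, and the results were correct, as I confirmed by monitoring it in TensorBoard. What do I need to do to make it run properly from the cell without shutting down the notebook? By the way, when I run it in a terminal, this problem does not occur.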

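Second problem: I then tried to submit the files to a Google Cloud machine. For practice, I created a VM instance with an f1-micro machine, since it comes with plenty of free usage hours. But my command options are not accepted: I tried several formats for the machine type and could not set it correctly. Also, how do I connect to the instance I have already created?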

Any suggestions? Thanks! Here is the code.

#1. Local submission lines

import shutil

OUTDIR='trained_test'
INPDIR=r'..\data'  # raw string so the backslash is kept literally
shutil.rmtree(path = OUTDIR, ignore_errors = True)

!gcloud ai-platform local train \
    --module-name=trainer.task \
    --package-path=trainer \
    -- \
    --output_dir=$OUTDIR \
    --input_dir=$INPDIR \
    --epochs=2 \
    --learning_rate=0.001 \
    --batch_size=100
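A possible diagnostic, not from the original post and not confirmed to fix the hang: instead of the `!` shell escape, launch the same `gcloud ai-platform local train` command with Python's `subprocess` module and print its output line by line as it arrives, so the cell shows whether training is actually progressing. This is a sketch under assumptions: `gcloud` is on `PATH` (on Windows `shutil.which` resolves it to `gcloud.cmd`), and the helper names `build_local_train_cmd` and `stream_command` are hypothetical. It uses `universal_newlines=True` rather than `text=True` because the asker is on Python 3.6.

```python
import shutil
import subprocess
import sys
from typing import List


def build_local_train_cmd(output_dir: str, input_dir: str,
                          epochs: int = 2, learning_rate: float = 0.001,
                          batch_size: int = 100) -> List[str]:
    """Hypothetical helper: argv list for the local-train command in the post."""
    # Resolve the gcloud executable; on Windows this finds gcloud.cmd.
    gcloud = shutil.which("gcloud") or "gcloud"
    return [
        gcloud, "ai-platform", "local", "train",
        "--module-name=trainer.task",
        "--package-path=trainer",
        "--",  # arguments after this are passed to trainer.task
        f"--output_dir={output_dir}",
        f"--input_dir={input_dir}",
        f"--epochs={epochs}",
        f"--learning_rate={learning_rate}",
        f"--batch_size={batch_size}",
    ]


def stream_command(cmd: List[str]) -> int:
    """Run cmd and echo its combined stdout/stderr line by line; return the exit code."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            universal_newlines=True, bufsize=1)
    for line in proc.stdout:
        print(line, end="")
        sys.stdout.flush()  # push each line to the notebook immediately
    return proc.wait()
```

In a notebook cell, with `OUTDIR` and `INPDIR` defined as above, the call would be `stream_command(build_local_train_cmd(OUTDIR, INPDIR))`.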


#2. submit to compute engine

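from datetime import datetime

# BUCKET and REGION are assumed to be defined earlier in the notebook
OUTDIR='gs://'+BUCKET+'/digit/train_01'
INPDIR='gs://'+BUCKET+'/digit/data'
JOBNAME='kaggle_digit_01_'+datetime.now().strftime("%Y%m%d_%H%M%S")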

!gcloud ai-platform jobs submit training $JOBNAME \
    --region=$REGION \
    --module-name=trainer.task \
    --package-path=trainer \
    --job-dir=$OUTDIR \
    --staging-bucket=gs://$BUCKET \
    --scale-tier=custom \
    --master-machine-type=zones/us-central1-a/machineTypes/f1-micro \
    --runtime-version 1.13 \
    -- \
    --output_dir=$OUTDIR \
    --input_dir=$INPDIR \
    --epochs=5 --learning_rate=0.001 --batch_size=100

Error message:

ERROR: (gcloud.ai-platform.jobs.submit.training) INVALID_ARGUMENT: Field: master_type Error: The specified machine type is not supported: zones/us-central1-a/machineTypes/f1-micro
- '@type': type.googleapis.com/google.rpc.BadRequest
  fieldViolations:
  - description: 'The specified machine type is not supported: zones/us-central1-a/machineTypes/f1-micro'
    field: master_type

Update:

Changing the machine type does work:

--scale-tier=CUSTOM \
--master-machine-type=n1-standard-4 \

I also put the following at the beginning, so that the notebook recognizes the format, e.g. $JOBNAME ...

import gcsfs

By the way, --job-dir doesn't seem to matter.
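A likely explanation, hedged: when `--job-dir` is given to `gcloud ai-platform jobs submit training`, AI Platform forwards the value to the training application as a `--job-dir` command-line argument. If `trainer/task.py` accepts `--job-dir` but never uses it (or discards unknown arguments via `parse_known_args`), the flag has no visible effect. The post does not show `task.py`, so the parser below is a hypothetical sketch that accepts both flags, prefers `--output_dir`, and falls back to `--job-dir`; the defaults are illustrative.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical argument parser for trainer/task.py."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input_dir", required=True)
    parser.add_argument("--output_dir", default=None)
    # AI Platform passes gcloud's --job-dir value as --job-dir;
    # argparse stores it as args.job_dir.
    parser.add_argument("--job-dir", dest="job_dir", default=None)
    parser.add_argument("--epochs", type=int, default=5)
    parser.add_argument("--learning_rate", type=float, default=0.001)
    parser.add_argument("--batch_size", type=int, default=100)
    args = parser.parse_args(argv)
    # Prefer an explicit --output_dir; otherwise fall back to --job-dir.
    if args.output_dir is None:
        args.output_dir = args.job_dir
    if args.output_dir is None:
        parser.error("one of --output_dir or --job-dir is required")
    return args
```

With this design, either flag alone is enough to tell the trainer where to write its checkpoints.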

However, local training still has the same problem: I have to close and halt the notebook before training starts. Can anyone advise on this?

Answer 1

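f1-micro is not supported by AI Platform Training. Here is a list of the supported machine types. Also, you don't need to specify the zone, only the machine type, i.e. --master-machine-type=n1-standard-4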
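To illustrate the answer's point, here is a minimal sketch (the helper names are hypothetical) that builds the `gcloud ai-platform jobs submit training` argument list with `--scale-tier=CUSTOM` and a bare machine type such as `n1-standard-4`, and rejects Compute Engine style zone paths like `zones/us-central1-a/machineTypes/f1-micro` before anything is submitted. The `UNSUPPORTED_MACHINE_TYPES` set holds only the one type the answer names; it is not the full list of supported machines, which should be taken from the AI Platform documentation.

```python
from typing import List

# Only the type named in the answer; not an exhaustive list.
UNSUPPORTED_MACHINE_TYPES = {"f1-micro"}


def check_machine_type(machine_type: str) -> str:
    """Reject zone-qualified paths and types the answer says are unsupported."""
    if "/" in machine_type:
        raise ValueError("use a bare machine type such as 'n1-standard-4', "
                         "not a zone path: " + machine_type)
    if machine_type in UNSUPPORTED_MACHINE_TYPES:
        raise ValueError(machine_type + " is not supported by AI Platform Training")
    return machine_type


def build_submit_cmd(job_name: str, region: str, bucket: str,
                     machine_type: str = "n1-standard-4") -> List[str]:
    """Hypothetical helper mirroring the submit command from the question."""
    out_dir = f"gs://{bucket}/digit/train_01"
    in_dir = f"gs://{bucket}/digit/data"
    return [
        "gcloud", "ai-platform", "jobs", "submit", "training", job_name,
        f"--region={region}",
        "--module-name=trainer.task",
        "--package-path=trainer",
        f"--job-dir={out_dir}",
        f"--staging-bucket=gs://{bucket}",
        "--scale-tier=CUSTOM",
        f"--master-machine-type={check_machine_type(machine_type)}",
        "--runtime-version=1.13",
        "--",  # arguments after this go to trainer.task
        f"--output_dir={out_dir}",
        f"--input_dir={in_dir}",
        "--epochs=5", "--learning_rate=0.001", "--batch_size=100",
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; on Windows, replacing `"gcloud"` with the path from `shutil.which("gcloud")` may be needed.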

Cannot submit a job to f1-micro with gcloud ml-engine (or ai-platform) commands in Jupyter Notebook
