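I am using the .yaml file below to create a Katib experiment in Kubeflow. However, I get the following error:

Failed to reconcile: cannot restore struct from: string

Is there a fix for this? Most of the Katib experiment examples don't use volumes, but I am trying to mount a volume after downloading data from S3.
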
apiVersion: "kubeflow.org/v1alpha3"
kind: Experiment
metadata:
  namespace: apple
  labels:
    controller-tools.k8s.io: "1.0"
  name: transformer-experiment
spec:
  objective:
    type: maximize
    goal: 0.8
    objectiveMetricName: Train-accuracy
    additionalMetricNames:
      - Train-loss
  algorithm:
    algorithmName: random
  parallelTrialCount: 3
  maxTrialCount: 12
  maxFailedTrialCount: 3
  metricsCollectorSpec:
    collector:
      kind: StdOut
  parameters:
    - name: --lr
      parameterType: double
      feasibleSpace:
        min: "0.01"
        max: "0.03"
    - name: --dropout_rate
      parameterType: double
      feasibleSpace:
        min: "0.005"
        max: "0.020"
    - name: --layer_count
      parameterType: int
      feasibleSpace:
        min: "2"
        max: "5"
    - name: --d_model_count
      parameterType: categorical
      feasibleSpace:
        list:
          - "64"
          - "128"
          - "256"
  trialTemplate:
    goTemplate:
      rawTemplate: |-
        apiVersion: batch/v1
        kind: Job
        metadata:
          name: {{.Trial}}
          namespace: {{.NameSpace}}
        spec:
          template:
            spec:
              volumes:
                - name: train-data
                  emptyDir: {}
              containers:
              - name: data-download
                image: amazon/aws-cli
                command:
                - "aws s3 sync s3://kubeflow/kubeflowdata.tar.gz /train-data"
                volumeMounts:
                - name: train-data
                  mountPath: /train-data
              - name: {{.Trial}}
                image: <Our Image>
                command:
                - "cd /train-data"
                - "ls"
                - "python"
                - "/opt/ml/src/main.py"
                - "--train_batch=64"
                - "--test_batch=64"
                - "--num_workers=4"
                volumeMounts:
                - name: train-data
                  mountPath: /train-data
                {{- with .HyperParameters}}
                {{- range .}}
                - "{{.Name}}={{.Value}}"
                {{- end}}
                {{- end}}
              restartPolicy: Never
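
For comparison, here is a rough sketch of how the trialTemplate might look if the generated hyperparameter items are meant to extend the command list. In the manifest above they are emitted right after volumeMounts, so they may end up as plain strings where Kubernetes expects volume mount objects, which would be consistent with the "cannot restore struct from: string" message. This is a guess, not a confirmed fix. The initContainers section, the switch from "s3 sync" to "s3 cp", the one-word-per-entry command lists, and workingDir are all assumptions added for illustration. AWS credentials are not shown.

```yaml
# Sketch only: replaces the trialTemplate key under spec in the experiment above.
trialTemplate:
  goTemplate:
    rawTemplate: |-
      apiVersion: batch/v1
      kind: Job
      metadata:
        name: {{.Trial}}
        namespace: {{.NameSpace}}
      spec:
        template:
          spec:
            volumes:
              - name: train-data
                emptyDir: {}
            # Assumption: an init container, so the download finishes before training starts.
            initContainers:
              - name: data-download
                image: amazon/aws-cli
                # Assumption: one list entry per word; "cp" because the source is a single object.
                command:
                  - "aws"
                  - "s3"
                  - "cp"
                  - "s3://kubeflow/kubeflowdata.tar.gz"
                  - "/train-data/"
                volumeMounts:
                  - name: train-data
                    mountPath: /train-data
            containers:
              - name: {{.Trial}}
                image: <Our Image>
                # Assumption: workingDir stands in for the "cd /train-data" entry.
                workingDir: /train-data
                volumeMounts:
                  - name: train-data
                    mountPath: /train-data
                # command comes last so the generated items below extend this list.
                command:
                  - "python"
                  - "/opt/ml/src/main.py"
                  - "--train_batch=64"
                  - "--test_batch=64"
                  - "--num_workers=4"
                  {{- with .HyperParameters}}
                  {{- range .}}
                  - "{{.Name}}={{.Value}}"
                  {{- end}}
                  {{- end}}
            restartPolicy: Never
```

With this layout, each trial would receive arguments such as --lr=<value> appended to the python command, and every entry under volumeMounts remains an object with name and mountPath.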