MLflow does not pick up environment variables when run with --backend kubernetes

Time: 2020-09-22 09:38:02

Tags: mlflow

I am trying to run an MLproject on Kubernetes, following https://github.com/mlflow/mlflow/tree/master/examples/docker, using the command below:

MLFLOW_TRACKING_URI=https://xxxx.xxxx.com MLFLOW_TRACKING_USERNAME=xxxxx MLFLOW_TRACKING_PASSWORD=xxxxxxxx mlflow run -P alpha=0.5  . --backend kubernetes --backend-config kubernetes_config.json --experiment-name test1

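kubectl describe pod shows only MLFLOW_TRACKING_URI in the container environment; the other two environment variables are not picked up. See the output below: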

Name:         tutorial-2020-09-22-11-17-54-381954-mgptt
Namespace:    mlflow
Priority:     0
Node:         kind-worker/172.19.0.3
Containers:
  tutorial:
    Container ID:  containerd://7de7124f96ea83da474e66bc7b2119cba4d4e0ab7188e135e3ed190dde5c8df4
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 22 Sep 2020 11:17:55 +0200
      Finished:     Tue, 22 Sep 2020 11:17:56 +0200
    Ready:          False
    Restart Count:  0
    ...
    Environment:
      MLFLOW_RUN_ID:         1a83aa6a16704883a775ad50bc94c7e9
      MLFLOW_TRACKING_URI:   https://xxxx.xxxxx.com
      MLFLOW_EXPERIMENT_ID:  73
    ...

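The environment variables are declared in the MLproject file as shown below. Note that the volume listed there is not mounted as expected either.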

MLproject

docker_env:
  image: somaupday/mlflow-sklearn:latest
  environment: [["MLFLOW_TRACKING_URI", "https://xxxx.xxxxx.com"], ["MLFLOW_TRACKING_USERNAME", "mlflow"], ["MLFLOW_TRACKING_PASSWORD", "xxxxxxx"]]
  volumes: ["${HOME}/.aws:/root/.aws"]
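
For reference, the MLflow Projects documentation describes the docker_env environment field as a list whose entries are either a [name, value] pair (define a new variable) or a bare string (copy that variable from the host). The sketch below only illustrates that documented format. resolve_environment is a hypothetical helper, not part of MLflow, and it says nothing about whether the Kubernetes backend forwards these values to the Job.

```python
import os


def resolve_environment(entries, host_env=None):
    """Turn docker_env-style `environment` entries into a name -> value dict.

    Hypothetical helper for illustration only; this is not MLflow code.
    """
    host_env = os.environ if host_env is None else host_env
    resolved = {}
    for entry in entries:
        if isinstance(entry, str):
            # Bare name: copy the value from the host, if it is set there.
            if entry in host_env:
                resolved[entry] = host_env[entry]
        elif isinstance(entry, (list, tuple)) and len(entry) == 2:
            # [name, value] pair: define the variable explicitly.
            name, value = entry
            resolved[name] = str(value)
        else:
            raise ValueError(f"unsupported environment entry: {entry!r}")
    return resolved


# Entries shaped like the MLproject file above (values are the redacted ones).
entries = [
    ["MLFLOW_TRACKING_URI", "https://xxxx.xxxxx.com"],
    ["MLFLOW_TRACKING_USERNAME", "mlflow"],
    ["MLFLOW_TRACKING_PASSWORD", "xxxxxxx"],
]
print(sorted(resolve_environment(entries, host_env={})))
```

All three entries in the MLproject file are [name, value] pairs, so under that format each would be expected to define a variable rather than copy one from the host.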

Other relevant files, for reference.

kubernetes_config.json

{
  "kube-context": "kind-kind",
  "repository-uri": "xxxxx/mlflow-sklearn",
  "kube-job-template-path": "./kubernetes_job_template.yaml"
}

kubernetes_job_template.yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: "{replaced with MLflow Project name}"
  namespace: mlflow
spec:
  ttlSecondsAfterFinished: 100
  backoffLimit: 0
  template:
    spec:
      containers:
      - name: "{replaced with MLflow Project name}"
        image: "{replaced with URI of Docker image created during Project execution}"
        command: ["{replaced with MLflow Project entry point command}"]
        resources:
          limits:
            memory: 512Mi
          requests:
            memory: 256Mi
      restartPolicy: Never

When I add env: ["{appended with MLFLOW_TRACKING_URI, MLFLOW_RUN_ID and MLFLOW_EXPERIMENT_ID}"] to kubernetes_job_template.yaml, I run into another problem: the placeholder is not replaced with actual values, so the Kubernetes Job manifest that gets created is invalid.
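
For context on why that manifest is rejected: Kubernetes expects a container's env field to be a list of mappings with name and value keys, so a list containing a single placeholder string is not valid. Below is a minimal sketch of what a well-formed env list looks like when extra variables are added to a Job template held as a Python dict. add_env and the sample template are hypothetical and are not MLflow's implementation. Whether MLflow preserves env entries already present in the template is an assumption that would need to be checked against the installed version.

```python
import copy
import json


def add_env(job_template, env_vars):
    """Return a copy of job_template with env_vars appended to the first container.

    Hypothetical helper for illustration only; this is not MLflow code.
    Kubernetes expects `env` to be a list of {"name": ..., "value": ...}
    mappings, and each value must be a string.
    """
    job = copy.deepcopy(job_template)
    container = job["spec"]["template"]["spec"]["containers"][0]
    env = container.setdefault("env", [])
    existing = {entry["name"] for entry in env}
    for name, value in env_vars.items():
        if name not in existing:  # keep entries that are already defined
            env.append({"name": name, "value": str(value)})
    return job


# A trimmed-down stand-in for kubernetes_job_template.yaml, loaded as a dict.
template = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "tutorial", "namespace": "mlflow"},
    "spec": {
        "template": {
            "spec": {
                "containers": [{"name": "tutorial", "image": "placeholder"}],
                "restartPolicy": "Never",
            }
        }
    },
}

job = add_env(
    template,
    {
        "MLFLOW_TRACKING_USERNAME": "mlflow",
        "MLFLOW_TRACKING_PASSWORD": "xxxxxxx",
    },
)
print(json.dumps(job["spec"]["template"]["spec"]["containers"][0]["env"], indent=2))
```

In a real cluster, a password would more typically be referenced from a Kubernetes Secret than written into the manifest as a literal value.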

0 answers:

No answers yet.