Question

OS: Ubuntu 18

Python: Python 3.6

MLflow: 1.4

I am trying to run an MLflow project. This is my project:

MLflow
- conda.yaml
- main.py
- prep_data.py
- learn.py

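The project is largely based on this repository: https://github.com/mlflow/mlflow/tree/master/examples/multistep_workflow

I am trying to run both the prep_data and learn scripts with MLflow Projects, using the main.py script as the entry point. To execute it, I use the following command:

mlflow run . -P experiment_name=testproject

But I get the following error: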

Traceback (most recent call last):
  File "prep_data.py", line 126, in <module>
    prep_data()
  File "/home/ubuntu/venv/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/venv/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/venv/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/venv/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "prep_data.py", line 65, in prep_data
    with mlflow.start_run() as active_run:
  File "/home/ubuntu/venv/lib/python3.6/site-packages/mlflow/tracking/fluent.py", line 129, in start_run
    "arguments".format(existing_run_id))
mlflow.exceptions.MlflowException: Cannot start run with ID 405b83bbb61046afa83b8dcd71b4db14 because active run ID does not match environment run ID. Make sure --experiment-name or --experiment-id matches experiment set with set_experiment(), or just use command-line arguments
Traceback (most recent call last):
  File "main.py", line 75, in <module>
    workflow()
  File "/home/ubuntu/venv/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/ubuntu/venv/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/ubuntu/venv/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ubuntu/venv/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "main.py", line 61, in workflow
    }, experiment_name)
  File "main.py", line 40, in _get_or_run
    submitted_run = mlflow.run('.', entry_point=entry_point, parameters=params)
  File "/home/ubuntu/venv/lib/python3.6/site-packages/mlflow/projects/__init__.py", line 287, in run
    _wait_for(submitted_run_obj)
  File "/home/ubuntu/venv/lib/python3.6/site-packages/mlflow/projects/__init__.py", line 304, in _wait_for
    raise ExecutionException("Run (ID '%s') failed" % run_id)
mlflow.exceptions.ExecutionException: Run (ID '405b83bbb61046afa83b8dcd71b4db14') failed
2019/11/22 18:51:59 ERROR mlflow.cli: === Run (ID '62c229b2d9194b569a7b2bfc14338800') failed ===

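I'm not sure I understand the error correctly, but it seems to say that I'm using more than one experiment. However, I'm fairly sure I only use one (testproject). Browsing SO and GitHub issues suggests setting the environment variable MLFLOW_TRACKING_URI, but doesn't explain how to set it. So I tried two different approaches:

1) Exporting it before running the MLflow project: $ export MLFLOW_TRACKING_URI='http://127.0.0.1:5099'
2) Setting it with Python at the beginning of the main.py script: os.environ['MLFLOW_TRACKING_URI'] = 'http://127.0.0.1:5099'

Neither had any effect. Here you can see my project: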

main.py

import os
import click
import mlflow
from mlflow.entities import RunStatus


def _already_ran(entry_point, params, experiment_name):
    # experiment = mlflow.get_experiment_by_name('{}_{}'.format(experiment_name, entry_point))
    experiment = mlflow.get_experiment_by_name(experiment_name)
    if experiment is None:
        return None
    experiment_id = experiment.experiment_id
    client = mlflow.tracking.MlflowClient()
    all_run_infos = reversed(client.list_run_infos(experiment_id))
    for run_info in all_run_infos:
        full_run = client.get_run(run_info.run_id)
        # Reset for every run, otherwise one mismatch skips all later runs
        match_failed = False
        for p_key, p_val in params.items():
            run_value = full_run.data.params.get(p_key)
            # MLflow stores parameters as strings, so compare as strings
            if run_value != str(p_val):
                match_failed = True
                break
        if match_failed:
            continue
        if run_info.to_proto().status != RunStatus.FINISHED:
            continue
        return full_run
    return None


def _get_or_run(entry_point, params, experiment_name, use_cache=True):
    existing_run = _already_ran(entry_point, params, experiment_name)
    if use_cache and existing_run:
        return existing_run
    submitted_run = mlflow.run('.', entry_point=entry_point, parameters=params)
    return mlflow.tracking.MlflowClient().get_run(submitted_run.run_id)

@click.command()
@click.option("--experiment-name")
@click.option('--prep-data-time-avg', default='placeholder')
@click.option('--prep-data-sensor-id', default='placeholder')
@click.option('--learn-epochs', default=100, type=int)
@click.option('--learn-neurons', default=5, type=int)
@click.option('--learn-layers', default=2, type=int)
def workflow(experiment_name, prep_data_time_avg, prep_data_sensor_id, learn_epochs, learn_neurons, learn_layers):
    # mlflow.set_tracking_uri('http://127.0.0.1:5099')

    # mlflow.set_experiment(experiment_name)
    # with mlflow.start_run() as active_run:

    data_run = _get_or_run('prep_data', {
        'time_avg': prep_data_time_avg,
        'sensor_id':prep_data_sensor_id,
        'experiment_name': experiment_name
    }, experiment_name)

    learn_run = _get_or_run('learn', {
        'epochs': learn_epochs,
        'neurons': learn_neurons,
        'layers': learn_layers,
        'prep_data_run_id': data_run.run_id,
        'experiment_name': experiment_name,
    }, experiment_name)
if __name__ == '__main__':
    # os.environ['MLFLOW_TRACKING_URI'] = 'http://127.0.0.1:5099'
    workflow()
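A side note on `_get_or_run` above: each step is launched with `mlflow.run('.', entry_point=..., parameters=...)` and no experiment. As far as I can tell from the MLflow 1.x source, the child run is then created in the experiment of the parent `mlflow run` process (passed down through the `MLFLOW_EXPERIMENT_ID` environment variable), which is the default experiment when no experiment is named on the command line, while `prep_data.py` switches to `testproject`. One option is to name the experiment explicitly, since `mlflow.run` accepts an `experiment_name` argument. Below is a minimal sketch with the runner injected so it can be tried without a tracking server; `get_or_run`, `runner` and `cached_run` are hypothetical names, not part of MLflow.

```python
from typing import Any, Callable, Dict, Optional


def get_or_run(entry_point: str,
               params: Dict[str, Any],
               experiment_name: str,
               runner: Callable[..., Any],
               cached_run: Optional[Any] = None) -> Any:
    """Return `cached_run` if given, otherwise launch `entry_point`.

    `runner` is assumed to accept the same arguments as mlflow.run
    (uri, entry_point, parameters, experiment_name). It is injected here
    only so the sketch can be exercised without an MLflow tracking server.
    """
    if cached_run is not None:
        return cached_run
    # Name the experiment explicitly instead of inheriting it from the
    # parent `mlflow run` process.
    return runner(".", entry_point=entry_point, parameters=params,
                  experiment_name=experiment_name)


# Hypothetical use in main.py (requires mlflow):
#     import mlflow
#     submitted = get_or_run("prep_data", {...}, experiment_name,
#                            runner=mlflow.run)
```

In main.py itself this amounts to adding `experiment_name=experiment_name` to the existing `mlflow.run(...)` call in `_get_or_run`. The parent run created by `mlflow run .` would still be logged in the default experiment unless the CLI is also told which experiment to use.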

prep_data.py

import click
import mlflow


@click.command()
@click.option("--experiment-name")
@click.option('--time-avg', default='placeholder')
@click.option('--sensor-id', default='placeholder')
def prep_data(experiment_name, time_avg, sensor_id):
    mlflow.set_experiment(experiment_name)
    with mlflow.start_run() as active_run:
        # logic code of prep_data goes here
        pass


if __name__ == '__main__':
    prep_data()
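If `prep_data.py` should keep working both when launched by MLflow Projects and when started directly (e.g. `python prep_data.py --experiment-name testproject`), one option is to call `set_experiment()` only when no run was handed down. A minimal sketch, assuming, as in MLflow 1.x, that the pre-created run ID arrives in the `MLFLOW_RUN_ID` environment variable; `experiment_to_set` is a hypothetical helper, not an MLflow function. It must be called before `start_run()`, which consumes that variable.

```python
import os
from typing import Mapping, Optional

# Environment variable through which MLflow 1.x hands the pre-created run ID
# to a script started by `mlflow run` / mlflow.run().
RUN_ID_ENV_VAR = "MLFLOW_RUN_ID"


def experiment_to_set(environ: Mapping[str, str],
                      experiment_name: Optional[str]) -> Optional[str]:
    """Return the name to pass to mlflow.set_experiment(), or None to skip it.

    Skipping is suggested when the script was launched by MLflow Projects,
    because the run (and therefore its experiment) already exists.
    """
    if RUN_ID_ENV_VAR in environ or not experiment_name:
        return None
    return experiment_name


# Hypothetical use inside prep_data() (requires mlflow):
#     name = experiment_to_set(os.environ, experiment_name)
#     if name is not None:
#         mlflow.set_experiment(name)
#     with mlflow.start_run() as active_run:
#         ...

if __name__ == "__main__":
    print(experiment_to_set(os.environ, "testproject"))
```

With this guard, a run launched by MLflow Projects simply stays in whatever experiment it was created in, so choosing `testproject` then has to happen on the launching side.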

I'd be glad for any ideas on how to solve this.

Thank you very much!

Cheers, Raphael

Answer 1

You need to give the mlflow CLI the same experiment name:

mlflow run . -P experiment_name=testproject --experiment-name testproject

For more details: https://www.mlflow.org/docs/latest/cli.html#mlflow-run
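Why the extra flag helps: as far as I can tell from the MLflow 1.x source of `mlflow.start_run()`, the tracking URI plays no part in this error. Every project launch creates its run up front, in the experiment named on the command line (for the nested `mlflow.run()` calls, the one inherited from the parent; the default experiment `0` if none is named), and hands the run ID to the script in the `MLFLOW_RUN_ID` environment variable. `start_run()` then refuses to attach to that run if `set_experiment()` selected a different experiment, which is what `prep_data.py` does. The sketch below is a simplified pure-Python model of that check, not MLflow's actual code; `resolve_run`, `RunMismatchError` and the IDs `"0"`, `"1"` and `"abc123"` are hypothetical.

```python
from typing import Dict, Optional


class RunMismatchError(Exception):
    """Stands in for mlflow.exceptions.MlflowException in this sketch."""


def resolve_run(env: Dict[str, str],
                run_experiments: Dict[str, str],
                active_experiment_id: Optional[str]) -> Optional[str]:
    """Return the run ID start_run() would attach to, or raise on a mismatch.

    env                  -- environment variables seen by the launched script
    run_experiments      -- run ID -> ID of the experiment owning that run
    active_experiment_id -- experiment chosen via set_experiment(), or None
    """
    run_id = env.get("MLFLOW_RUN_ID")
    if run_id is None:
        return None  # no run handed down; a fresh run would be created
    owner = run_experiments[run_id]
    # Note: no tracking URI is involved in this comparison.
    if active_experiment_id is not None and active_experiment_id != owner:
        raise RunMismatchError(
            "Cannot start run with ID {} because active run ID does not "
            "match environment run ID".format(run_id))
    return run_id


if __name__ == "__main__":
    runs = {"abc123": "0"}             # run created in the default experiment
    env = {"MLFLOW_RUN_ID": "abc123"}  # what the launched script sees
    try:
        resolve_run(env, runs, active_experiment_id="1")  # set_experiment -> "1"
    except RunMismatchError as exc:
        print("mismatch:", exc)
    print(resolve_run(env, runs, active_experiment_id="0"))  # abc123
```

With `--experiment-name testproject`, the run handed down and the experiment selected by `set_experiment('testproject')` agree, which corresponds to the second call in the usage example.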

MLflow: active run ID does not match environment run ID
