mlflow tracking server fails to start when backend-store-uri is specified

Time: 2019-09-24 08:38:35

Tags: mlflow

I run mlflow as follows.

The Dockerfile contains the following CMD instruction:

CMD mlflow server \
    --host 0.0.0.0 \
    --backend-store-uri "${BACKEND_STORE_URI}" \
    --default-artifact-root "${DEFAULT_ARTIFACT_ROOT}"

After running

docker run --rm --name mlflow -p 5000:5000 -e BACKEND_STORE_URI=mssql+pymssql://user:pass@mybackendstoreuri/mlflow mlflow

it prints

INFO  [alembic.runtime.migration] Context impl MSSQLImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Context impl MSSQLImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.

but then the container exits without starting the server.

Without the backend store URI, I can see the logs about binding to the host, and the container does not exit.

How can I run the mlflow tracking server with a backend store URI?
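Before starting the container, it can help to confirm that the value passed in BACKEND_STORE_URI splits into the parts you expect. The following is a minimal sketch using only the Python standard library; the function name describe_backend_store_uri is hypothetical and not part of mlflow, and it only parses the string without contacting the database.

```python
from urllib.parse import urlsplit


def describe_backend_store_uri(uri):
    """Split a SQLAlchemy-style URI into its parts, masking the password.

    Hypothetical helper for illustration; it does not connect to anything.
    """
    parts = urlsplit(uri)
    # The scheme has the form "dialect+driver", e.g. "mssql+pymssql".
    dialect, _, driver = parts.scheme.partition("+")
    return {
        "dialect": dialect,
        "driver": driver or None,
        "username": parts.username,
        "password": "***" if parts.password else None,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/") or None,
    }


if __name__ == "__main__":
    # The URI from the docker run command in the question.
    uri = "mssql+pymssql://user:pass@mybackendstoreuri/mlflow"
    for key, value in describe_backend_store_uri(uri).items():
        print(key, "=", value)
```

An empty dialect or a missing database in the result would suggest the environment variable did not reach the container as intended.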

1 answer:

Answer 0: (score: 0)

The root cause is described in this docstring:

MLflow UI and client code expects a default experiment with ID 0.
This method uses SQL insert statement to create the default experiment as a hack, since
experiment table uses 'experiment_id' column is a PK and is also set to auto increment.
MySQL and other implementation do not allow value '0' for such cases.

Ref: https://github.com/mlflow/mlflow/blob/v1.2.0/mlflow/store/sqlalchemy_store.py#L171

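No error is raised during the migration, so nothing is displayed, and because the failure is silent, the Alembic version is reported as up to date. Ref: https://github.com/mlflow/mlflow/blob/v1.2.0/mlflow/store/db_migrations/env.py#L71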

If you apply the same idea as the MySQL test (https://github.com/mlflow/mlflow/blob/v1.2.0/mlflow/store/sqlalchemy_store.py#L171), an exception is raised: Cannot insert explicit value for identity column in table 'experiment' when IDENTITY_INSERT is set to OFF.

Test snippet:

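import unittest

import sqlalchemy
# Import path as of mlflow v1.2.0 (the version linked above).
from mlflow.store.sqlalchemy_store import SqlAlchemyStore

# Placeholder artifact location; the original snippet leaves ARTIFACT_URI undefined.
ARTIFACT_URI = "artifact_folder"


class TestSqlAlchemyStoreMssqlDb(unittest.TestCase):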
    """
    Run tests against a MSSQL database
    """
    def setUp(self):
        db_username = "test"
        db_password = "test"
        host = "test"
        db_name = "TEST_DB"

        db_server_url = "mssql+pymssql://%s:%s@%s" % (db_username, db_password, host)
        self._engine = sqlalchemy.create_engine(db_server_url)

        self._db_url = "%s/%s" % (db_server_url, db_name)
        print("Connect to %s" % self._db_url)

    def test_store(self):
        self.store = SqlAlchemyStore(db_uri=self._db_url, default_artifact_root=ARTIFACT_URI)
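The IDENTITY_INSERT error comes from SQL Server refusing an explicit value for an identity column unless IDENTITY_INSERT is switched on for that table. As a rough sketch of what a workaround would have to emit (this is not how mlflow handles it), the helper below wraps an INSERT in the required SET statements. The function name, the column names and the 'Default' value are assumptions for illustration; only the table name is taken from the error message.

```python
def wrap_with_identity_insert(table, insert_sql):
    """Return T-SQL statements that permit an explicit identity value.

    Hypothetical helper: it only builds strings and executes nothing.
    """
    return [
        "SET IDENTITY_INSERT {} ON;".format(table),
        # Normalise the statement so it ends with exactly one semicolon.
        insert_sql.rstrip().rstrip(";") + ";",
        "SET IDENTITY_INSERT {} OFF;".format(table),
    ]


# Assumed column list; SQL Server requires an explicit column list
# when inserting into an identity column.
DEFAULT_EXPERIMENT_INSERT = (
    "INSERT INTO experiment (experiment_id, name) VALUES (0, 'Default')"
)

if __name__ == "__main__":
    for statement in wrap_with_identity_insert("experiment", DEFAULT_EXPERIMENT_INSERT):
        print(statement)
```

The three statements would need to run on the same connection, since IDENTITY_INSERT is a session-level setting.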

With a postgres server the migration completes and the server starts, as the log shows.

mlflow_1    | 2019/09/24 09:03:55 INFO mlflow.store.sqlalchemy_store: Creating initial MLflow database tables...
mlflow_1    | 2019/09/24 09:03:55 INFO mlflow.store.db.utils: Updating database tables at postgresql://postgres:postgres@postgres:5432/postgres
mlflow_1    | INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
mlflow_1    | INFO  [alembic.runtime.migration] Will assume transactional DDL.
mlflow_1    | INFO  [alembic.runtime.migration] Running upgrade  -> 451aebb31d03, add metric step
mlflow_1    | INFO  [alembic.runtime.migration] Running upgrade 451aebb31d03 -> 90e64c465722, migrate user column to tags
mlflow_1    | INFO  [alembic.runtime.migration] Running upgrade 90e64c465722 -> 181f10493468, allow nulls for metric values
mlflow_1    | INFO  [alembic.runtime.migration] Running upgrade 181f10493468 -> df50e92ffc5e, Add Experiment Tags Table
mlflow_1    | INFO  [alembic.runtime.migration] Running upgrade df50e92ffc5e -> 7ac759974ad8, Update run tags with larger limit
mlflow_1    | INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
mlflow_1    | INFO  [alembic.runtime.migration] Will assume transactional DDL.
mlflow_1    | [2019-09-24 09:03:55 +0000] [15] [INFO] Starting gunicorn 19.9.0
mlflow_1    | [2019-09-24 09:03:55 +0000] [15] [INFO] Listening at: http://0.0.0.0:5000 (15)
mlflow_1    | [2019-09-24 09:03:55 +0000] [15] [INFO] Using worker: sync
mlflow_1    | [2019-09-24 09:03:55 +0000] [18] [INFO] Booting worker with pid: 18
mlflow_1    | [2019-09-24 09:03:56 +0000] [22] [INFO] Booting worker with pid: 22
mlflow_1    | [2019-09-24 09:03:56 +0000] [26] [INFO] Booting worker with pid: 26
mlflow_1    | [2019-09-24 09:03:56 +0000] [27] [INFO] Booting worker with pid: 27