Airflow: predefining variables and connections in a file

Time: 2019-10-04 08:00:49

Tags: airflow

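Is it possible to predefine variables, connections and so on in a file so that they are loaded when Airflow starts up? Setting them through the UI is not great from a deployment perspective.

Cheers

Terry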

1 answer:

Answer 0: (score: 0)

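I'm glad someone asked this question. In fact, since Airflow completely exposes the underlying SQLAlchemy models to the end user, programmatic manipulation (creation, update and deletion) of all Airflow models is possible, particularly those used to supply configuration, such as Connection and Variable.

It may not be very obvious, but the open-source nature of Airflow means there are no secrets: you just have to look a little harder. For these use cases in particular, I have always found cli.py to be a very useful reference point.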
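As a small illustration of this kind of programmatic manipulation, Variables can be loaded from a JSON file of key/value pairs. This is only a minimal sketch, not part of the original snippet: the function names `apply_variables` and `load_variables`, the `setter` hook, and the flat `{"key": value}` file layout are all assumptions made for illustration. The only Airflow call used is `Variable.set(key, value, serialize_json=...)`, and it is imported lazily so the helper can be exercised without a running Airflow installation.

```python
import json
from typing import Any, Callable, Dict, Optional

# Hypothetical type alias: anything that stores a (key, value) pair.
Setter = Callable[[str, Any], None]


def _airflow_setter(key: str, value: Any) -> None:
    # Imported lazily so this module can be imported without Airflow installed.
    from airflow.models import Variable
    # dicts and lists are stored as JSON; other values are stored as-is.
    Variable.set(key, value, serialize_json=isinstance(value, (dict, list)))


def apply_variables(data: Dict[str, Any], setter: Optional[Setter] = None) -> int:
    """Store every key/value pair and return how many were written."""
    setter = setter or _airflow_setter
    for key, value in data.items():
        setter(key, value)
    return len(data)


def load_variables(file_path: str, setter: Optional[Setter] = None) -> int:
    """Read a JSON object of variables from file_path and store each entry."""
    with open(file_path) as json_file:
        data: Dict[str, Any] = json.load(json_file)
    return apply_variables(data, setter)
```

Passing a custom `setter` (for example `some_dict.__setitem__`) makes it possible to dry-run the file without touching the meta-db; leaving it out falls back to Airflow's `Variable.set`.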


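So here is the snippet I use to create all MySQL connections while setting up Airflow. The input file it reads is in JSON format, with the structure given below.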

# all imports
import json
from typing import List, Dict, Any, Optional

from airflow.models import Connection
from airflow.settings import Session
from airflow.utils.db import provide_session
from sqlalchemy.orm import exc

# trigger method
def create_mysql_conns(file_path: str) -> None:
    """
    Reads MySQL connection settings from a given JSON file and
    persists it in Airflow's meta-db. If connection for same
    db already exists, it is overwritten

    :param file_path: Path to JSON file containing MySQL connection settings
    :type file_path:  str
    :return:          None
    :rtype:           None
    """
    with open(file_path) as json_file:
        json_data: List[Dict[str, Any]] = json.load(json_file)
        for settings_dict in json_data:
            db_name: str = settings_dict["db"]
            conn_id: str = "mysql.{db_name}".format(db_name=db_name)
            mysql_conn: Connection = Connection(conn_id=conn_id,
                                                conn_type="mysql",
                                                host=settings_dict["host"],
                                                login=settings_dict["user"],
                                                password=settings_dict["password"],
                                                schema=db_name,
                                                # fall back to MySQL's default port when none is given
                                                port=settings_dict.get("port", 3306))
            create_and_overwrite_conn(conn=mysql_conn)


# utility delete method
@provide_session
def delete_conn_if_exists(conn_id: str, session: Optional[Session] = None) -> bool:
    # Code snippet borrowed from airflow.bin.cli.connections(..)
    try:
        to_delete: Connection = (session
                                 .query(Connection)
                                 .filter(Connection.conn_id == conn_id)
                                 .one())
    except exc.NoResultFound:
        return False
    except exc.MultipleResultsFound:
        return False
    else:
        session.delete(to_delete)
        session.commit()
        return True


# utility overwrite method
@provide_session
def create_and_overwrite_conn(conn: Connection, session: Optional[Session] = None) -> None:
    delete_conn_if_exists(conn_id=conn.conn_id)
    session.add(conn)
    session.commit()

Input JSON file structure

[
    {
        "db": "db_1",
        "host": "db_1.hostname.com",
        "user": "db_1_user",
        "password": "db_1_passwd"
    },
    {
        "db": "db_2",
        "host": "db_2.hostname.com",
        "user": "db_2_user",
        "password": "db_2_passwd"
    }
]
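Before loading such a file into the meta-db, it can help to check it and see what each entry would turn into. The following is a minimal sketch using only the standard library; the function name `parse_mysql_settings`, the `REQUIRED_KEYS` tuple and the fallback port 3306 (MySQL's standard port) are assumptions for illustration. The keys of each returned dict mirror the keyword arguments passed to `Connection` in the snippet above.

```python
import json
from typing import Any, Dict, List

# Assumed for illustration: keys every entry must carry, and MySQL's standard port.
REQUIRED_KEYS = ("db", "host", "user", "password")
DEFAULT_MYSQL_PORT = 3306


def parse_mysql_settings(raw_json: str) -> List[Dict[str, Any]]:
    """Validate the JSON text and map each entry to Connection-style keyword arguments."""
    entries = json.loads(raw_json)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array of connection settings")
    parsed: List[Dict[str, Any]] = []
    for index, entry in enumerate(entries):
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError("entry {}: missing keys {}".format(index, missing))
        parsed.append({
            "conn_id": "mysql.{}".format(entry["db"]),
            "conn_type": "mysql",
            "host": entry["host"],
            "login": entry["user"],
            "password": entry["password"],
            "schema": entry["db"],
            # "port" is optional in the file; fall back to the assumed default.
            "port": entry.get("port", DEFAULT_MYSQL_PORT),
        })
    return parsed
```

For the file shown above, the first entry would map to the connection id `mysql.db_1`, and an entry with a missing key raises `ValueError` instead of failing halfway through a load.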
