How can I automatically generate the spark_default connection ID?

Time: 2019-05-23 01:13:19

Tags: airflow

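I want to generate a conn_id for spark_default. I am running Airflow on k8s, and I want to generate the conn_id dynamically using the Spark master (which is another container running in the same namespace).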

Is there a way to generate the conn_id dynamically, either:

  • through an environment variable
  • or by having the SparkSubmitOperator itself create the conn_id

Here is my code:

from airflow import DAG

from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator
from datetime import datetime, timedelta


args = {
    'owner': 'airflow',
    'start_date': datetime(2019, 5, 22)
}
dag = DAG('spark_example_new', default_args=args, schedule_interval="*/10 * * * *")

operator = SparkSubmitOperator(
    task_id='spark_submit_job_from_airflow',
    conn_id='spark_default',
    java_class='org.apache.spark.examples.JavaWordCount',
    application='local:///opt/spark/examples/jars/spark-examples_2.12-2.4.1.jar',
    total_executor_cores='1',
    executor_cores='2',
    executor_memory='2g',
    num_executors='1',
    name='airflow-spark-example-coming-from-aws-k8s',
    verbose=True,
    driver_memory='1g',
    application_args=["/opt/spark/data/graphx/users.txt"],
    dag=dag,
)

1 answer:

Answer 0 (score: 1)

You can try the approach from this answer:

from airflow.models import Connection
from airflow import settings

def create_conn(username, password, host=None):
    # Build a Connection object named after the user.
    new_conn = Connection(conn_id=f'{username}_connection',
                          login=username,
                          host=host if host else None)
    new_conn.set_password(password)

    # Persist the connection in the Airflow metadata database.
    session = settings.Session()
    session.add(new_conn)
    session.commit()
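
For the environment-variable option raised in the question, Airflow also resolves connections from variables named AIRFLOW_CONN_ followed by the upper-cased conn_id, with the value given as a connection URI. The sketch below only builds that name and value with the standard library. The service name spark-master, the port 7077, and the helper spark_conn_env are assumptions for illustration, not taken from the question.

```python
def spark_conn_env(master_host, master_port=7077, conn_id="spark_default"):
    """Return (variable name, URI) for an Airflow connection defined via the environment.

    master_host and master_port are assumptions: replace them with the
    service name and port of the Spark master in your namespace.
    """
    # Airflow looks up AIRFLOW_CONN_<CONN_ID in upper case>.
    name = "AIRFLOW_CONN_" + conn_id.upper()
    # Connection URI form: <conn type>://<host>:<port>
    uri = "spark://{}:{}".format(master_host, master_port)
    return name, uri


# "spark-master" is a hypothetical Kubernetes service name.
name, uri = spark_conn_env("spark-master")
print("{}={}".format(name, uri))
```

The printed pair would normally be set in the environment of the Airflow containers (for example in the pod spec), so that no row in the metadata database is needed. Depending on the Airflow version, the Spark hook may assemble the master URL from the host and port alone, so it is worth checking the spark-submit command in the task log (verbose=True already prints it) before relying on this.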