Airflow: Passing a dynamic value to the Sub DAG operator

Time: 2017-06-05 09:25:10

Tags: python airflow apache-airflow

I am new to Airflow. I have come across a scenario where the parent DAG needs to pass a dynamic number (say n) to the Sub DAG.
The SubDAG will then use this number to dynamically create n parallel tasks.

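The Airflow documentation does not cover a way to achieve this, so I have explored a few approaches:

Option - 1 (using xcom pull)

I tried to pass the number as an xcom value, but for some reason the SubDAG does not resolve it to the passed value.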

Parent DAG file

def load_dag(**kwargs):
    number_of_runs = json.dumps(kwargs['dag_run'].conf['number_of_runs'])
    dag_data = json.dumps({
        "number_of_runs": number_of_runs
    })
    return dag_data

# ------------------ Tasks ------------------------------
load_config = PythonOperator(
    task_id='load_config',
    provide_context=True,
    python_callable=load_dag,
    dag=dag)

t1 = SubDagOperator(
    task_id=CHILD_DAG_NAME,
    subdag=sub_dag(PARENT_DAG_NAME, CHILD_DAG_NAME, default_args, "'{{ ti.xcom_pull(task_ids='load_config') }}'" ),
    default_args=default_args,
    dag=dag,
)

Sub DAG file

def sub_dag(parent_dag_name, child_dag_name, args, num_of_runs):
    dag_subdag = DAG(
        dag_id='%s.%s' % (parent_dag_name, child_dag_name),
        default_args=args,
        schedule_interval=None)

    variabe_names = {}

    for i in range(num_of_runs):
        variabe_names['task' + str(i + 1)] = DummyOperator(
            task_id='dummy_task',
            dag=dag_subdag,
        )

    return dag_subdag

Option - 2

I also tried passing the number as a global variable, but that did not work.

Option - 3

We also tried writing this value to a data file, but the sub DAG throws an error. This may be because we generate the file dynamically.

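Can someone help me with this?

4 answers:

Answer 0 (score: 2)

I have done it with option 3. The key is to return a valid DAG with no tasks if the file does not exist. load_config then generates a file containing your number of tasks, or more information if needed. Your subdag factory would look like: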

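import os
from datetime import timedelta

# Airflow 1.x import paths
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator


def subdag(...):  # parameters elided in the original; the body uses parent, child and args
    sdag = DAG('%s.%s' % (parent, child), default_args=args, schedule_interval=timedelta(hours=1))
    file_path = "/path/to/generated/file"
    # Until load_config has written the file, return a valid DAG with no tasks.
    if os.path.exists(file_path):
        with open(file_path) as data_file:
            for task in data_file:
                # Strip the trailing newline so the task_id is valid.
                DummyOperator(
                    task_id='task_' + task.strip(),
                    default_args=args,
                    dag=sdag,
                )
    return sdag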

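When the DAG is first generated, you will see a subdag with no tasks. When the DAG runs, once load_config has finished, you will see the dynamically generated subdag.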
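The file hand-off described above can be sketched without Airflow. This is only an illustration under assumptions: the helper names `write_task_list` and `read_task_ids` and the file name `generated_tasks.txt` are hypothetical, and the one-suffix-per-line format is a guess at what load_config might write. The writer would be called from load_config, and the reader mirrors what the subdag factory does, returning an empty list until the file exists.

```python
import os
import tempfile


def write_task_list(file_path, number_of_runs):
    """Write one task suffix per line (hypothetical format for load_config)."""
    with open(file_path, "w") as data_file:
        for i in range(number_of_runs):
            data_file.write("%d\n" % (i + 1))


def read_task_ids(file_path):
    """Return the task ids a subdag factory would create, or [] if no file yet."""
    if not os.path.exists(file_path):
        return []
    with open(file_path) as data_file:
        # Strip newlines and skip blank lines so every id is usable.
        return ["task_" + line.strip() for line in data_file if line.strip()]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "generated_tasks.txt")
    print(read_task_ids(path))  # → []
    write_task_list(path, 3)
    print(read_task_ids(path))  # → ['task_1', 'task_2', 'task_3']
```

Returning an empty list for a missing file is what lets the scheduler parse the subdag before load_config has run.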

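Answer 1 (score: 1)

Option 1 should work if you just change the call to xcom_pull to include the dag_id of the parent DAG. By default, the xcom_pull call looks for the task_id 'load_config' in its own DAG, where it does not exist.

So change the xcom call macro to:

subdag=sub_dag(PARENT_DAG_NAME, CHILD_DAG_NAME, default_args, "'{{ ti.xcom_pull(task_ids='load_config', dag_id='" + PARENT_DAG_NAME + "') }}'" ),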

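Answer 2 (score: 0)

Jaime's answer will work if the filename you write to is not dynamic (for example, if you rewrite the same file over and over for every task instance):

file_path = "/path/to/generated/file"

However, if you need unique filenames, or want each task instance to write different content to the file for tasks executed in parallel, Airflow will not work in this case, because there is no way to pass the execution date or a variable outside of a template. Take a look at this post.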

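Answer 3 (score: 0)

Take a look at my answer here, in which I describe a way to create tasks dynamically, using xcoms and subdags, based on the results of a previously executed task.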