DAG not processed as expected

Date: 2019-08-14 14:09:25

Tags: python airflow airflow-scheduler

I have written a scheduled DAG, but it is not processed when triggered and the tasks do not run on schedule.

I am trying to run scheduled automation for some SQL scripts and chose Airflow for the job. I have verified that the scheduler is running, the metadata database is updated, and the UI shows the DAG code as up to date.

@Dors-MacBook-Pro:[~/airflow/dags] $ airflow scheduler
[2019-08-14 16:00:57,011] {__init__.py:51} INFO - Using executor SequentialExecutor
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
[2019-08-14 16:01:02,943] {scheduler_job.py:1288} INFO - Starting the scheduler
[2019-08-14 16:01:02,943] {scheduler_job.py:1296} INFO - Running execute loop for -1 seconds
[2019-08-14 16:01:02,943] {scheduler_job.py:1297} INFO - Processing each file at most -1 times
[2019-08-14 16:01:02,943] {scheduler_job.py:1300} INFO - Searching for files in /Users/dorlevy/airflow/dags
[2019-08-14 16:01:02,951] {scheduler_job.py:1302} INFO - There are 4 files in /Users/dorlevy/airflow/dags
[2019-08-14 16:01:02,951] {scheduler_job.py:1349} INFO - Resetting orphaned tasks for active dag runs
[2019-08-14 16:01:02,977] {dag_processing.py:543} INFO - Launched DagFileProcessorManager with pid: 86952
[2019-08-14 16:01:02,985] {settings.py:54} INFO - Configured default timezone <Timezone [UTC]>
[2019-08-14 16:01:03,001] {dag_processing.py:746} ERROR - Cannot use more than 1 thread when using sqlite. Setting parallelism to 1
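Note the last line of the log: with the default SQLite metadata database, Airflow is forced onto the single-threaded SequentialExecutor, so tasks can only run one at a time. That is not necessarily the root cause here, but it is worth fixing; a minimal airflow.cfg sketch for switching to the LocalExecutor, assuming a local PostgreSQL database named airflow (the connection string is illustrative, not from the original post):

    [core]
    # assumed local PostgreSQL metadata DB; adjust user/password/host/port/db
    sql_alchemy_conn = postgresql+psycopg2://airflow:airflow@localhost:5432/airflow
    executor = LocalExecutor

After switching the backend, run airflow initdb again before restarting the scheduler.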

Although I have tried a few variations, I suspect my flow (dependency) syntax is not what Airflow expects.

from datetime import datetime
from airflow import DAG
from airflow.models import Variable
from airflow.operators.python_operator import PythonOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.utils.helpers import chain
import psycopg2
import os
import logging

sqlpath = Variable.get('sql_path')
sql_scripts = sorted(os.listdir(sqlpath))

def etl(sqlFile, **kwargs):
    # sqlFile is passed in through the task's op_kwargs
    connection = None
    try:
        connection = psycopg2.connect(dbname=os.environ["**"],
                                      host=os.environ["**"],
                                      port=os.environ["**"],
                                      user=os.environ["**"],
                                      password=os.environ["**"])
        cursor = connection.cursor()
        if sqlFile.endswith('.sql'):
            print('processing ' + sqlFile)
            sql = os.path.join(sqlpath, sqlFile)
            with open(sql, encoding='utf-8') as f:
                sqlCommands = f.read().split(';')
            # loop over all sql commands in the file
            for command in sqlCommands:
                try:
                    if command.strip():
                        cursor.execute(command)
                        connection.commit()
                    else:
                        print('empty')
                except (Exception, psycopg2.DatabaseError) as error:
                    print("Error while creating PostgreSQL table", error)
            print("{} is completed".format(sqlFile))
        else:
            print('{} is not a .sql file'.format(sqlFile))
    except (Exception, psycopg2.Error) as error:
        print("Error type", error)
    finally:
        if connection:
            cursor.close()
            connection.close()
            print("connection is closed")


with DAG('prod',
         description='automation',
         schedule_interval=None,
         start_date=datetime(2019, 1, 1),
         catchup=False) as dag:
    pay1_0 = PythonOperator(task_id='pay1_0_1101', python_callable=etl,
                            op_kwargs={'sqlFile': '1101_create_job.sql'})
    pay1_1 = PythonOperator(task_id='pay1_1_1102', python_callable=etl,
                            op_kwargs={'sqlFile': '1102_update.sql'})
    ...

# Main issue below -

    [pay1_0, trd_0, pay2_0] >> [pay1_1, trd_1, pay2_1] >> [pay1_2, trd_2, pay2_2] >> \
    chain([DummyOperator(task_id='pay1_{}'.format(i), dag=dag) for i in range(3, 9)]) >> \
    chain([DummyOperator(task_id='s{}'.format(i), dag=dag) for i in range(0, 10)])

# another option I've tried:
# chain(chain([DummyOperator(task_id='pay1_{}'.format(i), dag=dag) for i in range(0, 9)]),
#       chain(trd_0, trd_1, trd_2), chain(pay2_0, pay2_1, pay2_2),
#       chain([DummyOperator(task_id='s{}'.format(i), dag=dag) for i in range(0, 10)]))

The flows are not processed in the intended execution order.
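(A note on why the first attempt cannot work: Python lists do not implement the >> operator themselves, so chaining list >> list fails outright, roughly like this:

    >>> [pay1_0, trd_0, pay2_0] >> [pay1_1, trd_1, pay2_1]
    TypeError: unsupported operand type(s) for >>: 'list' and 'list'

BaseOperator only supports a plain list on one side of >> at a time; linking two lists is exactly what chain is for, as the answer below shows.)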

1 Answer:

Answer 0 (score: 0)

Found the correct syntax :)

Using the apache-airflow chain function:

    chain([pay1_0, trd_0, pay2_0], [pay1_1, trd_1, pay2_1], [pay1_2, trd_2, pay2_2], \
        pay1_3, pay1_4, pay1_5, pay1_6, pay1_7, pay1_8, [s0, s1, s2, s3, s4, s5, s6, s7, s8, s9])
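For context, chain (from airflow.utils.helpers in Airflow 1.10) links each argument to the next, and two adjacent equal-length lists are linked pairwise rather than as a cross product. A minimal, self-contained sketch of that behavior (the demo DAG and task names are illustrative, not from the original post):

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.dummy_operator import DummyOperator
    from airflow.utils.helpers import chain

    with DAG('chain_demo', start_date=datetime(2019, 1, 1),
             schedule_interval=None) as demo:
        a, b, c, d, e = [DummyOperator(task_id=t) for t in 'abcde']
        # equal-length lists are paired element-wise: a >> c and b >> d;
        # a single task after a list depends on every member of that list:
        # c >> e and d >> e
        chain([a, b], [c, d], e)

So in the answer above, pay1_0 >> pay1_1 >> pay1_2, trd_0 >> trd_1 >> trd_2 and pay2_0 >> pay2_1 >> pay2_2 run as three parallel streams that converge on pay1_3, continue through pay1_8, and then fan out to s0 … s9.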

To keep my DagBag updated, I used:

    python3 -c "from airflow.models import DagBag; d = DagBag();"
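A slightly extended variant of the same one-liner (using the standard DagBag attributes dags and import_errors) also shows which DAGs were picked up and whether any file failed to parse:

    python3 -c "from airflow.models import DagBag; d = DagBag(); print(list(d.dags)); print(d.import_errors)"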

Expected result, as in the screenshot below: [screenshot]