I have written a scheduled DAG, but it is not processed when triggered and the tasks do not run on schedule.
I am trying to automate the scheduled execution of some SQL scripts, and I chose Airflow for this. I have verified that the scheduler is running, that the metadata db has been updated, and that the UI shows the updated DAG code.
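For reference, the checks can also be reproduced from the CLI; a short sketch of the Airflow 1.10 commands involved (the DAG id 'prod' and task id 'pay1_0_1101' are taken from the code further below):

# list the DAGs the scheduler can see; 'prod' should show up here
airflow list_dags
# trigger a manual run of the DAG, independent of schedule_interval
airflow trigger_dag prod
# run a single task in isolation, bypassing the scheduler entirely
airflow test prod pay1_0_1101 2019-01-01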
@Dors-MacBook-Pro:[~/airflow/dags] $ airflow scheduler
[2019-08-14 16:00:57,011] {__init__.py:51} INFO - Using executor SequentialExecutor
____________ _____________
____ |__( )_________ __/__ /________ __
____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
_/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
[2019-08-14 16:01:02,943] {scheduler_job.py:1288} INFO - Starting the scheduler
[2019-08-14 16:01:02,943] {scheduler_job.py:1296} INFO - Running execute loop for -1 seconds
[2019-08-14 16:01:02,943] {scheduler_job.py:1297} INFO - Processing each file at most -1 times
[2019-08-14 16:01:02,943] {scheduler_job.py:1300} INFO - Searching for files in /Users/dorlevy/airflow/dags
[2019-08-14 16:01:02,951] {scheduler_job.py:1302} INFO - There are 4 files in /Users/dorlevy/airflow/dags
[2019-08-14 16:01:02,951] {scheduler_job.py:1349} INFO - Resetting orphaned tasks for active dag runs
[2019-08-14 16:01:02,977] {dag_processing.py:543} INFO - Launched DagFileProcessorManager with pid: 86952
[2019-08-14 16:01:02,985] {settings.py:54} INFO - Configured default timezone <Timezone [UTC]>
[2019-08-14 16:01:03,001] {dag_processing.py:746} ERROR - Cannot use more than 1 thread when using sqlite. Setting parallelism to 1
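The last line of the log is worth noting: with a SQLite metadata database, Airflow falls back to a single DAG-processing thread and the SequentialExecutor, which runs only one task at a time. A minimal sketch of the environment-variable overrides for switching to a parallel setup, assuming a local PostgreSQL database (the connection string below is a placeholder, not the actual setup):

export AIRFLOW__CORE__EXECUTOR=LocalExecutor
export AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@localhost:5432/airflow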
I have tried a few variations, but I suspect the flow (dependency) syntax is not what Airflow expects.
from collections.abc import Iterable
from typing import List
from datetime import datetime  # needed for start_date below
from airflow import utils, DAG
from airflow.operators.python_operator import PythonOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.models import Variable
from airflow.utils.helpers import chain  # chain() is used in the dependency section below
from psycopg2.extras import execute_values
import psycopg2
import os, sys
import logging

sqlpath = Variable.get('sql_path')
sql_scripts = sorted(os.listdir(sqlpath))
def etl(sqlFile=None, **kwargs):
    # sqlFile arrives via op_kwargs from the PythonOperator definitions below
    try:
        connection = psycopg2.connect(dbname=os.environ["**"],
                                      host=os.environ["**"],
                                      port=os.environ["**"],
                                      user=os.environ["**"],
                                      password=os.environ["**"])
        cursor = connection.cursor()
        counter = 0
        if sqlFile.endswith('.sql'):
            counter += 1
            print('processing ' + sqlFile)
            sql = sqlpath + sqlFile
            queries = open(sql, encoding='utf-8').read()
            sqlCommands = queries.split(';')
            # loop over all sql commands in file
            for command in sqlCommands:
                try:
                    if command:
                        cursor.execute(command)
                        connection.commit()
                    else:
                        print('empty')
                except (Exception, psycopg2.DatabaseError) as error:
                    print("Error while creating PostgreSQL table", error)
            print(" {} is completed".format(sqlFile))
        else:
            print('no .sql files')
    except (Exception, psycopg2.Error) as error:
        print("Error type", error)
    finally:
        if connection:
            cursor.close()
            connection.close()
            print("connection is closed")
with DAG('prod',
         description='automation',
         schedule_interval=None,
         start_date=datetime(2019, 1, 1),
         catchup=False) as dag:

    pay1_0 = PythonOperator(task_id='pay1_0_1101', python_callable=etl,
                            op_kwargs={'sqlFile': '1101_create_job.sql'})
    pay1_1 = PythonOperator(task_id='pay1_1_1102', python_callable=etl,
                            op_kwargs={'sqlFile': '1102_update.sql'})
    ...

    # Main issue below -
    [pay1_0, trd_0, pay2_0] >> [pay1_1, trd_1, pay2_1] >> [pay1_2, trd_2, pay2_2] >> \
        chain([DummyOperator(task_id='pay1_{}'.format(i), dag=dag) for i in range(3, 9)]) >> \
        chain([DummyOperator(task_id='s'.format(i), dag=dag) for i in range(0, 10)])

    # another option I've tried:
    # chain(chain([DummyOperator(task_id='pay1_{}'.format(i), dag=dag) for i in range(0, 9)]),
    #       chain(trd_0, trd_1, trd_2), chain(pay2_0, pay2_1, pay2_2),
    #       chain([DummyOperator(task_id='s'.format(i), dag=dag) for i in range(0, 10)]))
The flow is not processed in the order of execution.
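For comparison, list-to-list dependencies such as [a, b, c] >> [d, e, f] are not accepted by the bitshift operators (and, as far as I can tell, helpers.chain() returns None rather than a task, so it cannot sit in the middle of a >> expression). Below is a minimal sketch of wiring the same stage-to-stage fan-out with plain set_downstream() calls; it reuses the task names from the DAG above and only illustrates the shape of the dependency, not the final code:

# Sketch only: every task in one stage feeds every task in the next stage.
stages = [
    [pay1_0, trd_0, pay2_0],
    [pay1_1, trd_1, pay2_1],
    [pay1_2, trd_2, pay2_2],
]
for upstream_stage, downstream_stage in zip(stages, stages[1:]):
    for upstream_task in upstream_stage:
        for downstream_task in downstream_stage:
            upstream_task.set_downstream(downstream_task)

Newer 1.10 releases also ship airflow.utils.helpers.cross_downstream(from_tasks, to_tasks), which, if available in your version, builds the same cross product in one call.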
Answer 0: (score: 0)