I have created a task in Airflow that I have scheduled to run hourly:
atopsar -b 20:39:45 -e 20:42:45 -r /venki/atop_temp -S -x -a -m | awk 'BEGIN {DATE_STAMP=""; } /analysis date: /{DATE_STAMP=$4;} /^[0-9]/ {print DATE_STAMP, $0;}' > /venki/atop_mem4
The start_date is set to 2016-11-16:
default_args = {
'owner': 'airflow',
'depends_on_past': False,
'start_date': datetime(2016, 11, 16),
'email': ['airflow@airflow.com'],
'email_on_failure': False,
'email_on_retry': False,
'retries': 1,
'retry_delay': timedelta(minutes=5),
# 'queue': 'bash_queue',
# 'pool': 'backfill',
# 'priority_weight': 10,
# 'end_date': datetime(2016, 1, 1),
}
dag = DAG('test_hourly_job', default_args=default_args, schedule_interval="@hourly")
I started Airflow at the current time of 10:00 AM, and I can see that it runs the job from 00:00 AM, then 01:00 AM, and so on:
INFO - Executing command: airflow run test_hourly_job task1 2016-11-16T00:00:00 --local -sd DAGS_FOLDER/test_airflow.py
........
........
INFO - Executing command: airflow run test_hourly_job task1 2016-11-16T01:00:00 --local -sd DAGS_FOLDER/test_airflow.py
.......
.......
How do I configure Airflow to start from the current time, say 10:00 AM, and run hourly, rather than starting from 00:00 AM?
Answer 0 (score: 3)
In your question you define the dictionary default_args.
It contains the key 'start_date': datetime(2016, 11, 16).
This creates a datetime object from a year, month, and day only. Because no time is given, it defaults to 00:00, which is why your job starts running from 00:00. You can check this in Python:
from datetime import datetime
datetime(2016, 11, 16)
# That datetime object is generated with a time of 00:00:
# datetime.datetime(2016, 11, 16, 0, 0)
# If you need the current date and time to start the process, you can set the value as:
'start_date': datetime.now()
# If you want only the current time on the given date, you can use the following:
current_date = datetime.now()
default_args = {
'owner': 'airflow',
'depends_on_past': False,
'start_date': datetime(2016, 11, 16, current_date.hour, current_date.minute),
'email': ['airflow@airflow.com'],
'email_on_failure': False,
'email_on_retry': False,
'retries': 1,
'retry_delay': timedelta(minutes=5),
# 'queue': 'bash_queue',
# 'pool': 'backfill',
# 'priority_weight': 10,
# 'end_date': datetime(2016, 1, 1),
}
dag = DAG('test_hourly_job', default_args=default_args,schedule_interval="@hourly")
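To see the midnight default concretely, here is a minimal sketch using only the standard library. The helper name `start_at_current_hour` and the fixed sample value `sample_now` are illustrative assumptions, not part of the answer above or of Airflow.

```python
from datetime import datetime

# A datetime built from year, month, day only gets a time of 00:00:00.
midnight_start = datetime(2016, 11, 16)


def start_at_current_hour(now):
    """Hypothetical helper: truncate `now` to the top of its hour."""
    return now.replace(minute=0, second=0, microsecond=0)


# Fixed sample value standing in for datetime.now(), so the result is repeatable.
sample_now = datetime(2016, 11, 16, 10, 17, 42)
hourly_start = start_at_current_hour(sample_now)

print(midnight_start)  # 2016-11-16 00:00:00
print(hourly_start)    # 2016-11-16 10:00:00
```

Truncating to the hour is only one possible choice; the answer above keeps the current minute instead.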
Answer 1 (score: 0)
Set load_examples = False in ~/airflow/airflow.cfg.
Start the web server with airflow webserver -p <port>.
Put your DAG file in ~/airflow/dags.
Start the scheduler:
$ airflow scheduler
Now, for the schedule interval, see the code below.
Try this:
'start_date': datetime.now()
dag = DAG('tutorial', default_args=default_args, schedule_interval="* * * * *")
or
'start_date': datetime(2015, 6, 1),
dag = DAG('tutorial', default_args=default_args, schedule_interval="@hourly")
Full code
"""
Code that goes along with the Airflow tutorial located at:
https://github.com/airbnb/airflow/blob/master/airflow/example_dags/tutorial.py
"""
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta
default_args = {
'owner': 'airflow',
'depends_on_past': False,
#'start_date': datetime(2015, 6, 1),
'start_date': datetime.now(),
'email': ['airflow@airflow.com'],
'email_on_failure': False,
'email_on_retry': False,
#'retries': 1,
#'retry_delay': timedelta(minutes=5),
# 'queue': 'bash_queue',
# 'pool': 'backfill',
# 'priority_weight': 10,
# 'end_date': datetime(2016, 1, 1),
}
dag = DAG('tutorial', default_args=default_args, schedule_interval="* * * * *")  # run every minute
#dag = DAG('tutorial', default_args=default_args, schedule_interval="@hourly")
#
# t1, t2 and t3 are examples of tasks created by instantiating operators
t1 = BashOperator(
task_id='print_date',
bash_command='date',
dag=dag)
t2 = BashOperator(
task_id='sleep',
bash_command='sleep 5',
retries=3,
dag=dag)
templated_command = """
{% for i in range(5) %}
echo "{{ ds }}"
echo "{{ macros.ds_add(ds, 7)}}"
echo "{{ params.my_param }}"
{% endfor %}
"""
t3 = BashOperator(
task_id='templated',
bash_command=templated_command,
params={'my_param': 'Parameter I passed in'},
dag=dag)
t2.set_upstream(t1)
t3.set_upstream(t1)
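The effect of start_date on an hourly schedule can be illustrated without Airflow. The sketch below is a simplified model, not Airflow's scheduler code, and it ignores details such as when each slot is actually triggered. `hourly_run_times` is a hypothetical function that lists every hourly slot from a start time up to a given "now", which mirrors the 00:00, 01:00, ... runs reported in the question.

```python
from datetime import datetime, timedelta


def hourly_run_times(start, now):
    """Hypothetical model: list each hourly slot from `start` that is not after `now`."""
    slots = []
    current = start
    while current <= now:
        slots.append(current)
        current += timedelta(hours=1)
    return slots


now = datetime(2016, 11, 16, 10, 0)  # sample "current time" from the question

# Midnight start_date: ten earlier slots precede the 10:00 one.
from_midnight = hourly_run_times(datetime(2016, 11, 16), now)
# start_date at the current hour: only the 10:00 slot exists.
from_ten = hourly_run_times(datetime(2016, 11, 16, 10), now)

print(len(from_midnight), from_midnight[0], from_midnight[-1])
print(len(from_ten), from_ten[0])
```

Under this model, moving the start time forward is what removes the earlier slots; the schedule interval itself is unchanged.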
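Answer 2 (score: 0)
Airflow provides a gem of an operator called LatestOnlyOperator, which skips tasks that are not being run during the most recent scheduled run of a DAG. If the time right now is not between its execution_time and the next scheduled execution_time, LatestOnlyOperator skips all immediately downstream tasks, and itself. This operator reduces wasted CPU cycles.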
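from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.latest_only_operator import LatestOnlyOperator
default_args = {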
'owner': 'airflow',
'depends_on_past': False,
'start_date': datetime(2016, 11, 16),
'email': ['airflow@airflow.com'],
'email_on_failure': False,
'email_on_retry': False,
'retries': 1,
'retry_delay': timedelta(minutes=5)
}
dag = DAG('test_hourly_job', default_args=default_args,schedule_interval="@hourly")
latest_only = LatestOnlyOperator(task_id='latest_only', dag=dag)
task1 = DummyOperator(task_id='task1', dag=dag)
latest_only >> task1
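latest_only should always be upstream of the tasks you want to skip. The advantage of the latest_only operator is that whenever you restart the DAG, it skips all the earlier runs and executes only the current one.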
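The skip rule described above can be sketched in plain Python. This illustrates the rule as worded in this answer, not LatestOnlyOperator's actual implementation; `should_skip` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta


def should_skip(execution_time, interval, now):
    """Hypothetical check: skip unless `now` lies between this run's
    execution_time and the next scheduled execution_time."""
    next_execution_time = execution_time + interval
    return not (execution_time <= now < next_execution_time)


now = datetime(2016, 11, 16, 10, 30)
hour = timedelta(hours=1)

print(should_skip(datetime(2016, 11, 16, 0), hour, now))   # True: an old run
print(should_skip(datetime(2016, 11, 16, 10), hour, now))  # False: the latest run
```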
It is better not to hardcode the start time. Instead, use:
from datetime import datetime, timedelta
START_DATE = datetime.combine(datetime.today() - timedelta(1), datetime.min.time())
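To check what that expression evaluates to, the following sketch rebuilds it with intermediate names. The names `yesterday` and `midnight` are added here for readability and are not from the original answer.

```python
from datetime import datetime, timedelta

yesterday = datetime.today() - timedelta(1)  # same clock time, one day earlier
midnight = datetime.min.time()               # time(0, 0)

# Combine yesterday's date with 00:00, as in the one-liner above.
START_DATE = datetime.combine(yesterday, midnight)

print(START_DATE)  # yesterday's date at 00:00:00
```

Because this start is still earlier than the current hour, an hourly DAG would have earlier slots to work through, which is where the latest_only task comes in.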