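I'm using Airflow to schedule jobs, but it has become slower than before, especially the `task_stats` method in `views.py`. I have more than 400 DAGs, and the `task_instance` table has 3 million rows. I have to wait more than 40 seconds for `task_stats` to respond. Is there any way to optimize this method?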
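
The `RunningTI` part of the `union_all()` in `task_stats` is the slowest. If I remove `RunningTI` and keep only `LastTI` in the union, I get a response within 5 seconds, but the front end needs the complete result to display the details.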

Is it possible to optimize this query? The database is MySQL.

The `task_stats` method:
```python
import sqlalchemy as sqla
from sqlalchemy import and_, union_all

from airflow import models
from airflow.utils.db import provide_session
from airflow.utils.state import State

# expose and login_required come from the web-framework setup in views.py;
# this method is an excerpt from the view class defined there.


@expose('/task_stats')
@login_required
@provide_session
def task_stats(self, session=None):
    TI = models.TaskInstance
    DagRun = models.DagRun
    Dag = models.DagModel

    LastDagRun = (
        session.query(DagRun.dag_id, sqla.func.max(DagRun.execution_date).label('execution_date'))
        .join(Dag, Dag.dag_id == DagRun.dag_id)
        .filter(DagRun.state != State.RUNNING)
        .filter(Dag.is_active == True)  # noqa: E712
        .filter(Dag.is_subdag == False)  # noqa: E712
        .group_by(DagRun.dag_id)
        .subquery('last_dag_run')
    )
    RunningDagRun = (
        session.query(DagRun.dag_id, DagRun.execution_date)
        .join(Dag, Dag.dag_id == DagRun.dag_id)
        .filter(DagRun.state == State.RUNNING)
        .filter(Dag.is_active == True)  # noqa: E712
        .filter(Dag.is_subdag == False)  # noqa: E712
        .subquery('running_dag_run')
    )

    # Select all task_instances from active dag_runs.
    # If no dag_run is active, return task instances from most recent dag_run.
    LastTI = (
        session.query(TI.dag_id.label('dag_id'), TI.state.label('state'))
        .join(LastDagRun, and_(
            LastDagRun.c.dag_id == TI.dag_id,
            LastDagRun.c.execution_date == TI.execution_date))
    )
    RunningTI = (
        session.query(TI.dag_id.label('dag_id'), TI.state.label('state'))
        .join(RunningDagRun, and_(
            RunningDagRun.c.dag_id == TI.dag_id,
            RunningDagRun.c.execution_date == TI.execution_date))
    )

    UnionTI = union_all(LastTI, RunningTI).alias('union_ti')
    # If I remove RunningTI from union_all() and change the line above to
    # UnionTI = union_all(LastTI).alias('union_ti'), it saves a lot of time.
    qry = (
        session.query(UnionTI.c.dag_id, UnionTI.c.state, sqla.func.count())
        .group_by(UnionTI.c.dag_id, UnionTI.c.state)
    )

    # Build {dag_id: {state: count}} from the grouped rows.
    data = {}
    for dag_id, state, count in qry:
        if dag_id not in data:
            data[dag_id] = {}
        data[dag_id][state] = count
    session.commit()
```
Link to GitHub: https://github.com/apache/airflow/blob/master/airflow/www/views.py
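
One direction that may be worth trying, sketched here under stated assumptions: `dag_run` has a unique key on `(dag_id, execution_date)`, so a run cannot be both running and not running, which means the `LastTI` and `RunningTI` halves never overlap. The `UNION ALL` can therefore in principle be replaced by a single join of `task_instance` to `dag_run` with an `OR` condition. The sketch uses an in-memory SQLite database with hypothetical, heavily trimmed versions of the `dag`, `dag_run` and `task_instance` tables and made-up rows, and only checks that both query shapes produce the same `{dag_id: {state: count}}` dict. It does not show that the rewrite is faster on MySQL: the correlated `MAX()` subquery has its own cost, so both shapes would need to be compared with `EXPLAIN` against the real 3-million-row table.

```python
import sqlite3

# Hypothetical, trimmed-down versions of Airflow's dag, dag_run and
# task_instance tables: only the columns task_stats touches.
SCHEMA = """
CREATE TABLE dag (dag_id TEXT PRIMARY KEY, is_active INTEGER, is_subdag INTEGER);
CREATE TABLE dag_run (
    dag_id TEXT, execution_date TEXT, state TEXT,
    UNIQUE (dag_id, execution_date)
);
CREATE TABLE task_instance (
    task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT,
    PRIMARY KEY (task_id, dag_id, execution_date)
);
"""

# Same shape as the LastTI / RunningTI UNION ALL built in task_stats.
UNION_SQL = """
SELECT dag_id, state, COUNT(*) FROM (
    SELECT ti.dag_id AS dag_id, ti.state AS state
    FROM task_instance ti
    JOIN (SELECT dr.dag_id, MAX(dr.execution_date) AS execution_date
          FROM dag_run dr JOIN dag d ON d.dag_id = dr.dag_id
          WHERE dr.state != 'running' AND d.is_active = 1 AND d.is_subdag = 0
          GROUP BY dr.dag_id) last_dag_run
      ON last_dag_run.dag_id = ti.dag_id
     AND last_dag_run.execution_date = ti.execution_date
    UNION ALL
    SELECT ti.dag_id AS dag_id, ti.state AS state
    FROM task_instance ti
    JOIN (SELECT dr.dag_id, dr.execution_date
          FROM dag_run dr JOIN dag d ON d.dag_id = dr.dag_id
          WHERE dr.state = 'running' AND d.is_active = 1 AND d.is_subdag = 0) running_dag_run
      ON running_dag_run.dag_id = ti.dag_id
     AND running_dag_run.execution_date = ti.execution_date
) union_ti
GROUP BY dag_id, state
"""

# Single pass: join task_instance to dag_run once and keep a run if it is
# running OR it is the DAG's latest non-running run.
SINGLE_SQL = """
SELECT ti.dag_id, ti.state, COUNT(*)
FROM task_instance ti
JOIN dag_run dr ON dr.dag_id = ti.dag_id AND dr.execution_date = ti.execution_date
JOIN dag d ON d.dag_id = dr.dag_id
WHERE d.is_active = 1 AND d.is_subdag = 0
  AND (dr.state = 'running'
       OR dr.execution_date = (SELECT MAX(dr2.execution_date)
                               FROM dag_run dr2
                               WHERE dr2.dag_id = dr.dag_id
                                 AND dr2.state != 'running'))
GROUP BY ti.dag_id, ti.state
"""


def build_demo_db():
    """Create an in-memory database filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dag VALUES (?, ?, ?)", [
        ("a", 1, 0), ("b", 1, 0),
        ("c", 0, 0),  # inactive DAG: excluded by both queries
        ("d", 1, 1),  # subdag: excluded by both queries
    ])
    conn.executemany("INSERT INTO dag_run VALUES (?, ?, ?)", [
        ("a", "2020-01-01", "success"), ("a", "2020-01-02", "failed"),
        ("a", "2020-01-03", "running"), ("b", "2020-01-01", "success"),
        ("c", "2020-01-01", "success"), ("d", "2020-01-01", "running"),
    ])
    conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, ?)", [
        ("t1", "a", "2020-01-01", "success"), ("t2", "a", "2020-01-01", "success"),
        ("t1", "a", "2020-01-02", "success"), ("t2", "a", "2020-01-02", "failed"),
        ("t1", "a", "2020-01-03", "running"), ("t2", "a", "2020-01-03", None),
        ("t1", "b", "2020-01-01", "success"), ("t1", "c", "2020-01-01", "success"),
        ("t1", "d", "2020-01-01", "running"),
    ])
    return conn


def task_stats(conn, sql):
    """Return {dag_id: {state: count}}, like the dict built in the view."""
    data = {}
    for dag_id, state, count in conn.execute(sql):
        data.setdefault(dag_id, {})[state] = count
    return data


if __name__ == "__main__":
    conn = build_demo_db()
    print(task_stats(conn, UNION_SQL) == task_stats(conn, SINGLE_SQL))  # True
```

On this toy data both queries return `{'a': {'success': 1, 'failed': 1, 'running': 1, None: 1}, 'b': {'success': 1}}`. Whether MySQL plans the single join better than the two materialized derived tables is an open question; an index covering `dag_run (dag_id, state, execution_date)` is an untested assumption that might help the `MAX()` lookup.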