How to delete XCOM objects once the DAG finishes its run in Airflow

Date: 2017-10-12 10:19:52

Tags: airflow apache-airflow airflow-scheduler

I have a huge JSON file in XCom which I no longer need once the DAG execution is finished, but I still see the XCom object, with all its data, in the UI. Is there any way to delete the XCom programmatically once the DAG run is finished?

Thank you

5 answers:

Answer 0 (score: 3)

You have to add a task that deletes the XCom once the DAG run is finished; which operator you use depends on your metadata DB (SQLite, PostgreSQL, MySQL, ...).

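from airflow.operators.postgres_operator import PostgresOperator

# sql is a templated field: {{ dag.dag_id }} and {{ ts }} are rendered by Jinja at run time
delete_xcom_task = PostgresOperator(
      task_id='delete-xcom-task',
      postgres_conn_id='airflow_db',
      sql="delete from xcom where dag_id='{{ dag.dag_id }}' and "
          "task_id='your_task_id' and execution_date='{{ ts }}'",
      dag=dag)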

You can verify the query before running the DAG:

Data Profiling -> Ad Hoc Query -> airflow_db -> Query -> Run!

[Screenshot: xcom metadata]
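The scoped delete above can be tried out without touching a real metadata database. The following is a minimal sketch against an in-memory SQLite table; the four-column `xcom` table is an assumption that mimics only the columns the query filters on (the real Airflow schema has more columns and varies between versions). It also passes the values as bound parameters instead of splicing them into the SQL string, which avoids the quoting problems of building the statement by hand.

```python
import sqlite3

# Hypothetical, trimmed-down stand-in for Airflow's xcom table:
# only the columns the DELETE filters on, plus a value column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xcom (dag_id TEXT, task_id TEXT, execution_date TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO xcom VALUES (?, ?, ?, ?)",
    [
        ("my_dag", "your_task_id", "2017-10-12 00:00:00", "huge json"),
        ("my_dag", "other_task", "2017-10-12 00:00:00", "small"),
        ("my_dag", "your_task_id", "2017-10-11 00:00:00", "yesterday"),
    ],
)


def delete_task_xcom(conn, dag_id, task_id, execution_date):
    """Delete the XComs of one task in one DAG run; return the number of rows removed."""
    cur = conn.execute(
        "DELETE FROM xcom WHERE dag_id = ? AND task_id = ? AND execution_date = ?",
        (dag_id, task_id, execution_date),
    )
    conn.commit()
    return cur.rowcount


removed = delete_task_xcom(conn, "my_dag", "your_task_id", "2017-10-12 00:00:00")
remaining = conn.execute(
    "SELECT task_id, execution_date FROM xcom ORDER BY task_id, execution_date"
).fetchall()
```

Only the targeted task's row for that run is removed; the other task's XCom and the previous day's run are kept.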

Answer 1 (score: 1)

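Below is the code that worked for me. It deletes the XComs of all tasks in the DAG (if you only need to delete the XComs of a specific task, add a task_id condition to the SQL).

Note that the dag_id is taken dynamically from the DAG object, and the date comparison has to follow the syntax of your database's SQL dialect.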

from airflow.operators.postgres_operator import PostgresOperator

# dag.dag_id is concatenated in when the DAG file is parsed; {{ ds }} is rendered by Jinja at run time
delete_xcom_task_inst = PostgresOperator(task_id='delete_xcom',
                                         postgres_conn_id='your_conn_id',
                                         sql="delete from xcom where dag_id= '" + dag.dag_id + "' and date(execution_date)=date('{{ ds }}')",
                                         dag=dag)
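Comparing `date(execution_date)` with `ds` matches every run whose execution date falls on that day, not only the exact run. A minimal sketch of that behaviour on an in-memory SQLite table (the three-column table is an assumed, trimmed-down version of the real `xcom` table; SQLite's `date()` truncates a timestamp string to `YYYY-MM-DD`):

```python
import sqlite3

# Hypothetical trimmed xcom table (assumption: not the full Airflow schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xcom (dag_id TEXT, task_id TEXT, execution_date TEXT)")
conn.executemany(
    "INSERT INTO xcom VALUES (?, ?, ?)",
    [
        ("my_dag", "extract", "2019-06-01 00:00:00"),
        ("my_dag", "load", "2019-06-01 00:00:00"),
        ("my_dag", "extract", "2019-06-01 12:00:00"),  # a second run on the same day
        ("my_dag", "extract", "2019-06-02 00:00:00"),
        ("other_dag", "extract", "2019-06-01 00:00:00"),
    ],
)


def delete_dag_xcom_for_day(conn, dag_id, ds):
    """Delete every task's XComs for dag_id whose execution_date falls on day ds (YYYY-MM-DD)."""
    cur = conn.execute(
        "DELETE FROM xcom WHERE dag_id = ? AND date(execution_date) = date(?)",
        (dag_id, ds),
    )
    conn.commit()
    return cur.rowcount


removed = delete_dag_xcom_for_day(conn, "my_dag", "2019-06-01")
left = conn.execute("SELECT dag_id, execution_date FROM xcom ORDER BY dag_id").fetchall()
```

Because the timestamp is truncated to a day, the 12:00 run is deleted too; that is the trade-off of matching on `ds` instead of the exact execution date. Other DAGs and other days are untouched.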

Answer 2 (score: 0)

You can perform the cleanup programmatically through SQLAlchemy, so the solution won't break if the database structure changes:

from airflow.utils.db import provide_session
from airflow.models import XCom

@provide_session
def cleanup_xcom(session=None):
    session.query(XCom).filter(XCom.dag_id == "your dag id").delete()

You can also clean up old XCom data:

from airflow.utils.db import provide_session
from airflow.models import XCom
from sqlalchemy import func

@provide_session
def cleanup_xcom(session=None):
    session.query(XCom).filter(XCom.execution_date <= func.date('2019-06-01')).delete()

If you want to clear the XComs once the DAG has finished, I think the cleanest solution is to use the `on_success_callback` attribute of the DAG model class:

from airflow.models import DAG
from airflow.utils.db import provide_session
from airflow.models import XCom

@provide_session
def cleanup_xcom(context, session=None):
    # context["ti"] is a TaskInstance object, not a dict, so use attribute access
    dag_id = context["ti"].dag_id
    session.query(XCom).filter(XCom.dag_id == dag_id).delete()

dag = DAG( ...
    on_success_callback=cleanup_xcom,
)
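Filtering on `dag_id` alone removes the XComs of every run of that DAG, not just the run that finished. If only the finished run should be cleaned up, the callback could also filter on that run's execution date. The following sketch shows that logic in plain Python against an in-memory SQLite table; the `SimpleNamespace` object is a hypothetical stand-in for the callback context (assumption: it exposes `dag_run.dag_id` and `dag_run.execution_date`), and the table is a trimmed-down assumption of the real schema.

```python
import sqlite3
from datetime import datetime
from types import SimpleNamespace

# Hypothetical trimmed xcom table with timestamps stored as ISO-8601 strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xcom (dag_id TEXT, task_id TEXT, execution_date TEXT)")
conn.executemany(
    "INSERT INTO xcom VALUES (?, ?, ?)",
    [
        ("my_dag", "extract", "2019-06-01T00:00:00"),
        ("my_dag", "load", "2019-06-01T00:00:00"),
        ("my_dag", "extract", "2019-05-31T00:00:00"),  # earlier run, should be kept
    ],
)


def cleanup_run_xcom(context, conn):
    """Delete only the XComs written by the DAG run described in context."""
    run = context["dag_run"]
    cur = conn.execute(
        "DELETE FROM xcom WHERE dag_id = ? AND execution_date = ?",
        (run.dag_id, run.execution_date.isoformat()),
    )
    conn.commit()
    return cur.rowcount


# Hypothetical stand-in for the context of the run that just succeeded.
fake_context = {
    "dag_run": SimpleNamespace(dag_id="my_dag", execution_date=datetime(2019, 6, 1))
}
removed = cleanup_run_xcom(fake_context, conn)
left = conn.execute("SELECT execution_date FROM xcom").fetchall()
```

In the SQLAlchemy version this would correspond to adding a second condition on `XCom.execution_date` to the `filter(...)` call, so earlier runs keep their XComs.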

Answer 3 (score: 0)

Filtering by date with

from sqlalchemy import func
[...]
session.query(XCom).filter(XCom.execution_date <= func.date('2019-06-01')).delete()

(as suggested above) did not work for me. Instead, I had to provide a datetime, including the time zone:

import logging
from datetime import datetime, timedelta, timezone

from airflow.models import XCom
from airflow.operators import python_operator
from airflow.utils.db import provide_session

[...]

@provide_session
def cleanup_xcom(session=None):
    # Timezone-aware cutoff: every XCom with an execution_date up to two days ago is deleted
    ts_limit = datetime.now(timezone.utc) - timedelta(days=2)
    session.query(XCom).filter(XCom.execution_date <= ts_limit).delete()
    logging.info(f"deleted all XCOMs older than {ts_limit}")

xcom_cleaner = python_operator.PythonOperator(
    task_id='delete-old-xcoms',
    python_callable=cleanup_xcom)

xcom_cleaner 
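The rolling cutoff can be illustrated with a minimal sketch on an in-memory SQLite table. It assumes the timestamps are stored as ISO-8601 strings, all in UTC with the same format, which makes string comparison chronological; in a real PostgreSQL or MySQL metadata DB the column is a proper timestamp type and SQLAlchemy handles the comparison. A fixed "now" is used so the result is deterministic.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical trimmed xcom table (assumption: only the columns needed here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xcom (dag_id TEXT, execution_date TEXT)")

# Fixed "now" for a deterministic example; real code would use datetime.now(timezone.utc).
now = datetime(2019, 6, 10, 12, 0, tzinfo=timezone.utc)
for days_ago in (0, 1, 3, 7):
    ts = now - timedelta(days=days_ago)
    conn.execute("INSERT INTO xcom VALUES (?, ?)", ("my_dag", ts.isoformat()))


def delete_xcom_older_than(conn, now, days):
    """Delete XComs whose execution_date is at or before now - days.

    Returns the cutoff and the number of rows removed.
    """
    ts_limit = now - timedelta(days=days)
    cur = conn.execute(
        "DELETE FROM xcom WHERE execution_date <= ?", (ts_limit.isoformat(),)
    )
    conn.commit()
    return ts_limit, cur.rowcount


ts_limit, removed = delete_xcom_older_than(conn, now, days=2)
kept = [row[0] for row in conn.execute("SELECT execution_date FROM xcom ORDER BY execution_date")]
```

With a two-day window the rows from 3 and 7 days ago are deleted and the two most recent ones are kept.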

Answer 4 (score: 0)

My solution to this problem is:

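from airflow.models import DAG, XCom
from airflow.operators.python_operator import PythonOperator
from airflow.utils.db import provide_session

dag = DAG(...)

@provide_session
def cleanup_xcom(session=None, **context):
    # provide_session injects the metadata DB session; the task context arrives as keyword arguments
    dag_id = context["dag"].dag_id
    # Deletes the XComs of every run of this DAG, not only the current one
    session.query(XCom).filter(XCom.dag_id == dag_id).delete()

clean_xcom = PythonOperator(
    task_id="clean_xcom",
    python_callable=cleanup_xcom,
    provide_context=True,
    dag=dag
)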

clean_xcom
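`provide_session` injects a `session` keyword argument whenever the caller did not pass one. That is why a callable can receive it either as an explicit `session=None` parameter or, if it only declares `**context`, as `context["session"]`. The sketch below uses a simplified, hypothetical decorator (`provide_fake_session`, not Airflow's real implementation, which opens and commits an actual DB session and also checks positional arguments) that mimics only this injection behaviour.

```python
import functools


def provide_fake_session(func):
    """Hypothetical, simplified stand-in for provide_session:
    inject a session keyword argument unless the caller already supplied one."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if "session" not in kwargs:
            kwargs["session"] = "fake-session"  # real code would open a DB session here
        return func(*args, **kwargs)
    return wrapper


@provide_fake_session
def via_context(**context):
    # Works, but the session is hidden inside the context dict.
    return context["dag_id"], context["session"]


@provide_fake_session
def explicit(session=None, **context):
    # Clearer: session is a named parameter, the rest is the task context.
    return context["dag_id"], session


r1 = via_context(dag_id="my_dag")
r2 = explicit(dag_id="my_dag")
```

Both calls return `("my_dag", "fake-session")`; declaring `session` explicitly just makes the dependency visible in the signature.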