Airflow HdfsSensor not detecting the directory

Time: 2020-08-25 08:36:28

Tags: airflow airflow-scheduler

I am using Airflow's HdfsSensor to detect an HDFS directory. We are running a clustered setup. My code keeps poking for the directory but never detects it, as shown below:

[2020-08-25 13:57:19,808] {hdfs_sensor.py:100} INFO - Poking for file /tmp/ayush/hive/sensor/event_date=2020-08-25
[2020-08-25 13:58:19,871] {hdfs_sensor.py:100} INFO - Poking for file /tmp/ayush/hive/sensor/event_date=2020-08-25

Here is my code:

from airflow import DAG
from datetime import datetime, timedelta
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator, BranchPythonOperator
from airflow.operators.hive_operator import HiveOperator
from airflow.operators.email_operator import EmailOperator
from airflow.sensors.hdfs_sensor import HdfsSensor
from airflow.operators.bash_operator import BashOperator

DAG_ID = 'Sensor_Test'

args = {
    'owner': 'Airflow',
    'start_date': datetime(year=2020, month=8, day=20)
}

dag = DAG(dag_id=DAG_ID,
          default_args=args,
          schedule_interval='30 6 * * *',
          catchup=False)



source_data_sensor = HdfsSensor(
    task_id='source_data_sensor',
    filepath='/tmp/ayush/hive/sensor/event_date={{ ds }}',
    dag=dag
)


# Note: a DAG object does not support the >> operator; with a single task
# (already bound via dag=dag above) no dependency chaining is needed.

Is this an issue with the poking, or with something else?

For hdfs_conn_id I am using the default hdfs_default connection.

I can also see the directory using the hostname provided in the connection.
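For reference, the `{{ ds }}` macro in the filepath renders to the DAG run's execution date as `YYYY-MM-DD`, which is why the log shows the sensor poking a date-stamped path. A minimal stdlib sketch of that substitution (`render_filepath` is a hypothetical helper; the real rendering is done by Airflow's Jinja templating engine):

```python
from datetime import date

def render_filepath(template: str, execution_date: date) -> str:
    """Mimic how Airflow substitutes {{ ds }} with the execution date."""
    return template.replace("{{ ds }}", execution_date.strftime("%Y-%m-%d"))

path = render_filepath("/tmp/ayush/hive/sensor/event_date={{ ds }}",
                       date(2020, 8, 25))
print(path)  # /tmp/ayush/hive/sensor/event_date=2020-08-25
```

One gotcha this makes visible: with a daily schedule, `ds` is the start of the schedule interval, so the rendered date can be one day earlier than the calendar date on which the task actually runs.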

1 Answer:

Answer 0 (score: 0)

Provide the HDFS connection ID in the HdfsSensor task:

hdfs_sense_open = HdfsSensor(
    task_id='hdfs_sense_open',
    filepath='/user/xxxx/hosts',
    hdfs_conn_id='hdfs_folder',  # add this
    dag=dag)
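The `hdfs_folder` connection referenced above must exist in Airflow's connection store. As a sketch, in Airflow 1.10 it could be created from the CLI; the host, port, and login below are placeholders, and the exact flag syntax varies between Airflow versions:

```shell
airflow connections --add \
    --conn_id hdfs_folder \
    --conn_type hdfs \
    --conn_host namenode.example.com \
    --conn_port 8020 \
    --conn_login hdfs
```

The same connection can also be created through the Airflow UI under Admin > Connections.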