I am using Airflow's HdfsSensor to detect an HDFS directory. We are running on a cluster. My task keeps poking for the directory without ever detecting it, as shown below:
[2020-08-25 13:57:19,808] {hdfs_sensor.py:100} INFO - Poking for file /tmp/ayush/hive/sensor/event_date=2020-08-25
[2020-08-25 13:58:19,871] {hdfs_sensor.py:100} INFO - Poking for file /tmp/ayush/hive/sensor/event_date=2020-08-25
Here is my code:
from airflow import DAG
from datetime import datetime, timedelta
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator, BranchPythonOperator
from airflow.operators.hive_operator import HiveOperator
from airflow.operators.email_operator import EmailOperator
from airflow.sensors.hdfs_sensor import HdfsSensor
from airflow.operators.bash_operator import BashOperator
DAG_ID = 'Sensor_Test'
args = {
    'owner': 'Airflow',
    'start_date': datetime(year=2020, month=8, day=20)
}

dag = DAG(dag_id=DAG_ID,
          default_args=args,
          schedule_interval='30 6 * * *',
          catchup=False)

source_data_sensor = HdfsSensor(
    task_id='source_data_sensor',
    filepath='/tmp/ayush/hive/sensor/event_date={{ ds }}',
    dag=dag
)
dag >> source_data_sensor
Is this an issue with the poking itself, or with something else?
For hdfs_conn_id I am using the default hdfs_default connection.
I can also browse the directory from the hostname configured in that connection.
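One way to check what the sensor actually sees is to query HDFS directly with snakebite, the client library that Airflow 1.10's HDFSHook uses under the hood. A minimal sketch, assuming a hypothetical namenode host and the default port (replace both with the host and port stored in your hdfs_default connection):

from snakebite.client import Client

# Hypothetical namenode host/port -- substitute the values from
# your hdfs_default connection.
client = Client('namenode.example.com', 8020)

# test() returns True only if the path exists; this is the same
# kind of existence check the sensor performs on every poke.
print(client.test('/tmp/ayush/hive/sensor/event_date=2020-08-25', exists=True))

If this prints False (or cannot connect), the sensor will keep poking forever, because it is looking at whatever host its connection points at, not necessarily the one you browsed manually.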
Answer 0 (score: 0)
Provide the HDFS connection ID in the HdfsSensor task:
hdfs_sense_open = HdfsSensor(
    task_id='hdfs_sense_open',
    filepath='/user/xxxx/hosts',
    hdfs_conn_id='hdfs_folder',  # add this
    dag=dag)
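The connection ID you pass must refer to a connection that actually points at your namenode. If hdfs_folder does not exist yet, you can create it in the UI (Admin > Connections) or programmatically. A minimal sketch, assuming a hypothetical namenode host, port, and login (adjust all three to your cluster):

from airflow import settings
from airflow.models import Connection

# Hypothetical values -- replace with your cluster's namenode details.
conn = Connection(
    conn_id='hdfs_folder',
    conn_type='hdfs',
    host='namenode.example.com',
    port=8020,
    login='hdfs'
)

# Persist the connection in Airflow's metadata database.
session = settings.Session()
session.add(conn)
session.commit()

If the sensor still only pokes after this, double-check that the host in the connection you pass (hdfs_folder here, or hdfs_default in your original code) is the active namenode, since the sensor can only see paths on the host that connection points at.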