Airflow randomly fails to access OS env variables, DAGs keep failing (set up with systemd)

Time: 2019-02-21 16:57:05

Tags: airflow systemd

I'm trying to set up Airflow to manage our ETL processes. I launched an EC2 instance with the Amazon Linux 2 AMI, created a user called airflow, moved my code to /home/airflow/airflow (so the DAGs live in ~/airflow/dags, and so on), and then set everything up with systemd as follows:

(Credentials and sensitive information removed.)

Airflow environment file, /etc/sysconfig/airflow:

SCHEDULER_RUNS=5

# Airflow specific settings
AIRFLOW_HOME="/home/airflow/airflow/"
AIRFLOW_CONN_REDSHIFT_CONNECTION=""
AIRFLOW_CONN_S3_CONNECTION=""
AIRFLOW_CONN_S3_LOGS_CONNECTION=""
AIRFLOW__CORE__FERNET_KEY=""

Airflow systemd service configuration files, in /usr/lib/systemd/system/:

airflow-scheduler.service (symlinked from /home/airflow/.airflow_config/):

-rw-r--r-- 1 root    root    1.3K Feb 21 16:18 airflow-scheduler.service

[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/usr/bin/bash -c 'source /home/airflow/.env/bin/activate ; source /home/airflow/.bashrc ; airflow scheduler'
Restart=always
RestartSec=5s

[Install]
WantedBy=multi-user.target

airflow-webserver.service (symlinked from /home/airflow/.airflow_config/):

-rw-r--r-- 1 root    root    1.4K Feb 20 14:38 airflow-webserver.service

[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service

[Service]
EnvironmentFile=/etc/sysconfig/airflow
User=airflow
Group=airflow
Type=simple
ExecStart=/usr/bin/bash -c 'source /home/airflow/.env/bin/activate ; source /home/airflow/.bashrc ; airflow webserver -p 8080 --pid /run/airflow/webserver.pid'
Restart=on-failure
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target

Airflow user .bashrc file, /home/airflow/.bashrc:

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi

# User specific aliases and functions

# Aliases
alias python="python3"
alias pip="pip3"
alias airflow_venv="source $HOME/.env/bin/activate"

# Airflow specific settings
export AIRFLOW_HOME="/home/airflow/airflow/"
export AIRFLOW_CONN_REDSHIFT_CONNECTION=""
export AIRFLOW_CONN_S3_CONNECTION=""
export AIRFLOW_CONN_S3_LOGS_CONNECTION=""
export AIRFLOW__CORE__FERNET_KEY=""

# Credentials
export EXTERNAL_SERVICE_CREDENTIAL=""
export EXTERNAL_SERVICE_PASSWORD=""

Airflow config file, airflow.cfg (symlinked from /home/airflow/.airflow_config/):

-rw-r--r-- 1 airflow airflow 5.3K Feb 12 17:45 airflow.cfg

[core]
airflow_home = /home/airflow/airflow
dags_folder = /home/airflow/airflow/dags
base_log_folder = /home/airflow/airflow/logs
plugins_folder = /home/airflow/airflow/plugins
sql_alchemy_conn = 
child_process_log_directory = /home/airflow/airflow/logs/scheduler

executor = LocalExecutor

remote_logging = True
remote_log_conn_id = s3_logs_connection
remote_base_log_folder = s3://my-bucket-here
encrypt_s3_logs = False

DAG default args:

default_args = {
    'owner': 'airflow',
    'depends_on_past': True,
    'retry_on_failure': True,
    'task_concurrency': 1,
    'start_date': datetime(2019, 2, 19),
    'max_active_runs': 1}

dag_name_here = DAG(
    "dag_name_here",
    default_args=default_args,
    schedule_interval=timedelta(days=1))

Now, the problem I'm having is this: in a DAG with several tasks, each task uses a different OS environment variable, either a credential (defined only in .bashrc) or a connection (defined in both /etc/sysconfig/airflow and .bashrc). Sometimes the first task fails, sometimes the second, sometimes the third, and so on. That means that during a backfill running 3 DagRuns in parallel, I might see the third task run fine in one DagRun while the second task in another fails to pick up the environment variables.

For example, a task might be CreateStagingRedshiftTable, which succeeds, and then the next one, PopulateStagingTable, might return a Connection does not exist error, even though they both use the same connection.
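
Roughly, the two tasks look like the sketch below. This is a simplified stand-in, not the real code: the PythonOperator/PostgresHook choice, the callables, and the SQL are illustrative placeholders; only the conn_id and task ids come from the setup above.

# Simplified sketch only -- not the actual production code. The hook and
# operator choices, callables, and SQL are illustrative placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.hooks.postgres_hook import PostgresHook
from airflow.operators.python_operator import PythonOperator

dag = DAG(
    "staging_example",
    default_args={'owner': 'airflow', 'start_date': datetime(2019, 2, 19)},
    schedule_interval=timedelta(days=1))

def create_staging_table():
    # Resolves the "redshift_connection" conn_id, which only exists if
    # AIRFLOW_CONN_REDSHIFT_CONNECTION is visible to the worker process.
    PostgresHook(postgres_conn_id="redshift_connection").run(
        "CREATE TABLE IF NOT EXISTS staging_example (id INT);")

def populate_staging_table():
    # Uses the very same connection, yet this is the step that sometimes
    # fails with "Connection does not exist".
    PostgresHook(postgres_conn_id="redshift_connection").run(
        "INSERT INTO staging_example VALUES (1);")

create = PythonOperator(task_id="CreateStagingRedshiftTable",
                        python_callable=create_staging_table, dag=dag)
populate = PythonOperator(task_id="PopulateStagingTable",
                          python_callable=populate_staging_table, dag=dag)
create >> populate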

I've tried no quotes, single quotes, and double quotes in the envfile, and both exporting and not exporting the vars in .bashrc and .bash_profile, and I keep running into the same problem.
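
For reference, the quoting variants tried in /etc/sysconfig/airflow look roughly like this (values redacted, one style used at a time, shown on a single variable as an example):

# no quotes
AIRFLOW_CONN_REDSHIFT_CONNECTION=

# single quotes
AIRFLOW_CONN_REDSHIFT_CONNECTION=''

# double quotes
AIRFLOW_CONN_REDSHIFT_CONNECTION=""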

Any ideas or help would be greatly appreciated.

0 Answers:

No answers yet.