Airflow Elasticsearch logging configuration

Date: 2019-02-19 15:43:15

Tags: elasticsearch airflow

I am running into some problems setting up Elasticsearch logging in Apache Airflow. As of version 1.10, Elasticsearch logging has been added to the configuration.

Looking at the airflow.cfg file, there are two sections related to Elasticsearch:

# Airflow can store logs remotely in AWS S3, Google Cloud Storage or Elastic Search.
# Users must supply an Airflow connection id that provides access to the storage
# location. If remote_logging is set to true, see UPDATING.md for additional
# configuration requirements.
remote_logging = True
remote_log_conn_id =
remote_base_log_folder =
encrypt_s3_logs = False

[elasticsearch]
elasticsearch_host = xxx.xxx.xxx.xxx
elasticsearch_log_id_template = {dag_id}-{task_id}-{execution_date}-{try_number}
elasticsearch_end_of_log_mark = end_of_log
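As an aside, `elasticsearch_log_id_template` is an ordinary Python format string. A minimal sketch of how it would be rendered into a log id (the dag/task names and date below are made-up illustration values, not from the original question):

```python
# Sketch: rendering the log_id template from airflow.cfg.
# The field values here are hypothetical examples.
template = "{dag_id}-{task_id}-{execution_date}-{try_number}"

log_id = template.format(
    dag_id="example_dag",
    task_id="example_task",
    execution_date="2019-02-19T15:43:15",
    try_number=1,
)
print(log_id)
# example_dag-example_task-2019-02-19T15:43:15-1
```

This rendered id is what ties a task instance to its log documents once they are indexed in Elasticsearch.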


Now, I am not sure how to configure this. Looking at the airflow_local_settings.py file, we can see the following code:

if REMOTE_LOGGING and REMOTE_BASE_LOG_FOLDER.startswith('s3://'):
    DEFAULT_LOGGING_CONFIG['handlers'].update(REMOTE_HANDLERS['s3'])
elif REMOTE_LOGGING and REMOTE_BASE_LOG_FOLDER.startswith('gs://'):
    DEFAULT_LOGGING_CONFIG['handlers'].update(REMOTE_HANDLERS['gcs'])
elif REMOTE_LOGGING and REMOTE_BASE_LOG_FOLDER.startswith('wasb'):
    DEFAULT_LOGGING_CONFIG['handlers'].update(REMOTE_HANDLERS['wasb'])
elif REMOTE_LOGGING and ELASTICSEARCH_HOST:
    DEFAULT_LOGGING_CONFIG['handlers'].update(REMOTE_HANDLERS['elasticsearch'])
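The dispatch above can be reproduced in isolation. A minimal sketch (the function name and return values are simplified stand-ins for the real `REMOTE_HANDLERS` structures in airflow_local_settings.py):

```python
# Sketch of the handler-selection logic above, with stand-in return values.
def pick_remote_handler(remote_logging, base_folder, elasticsearch_host):
    """Return which remote handler airflow_local_settings would pick."""
    if remote_logging and base_folder.startswith('s3://'):
        return 's3'
    elif remote_logging and base_folder.startswith('gs://'):
        return 'gcs'
    elif remote_logging and base_folder.startswith('wasb'):
        return 'wasb'
    elif remote_logging and elasticsearch_host:
        return 'elasticsearch'
    return None

# With remote_logging = True, remote_base_log_folder left empty, and
# elasticsearch_host set, the Elasticsearch branch is reached:
print(pick_remote_handler(True, '', 'xxx.xxx.xxx.xxx'))  # elasticsearch
```

Note that the Elasticsearch branch only checks `ELASTICSEARCH_HOST`, so `remote_base_log_folder` can stay empty; filling it with an `s3://`-style prefix would instead select one of the earlier branches.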

Logically, if I set remote_logging to True and put the Elasticsearch host/IP in the elasticsearch section, it should just work. At the moment, however, the Airflow instance is not generating any logs.

1 Answer:

Answer 0 (score: 1):

According to the Airflow ElasticsearchTaskHandler doc:

    ElasticsearchTaskHandler is a python log handler that
    reads logs from Elasticsearch. Note logs are not directly
    indexed into Elasticsearch. Instead, it flushes logs
    into local files. Additional software setup is required
    to index the log into Elasticsearch, such as using
    Filebeat and Logstash.

Unfortunately, this log handler does not flush logs directly to your ES cluster.
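To make the quoted behavior concrete, here is an illustrative sketch (not Airflow's actual implementation) of the write side: task logs land in a local file as one JSON document per line, terminated by the end-of-log mark, and a separate shipper such as Filebeat/Logstash is responsible for indexing those lines into Elasticsearch. The log_id value and file name below are hypothetical:

```python
# Illustrative sketch of ElasticsearchTaskHandler-style output:
# logs are flushed to a local file, NOT sent to Elasticsearch directly.
import json
import os
import tempfile

log_id = "example_dag-example_task-2019-02-19T15:43:15-1"  # hypothetical
end_of_log_mark = "end_of_log"  # matches elasticsearch_end_of_log_mark

path = os.path.join(tempfile.mkdtemp(), "1.log")
with open(path, "w") as f:
    for message in ["task started", "task finished", end_of_log_mark]:
        # One JSON document per line, ready for an external shipper
        # (e.g. Filebeat -> Logstash -> Elasticsearch) to pick up.
        f.write(json.dumps({"log_id": log_id, "message": message}) + "\n")

with open(path) as f:
    lines = [json.loads(line) for line in f]
print(lines[-1]["message"])  # end_of_log
```

The end-of-log mark is what lets the reading side (the Airflow webserver querying Elasticsearch) know a task's log stream is complete, which is why it appears as a configurable value in airflow.cfg.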