I'm using Apache Airflow and noticed that gunicorn-error.log has grown to over 50 GB within 5 months. Most of the log messages are INFO-level entries like:
[2018-05-14 17:31:39 +0000] [29595] [INFO] Handling signal: ttou
[2018-05-14 17:32:37 +0000] [2359] [INFO] Worker exiting (pid: 2359)
[2018-05-14 17:33:07 +0000] [29595] [INFO] Handling signal: ttin
[2018-05-14 17:33:07 +0000] [5758] [INFO] Booting worker with pid: 5758
[2018-05-14 17:33:10 +0000] [29595] [INFO] Handling signal: ttou
[2018-05-14 17:33:41 +0000] [2994] [INFO] Worker exiting (pid: 2994)
[2018-05-14 17:34:11 +0000] [29595] [INFO] Handling signal: ttin
[2018-05-14 17:34:11 +0000] [6400] [INFO] Booting worker with pid: 6400
[2018-05-14 17:34:13 +0000] [29595] [INFO] Handling signal: ttou
[2018-05-14 17:34:36 +0000] [3611] [INFO] Worker exiting (pid: 3611)
Within the Airflow config file I'm only able to set the log file path. Does anyone know how to change the gunicorn log level within Airflow? I don't need such fine-grained logging, and it is filling up my hard drive.
Answer 0: (score: 0)
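Logging in Airflow can be a bit tricky. One reason is that logging is split into several parts. For example, Airflow's own logging configuration is completely separate from the logging configuration of the gunicorn web server (the "spam" logs you mention in your post come from gunicorn).
To work around this spam problem, I modified Airflow's bin/cli.py slightly by adding a few lines to the webserver() function: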
    if args.log_config:
        run_args += ['--log-config', str(args.log_config)]
(For brevity, I haven't pasted the code that handles the argument.)
Then, for the log config file, I have something like the following:
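For readers who want to see how such an argument could be wired up, here is a minimal sketch. It is not Airflow's actual cli.py: the `--log-config` option, `build_parser` and `build_gunicorn_args` are hypothetical names used for illustration only. The one thing taken from gunicorn itself is its `--log-config` command-line flag.

```python
import argparse


def build_parser():
    """Stand-in for the webserver sub-command parser (hypothetical)."""
    parser = argparse.ArgumentParser(prog="airflow webserver")
    # Hypothetical option: stock Airflow does not define it.
    parser.add_argument(
        "--log-config",
        dest="log_config",
        default=None,
        help="Path to a logging config file passed through to gunicorn",
    )
    return parser


def build_gunicorn_args(args, base_args=None):
    """Append --log-config to the gunicorn command line when it was given."""
    run_args = list(base_args or ["gunicorn"])
    if args.log_config:
        run_args += ["--log-config", str(args.log_config)]
    return run_args


if __name__ == "__main__":
    parsed = build_parser().parse_args(["--log-config", "gunicorn_logging.conf"])
    print(build_gunicorn_args(parsed))
```

Keeping the pass-through optional means the webserver starts exactly as before when no config file is supplied.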
[loggers]
keys=root, gunicorn.error, gunicorn.access
[handlers]
keys=console, error_file, access_file
[formatters]
keys=generic, access
[logger_root]
level=INFO
handlers=console
[logger_gunicorn.error]
level=INFO
handlers=error_file
propagate=0
qualname=gunicorn.error
[logger_gunicorn.access]
level=INFO
handlers=access_file
propagate=1
qualname=gunicorn.access
[handler_console]
class=StreamHandler
formatter=generic
args=(sys.stdout, )
[handler_error_file]
class=logging.handlers.TimedRotatingFileHandler
formatter=generic
args=('/home/airflow/airflow/logs/webserver/gunicorn.error.log',)
[handler_access_file]
class=logging.handlers.TimedRotatingFileHandler
formatter=access
args=('/home/airflow/airflow/logs/webserver/gunicorn.access.log',)
[formatter_generic]
format=[%(name)s] [%(module)s] [%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s
#format=[%(levelname)s] %(asctime)s [%(process)d] [%(levelname)s] %(message)s
datefmt=%Y-%m-%d %H:%M:%S
class=logging.Formatter
[formatter_access]
format=%(message)s
class=logging.Formatter
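Note the "propagate=0" in gunicorn.error, which keeps the spam out of stdout. The messages are still there, but at least they end up in /home/airflow/airflow/logs/webserver/gunicorn.error.log, which should be rotated (to be honest, I haven't fully tested the rotation part).
If I have time, I'll submit this change as an Airflow ticket in Jira.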
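Since the rotation part is untested and the config keeps gunicorn.error at INFO, here is a small sketch of a stricter variant that can be tried with Python's logging module alone, without starting gunicorn. The assumptions are mine, not the original setup: the level is raised to WARNING, the handler rotates at midnight and keeps 7 backups, and `LOG_PATH` is a placeholder that the hypothetical helper `configure_logging` fills in with a temporary path.

```python
import logging
import logging.config
import os
import tempfile

# Hypothetical variant of the config: WARNING level and bounded rotation.
# LOG_PATH is a placeholder replaced at runtime, not a logging feature.
CONFIG_TEMPLATE = """
[loggers]
keys=root, gunicorn.error

[handlers]
keys=console, error_file

[formatters]
keys=generic

[logger_root]
level=INFO
handlers=console

[logger_gunicorn.error]
level=WARNING
handlers=error_file
propagate=0
qualname=gunicorn.error

[handler_console]
class=StreamHandler
formatter=generic
args=(sys.stdout,)

[handler_error_file]
class=logging.handlers.TimedRotatingFileHandler
formatter=generic
args=(LOG_PATH, 'midnight', 1, 7)

[formatter_generic]
format=[%(asctime)s] [%(process)d] [%(levelname)s] %(message)s
datefmt=%Y-%m-%d %H:%M:%S
class=logging.Formatter
"""


def configure_logging(log_dir):
    """Write the config into log_dir, load it, and return the log file path."""
    log_path = os.path.join(log_dir, "gunicorn.error.log")
    config_file = os.path.join(log_dir, "gunicorn_logging.conf")
    with open(config_file, "w") as fh:
        fh.write(CONFIG_TEMPLATE.replace("LOG_PATH", repr(log_path)))
    logging.config.fileConfig(config_file, disable_existing_loggers=False)
    return log_path


if __name__ == "__main__":
    path = configure_logging(tempfile.mkdtemp())
    logger = logging.getLogger("gunicorn.error")
    logger.info("Handling signal: ttin")  # below WARNING, so not written
    logger.warning("Worker timeout")      # written to the rotating file
    with open(path) as fh:
        print(fh.read())
```

The handler arguments after the path are `when`, `interval` and `backupCount`; with a non-zero `backupCount` the handler deletes the oldest rotated files, which is what bounds disk usage. This only exercises the logging module, so how gunicorn combines the file with its own `--log-level` setting is worth checking separately.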
Answer 1: (score: 0)
I managed to solve the problem by setting an environment variable:
GUNICORN_CMD_ARGS="--log-level WARNING"
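If the webserver is launched from a Python wrapper rather than a shell, the same variable can be set programmatically. This is only a sketch: `with_gunicorn_log_level` is a hypothetical helper, and the commented-out launch line assumes the `airflow` command is on the PATH. It appends to any value already present in GUNICORN_CMD_ARGS instead of overwriting it.

```python
import os
import shlex


def with_gunicorn_log_level(level="WARNING", base_env=None):
    """Return a copy of the environment with --log-level added to GUNICORN_CMD_ARGS."""
    env = dict(os.environ if base_env is None else base_env)
    # Keep whatever extra gunicorn arguments were already configured.
    extra = shlex.split(env.get("GUNICORN_CMD_ARGS", ""))
    extra += ["--log-level", level]
    env["GUNICORN_CMD_ARGS"] = " ".join(shlex.quote(arg) for arg in extra)
    return env


if __name__ == "__main__":
    env = with_gunicorn_log_level("WARNING", base_env={})
    print(env["GUNICORN_CMD_ARGS"])  # prints: --log-level WARNING
    # To launch the webserver with this environment (assumes `airflow` is on PATH):
    # import subprocess
    # subprocess.run(["airflow", "webserver"], env=with_gunicorn_log_level())
```

Arguments given directly on the gunicorn command line take precedence over GUNICORN_CMD_ARGS, so this only helps when the launcher does not pass its own `--log-level`.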