Gunicorn access log format

Asked: 2017-12-07 12:11:52

Tags: python json gunicorn fluentd

I plan to run Flask via Gunicorn on Kubernetes. To get proper logging, I want all logs to be output as JSON.

Currently I am testing with minikube and https://github.com/inovex/kubernetes-logging so that Fluentd collects the logs.

I managed to get the error logs (tracebacks) formatted correctly thanks to: JSON formatted logging with Flask and gunicorn

I am still struggling with the access log format. I specified the following Gunicorn access log format:

access_log_format = '{"remote_ip":"%(h)s","request_id":"%({X-Request-Id}i)s","response_code":"%(s)s","request_method":"%(m)s","request_path":"%(U)s","request_querystring":"%(q)s","request_timetaken":"%(D)s","response_length":"%(B)s"}'
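For context, access_log_format is a setting in a Gunicorn config file; a minimal sketch along these lines (the bind address and stdout access log shown here are illustrative) would be loaded with gunicorn -c gunicorn.conf.py app:app:

    # gunicorn.conf.py -- minimal sketch; bind address and app wiring are illustrative
    bind = "0.0.0.0:5000"
    accesslog = "-"  # send the access log to stdout so the cluster log collector picks it up
    access_log_format = (
        '{"remote_ip":"%(h)s","request_id":"%({X-Request-Id}i)s",'
        '"response_code":"%(s)s","request_method":"%(m)s",'
        '"request_path":"%(U)s","request_querystring":"%(q)s",'
        '"request_timetaken":"%(D)s","response_length":"%(B)s"}'
    )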

The resulting log is JSON formatted, but the message part (built from access_log_format) now contains escaped double quotes and is not parsed into its individual fields by Fluentd/ELK:

{"tags": [], "timestamp": "2017-12-07T11:50:20.362559Z", "level": "INFO", "host": "ubuntu", "path": "/usr/local/lib/python2.7/dist-packages/gunicorn/glogging.py", "message": "{\"remote_ip\":\"127.0.0.1\",\"request_id\":\"-\",\"response_code\":\"200\",\"request_method\":\"GET\",\"request_path\":\"/v1/records\",\"request_querystring\":\"\",\"request_timetaken\":\"19040\",\"response_length\":\"20\"}", "logger": "gunicorn.access"}

Thanks, JPW

3 answers:

Answer 0 (score: 1)

The simplest solution is to swap the outer single quotes for double quotes and the inner double quotes for single quotes, as shown below.

--access-logformat  "{'remote_ip':'%(h)s','request_id':'%({X-Request-Id}i)s','response_code':'%(s)s','request_method':'%(m)s','request_path':'%(U)s','request_querystring':'%(q)s','request_timetaken':'%(D)s','response_length':'%(B)s'}"

Here are some sample log lines:

{'remote_ip':'127.0.0.1','request_id':'-','response_code':'404','request_method':'GET','request_path':'/test','request_querystring':'','request_timetaken':'6642','response_length':'233'}
{'remote_ip':'127.0.0.1','request_id':'-','response_code':'200','request_method':'GET','request_path':'/','request_querystring':'','request_timetaken':'881','response_length':'20'}

Answer 1 (score: 0)

You can escape the double quotes as \" directly in the value of --access-logformat to keep the log as valid JSON.

So if you run Gunicorn in a Docker container, your Dockerfile could end with something like this:

CMD ["gunicorn",            \
    "-b", "0.0.0.0:5000",   \
    "--access-logfile", "-",\
    "--access-logformat", "{\"remote_ip\":\"%(h)s\",\"request_id\":\"%({X-Request-Id}i)s\",\"response_code\":\"%(s)s\",\"request_method\":\"%(m)s\",\"request_path\":\"%(U)s\",\"request_querystring\":\"%(q)s\",\"request_timetaken\":\"%(D)s\",\"response_length\":\"%(B)s\"}", \
    "app:create_app()"]

You can find the rest of the Gunicorn logging options here.

Answer 2 (score: 0)

It has been 2 years and I assume fluent-logger-python has changed; the problem I am hitting now is slightly different, but every Google search points to this discussion.

When I use the example above in the gunicorn config file

access_log_format = '{"remote_ip":"%(h)s","request_id":"%({X-Request-Id}i)s","response_code":"%(s)s","request_method":"%(m)s","request_path":"%(U)s","request_querystring":"%(q)s","request_timetaken":"%(D)s","response_length":"%(B)s"}'

I get the desired behaviour of it being read as JSON and merged with the Fluentd JSON data, but the Gunicorn fields are not populated:

{"tags": [], "level": "INFO", "host": "ubuntu", "logger": "gunicorn.access", "remote_ip":"%(h)s","request_id":"%({X-Request-Id}i)s","response_code":"%(s)s","request_method":"%(m)s","request_path":"%(U)s","request_querystring":"%(q)s","request_timetaken":"%(D)s","response_length":"%(B)s"}

This seems to be because Gunicorn passes access_log_format to the logger as the message and passes all of the parameters (safe_atoms) as additional arguments:

/gunicorn/glogging.py

        safe_atoms = self.atoms_wrapper_class(
            self.atoms(resp, req, environ, request_time)
        )

        try:
            # safe_atoms = {"s": "200", "m": "GET", ...}
            self.access_log.info(self.cfg.access_log_format, safe_atoms)

However, if FluentRecordFormatter sees that the string is valid JSON, it reads it with json.loads but ignores all of the arguments that were passed:

/fluent/handler.py

    def _format_msg_json(self, record, msg):
        try:
            json_msg = json.loads(str(msg))  # <------- doesn't merge params
            if isinstance(json_msg, dict):
                return json_msg
            else:
                return self._format_msg_default(record, msg)
        except ValueError:
            return self._format_msg_default(record, msg)

Compare this with the default Python formatter, which calls record.message = record.getMessage(), which in turn merges the arguments into the message:

/Lib/logging/__init__.py

    def getMessage(self):
        """
        Return the message for this LogRecord.
        Return the message for this LogRecord after merging any user-supplied
        arguments with the message.
        """
        msg = str(self.msg)
        if self.args:
            msg = msg % self.args  # <------ args get merged in
        return msg
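
The difference is easy to reproduce without Fluentd at all; a minimal sketch (the values are made up) that mirrors how Gunicorn hands the format string and the atoms dict to the logger:

    import logging

    # What Gunicorn effectively does: the format string is the message,
    # the atoms dict is the single logging argument.
    fmt = '{"response_code":"%(s)s","request_method":"%(m)s"}'
    atoms = {"s": "200", "m": "GET"}

    record = logging.LogRecord("gunicorn.access", logging.INFO,
                               "glogging.py", 0, fmt, (atoms,), None)

    print(record.getMessage())  # merged:   {"response_code":"200","request_method":"GET"}
    print(record.msg)           # unmerged: {"response_code":"%(s)s","request_method":"%(m)s"}

Both strings are valid JSON, so json.loads happily accepts the unmerged record.msg, and the %(...)s placeholders end up verbatim in the output, exactly as shown above.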

I have logged an issue with the fluent-logger-python project.

Workaround

Use a logging filter to perform the merge before the record is passed on to FluentRecordFormatter:

import logging

from fluent import handler

logger = logging.getLogger('fluent.test')

class ContextFilter(logging.Filter):
    def filter(self, record):
        # merge the logging args into the message before the formatter sees it
        record.msg = record.msg % record.args
        return True

fluent_handler = handler.FluentHandler('app.follow', host='localhost', port=24224)
formatter = handler.FluentRecordFormatter()
fluent_handler.setFormatter(formatter)
merge_filter = ContextFilter()
fluent_handler.addFilter(merge_filter)
logger.addHandler(fluent_handler)

Edit: the logging filter does not work

After applying the logging-filter workaround, I started getting errors like:

ValueError: unsupported format character ';' (0x3b) at index 166

It turns out that FluentRecordFormatter does call the base getMessage implementation, which merges the arguments into the message:

    def format(self, record):
        # Compute attributes handled by parent class.
        super(FluentRecordFormatter, self).format(record)  # <------ record.message = record.msg % record.args
        # Add ours
        record.hostname = self.hostname

        # Apply format
        data = self._formatter(record)

        self._structuring(data, record)
        return data

The problem is that _format_msg_json(self, record, msg) works on the record.msg attribute, which holds the unmerged data, while record.message holds the merged data. So my logging filter was merging/formatting the data, but the formatter then tried to do the same thing again and occasionally hit invalid format syntax.
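
A hypothetical illustration of that double merge: the filter has already substituted the atoms into record.msg but left record.args in place, so getMessage() runs '%' formatting a second time over a string that may now contain a stray '%' from the logged data:

    merged = '{"request_path":"/v1/records","note":"cpu at 97%"}'  # already-merged message (made-up value)
    try:
        merged % {"h": "127.0.0.1"}  # the second, unwanted merge
    except ValueError as exc:
        print(exc)  # unsupported format character '"' (0x22) at index ...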

Workaround 2: do not use JSON

I have given up entirely on outputting JSON from Gunicorn/Python logging. Instead, I use Fluentd's parser filter to parse the log message, for example:

<filter *.gunicorn.access>
  @type parser
  key_name message
  reserve_time true
  reserve_data true
  remove_key_name_field true
  hash_value_field access_log
  <parse>
    @type regexp
    expression /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*) "(?<referer>[^\"]*)" "(?<agent>[^\"]*)"$/
    time_format %d/%b/%Y:%H:%M:%S %z
  </parse>
</filter>

You can read about what the options do here: https://docs.fluentd.org/filter/parser