Logstash解析Json时遇到问题

时间:2015-08-27 11:29:41

标签: json logstash logstash-configuration

TL:DR - 我的有效JSON日志被Logstash拒绝,因为有些转义字符导致JSON无效。不幸的是,我无法弄清问题是什么。

完整版:

我的Logstash(+ Logstash-Forwarder)从自定义日志格式中获取Apache日志,其格式如下:

LogFormat "{ \
    \"@timestamp\": \"%{%Y-%m-%dT%H:%M:%S%z}t\", \
    \"@version\": \"1\", \
    \"vhost\":\"%V\", \
    \"tags\":[\"apache-json\"], \
    \"message\": \"%h %l %u %t \\\"%r\\\" %>s %b\", \
    \"clientip\": \"%a\", \
    \"duration\": %D, \
    \"status\": %>s, \
    \"request\": \"%U%q\", \
    \"urlpath\": \"%U\", \
    \"urlquery\": \"%q\", \
    \"bytes\": %B, \
    \"method\": \"%m\", \
    \"referer\": \"%{Referer}i\", \
    \"useragent\": \"%{User-agent}i\" \
}" ls_apache_json

相关的Logstash输入/过滤器配置非常简单:

input {
    lumberjack {
        port => 5000
        type => "logs"
    }
}
filter {
    if [type] =~ /-json$/ {
        json {
            source => "message"
        }
    }
}

似乎仍然可以将某些日志解析为JSON - 我收到了很多这些错误:

{
   :timestamp=>"2015-08-27T12:47:05.165000+0200",
   :message=>"Trouble parsing json",
   :source=>"message",
   :raw=>"{ \t\"@timestamp\": \"2015-08-27T12:47:02+0200\", \t\"@version\": \"1\", \t\"vhost\":\"www.example.org\", \t\"tags\":[\"apache-json\"], \"clientip\": \"127.0.0.1\", \t\"duration\": 1280, \t\"status\": 200, \t\"request\": \"/uploads/_processed_/csm_D\\xc3\\xa4mpfungswanne_3_01_0c75c517e4.jpg\", \t\"urlpath\": \"/uploads/_processed_/csm_D\\xc3\\xa4mpfungswanne_3_01_0c75c517e4.jpg\", \t\"urlquery\": \"\", \t\"bytes\": 2913, \t\"method\": \"GET\", \t\"referer\": \"http://www.example.org/file.html\", \t\"useragent\": \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36\"    }",
   :exception=>#<LogStash::Json::ParserError: Unrecognized character escape 'x' (code 120) at [Source: [B@29a15d70; line: 1, column: 231]>
   :level=>:warn
}

我用一个简单的Ruby片段仔细检查了原始消息,这可以解析JSON:

# irb
$ require 'json'
$ s= "{ \t\"@timestamp\": \"2015-08-27T12:47:02+0200\", \t\"@version\": \"1\", \t\"vhost\":\"www.example.org\", \t\"tags\":[\"apache-json\"], \"clientip\": \"127.0.0.1\", \t\"duration\": 1280, \t\"status\": 200, \t\"request\": \"/uploads/_processed_/csm_D\\xc3\\xa4mpfungswanne_3_01_0c75c517e4.jpg\", \t\"urlpath\": \"/uploads/_processed_/csm_D\\xc3\\xa4mpfungswanne_3_01_0c75c517e4.jpg\", \t\"urlquery\": \"\", \t\"bytes\": 2913, \t\"method\": \"GET\", \t\"referer\": \"http://www.example.org/file.html\", \t\"useragent\": \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36\"    }"
$ JSON.parse(s)
=> {"@timestamp"=>"2015-08-27T12:47:02+0200", "@version"=>"1", "vhost"=>"www.example.org", "tags"=>["apache-json"], "clientip"=>"127.0.0.1", "duration"=>1280, "status"=>200, "request"=>"/uploads/_processed_/csm_Dxc3xa4mpfungswanne_3_01_0c75c517e4.jpg", "urlpath"=>"/uploads/_processed_/csm_Dxc3xa4mpfungswanne_3_01_0c75c517e4.jpg", "urlquery"=>"", "bytes"=>2913, "method"=>"GET", "referer"=>"http://www.example.org/file.html", "useragent"=>"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36"}

我正在运行Logstash 1.5.2。我认为这与编解码器有关,因为我试图为json解析器设置codec参数并没有解决问题。 Apache除了确保使用正确的编解码器之外还有什么需要 - 我似乎无法找到任何配置选项:(

欢迎任何帮助。

1 个答案:

答案 0 :(得分:0)

#replace " and \
ruby {
 code => 'str=event["request_body"];   str=str.gsub("\\x22","\"").gsub("\\x5C", "\\"); event["request_body"]=str;'
}