Logstash _grokparsefailure when parsing Nginx logs

Date: 2017-02-17 22:53:44

Tags: nginx elasticsearch logstash logstash-grok logstash-configuration

I am trying to parse nginx logs with Logstash. Everything works except for lines that contain an Nginx $remote_user: those get a _grokparsefailure tag. When $remote_user is '-' (the default value when no $remote_user is specified), Logstash does its job, but with a real $remote_user such as user@gmail.com it fails and adds a _grokparsefailure tag:

  

127.0.0.1 - - [17/Feb/2017:23:14:08 +0100] "GET /favicon.ico HTTP/1.1" 302 169 "http://training-hub.tn/trainer/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"

=====> works fine

  

127.0.0.1 - jemlifathi@gmail.com [17/Feb/2017:23:14:07 +0100] "GET /trainer/templates/home.tmpl.html HTTP/1.1" 304 0 "http://training-hub.tn/trainer/" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"

=====> _grokparsefailure tag and the log line is not parsed
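For context: in common versions of the stock grok patterns, the $remote_user position of COMMONAPACHELOG/COMBINEDAPACHELOG is matched with a user-name pattern (%{USER}, roughly [a-zA-Z0-9._-]+), which does not allow "@", so an e-mail address there makes the whole match fail. A minimal sketch to confirm that the user field is the culprit (a hypothetical test filter, not part of the original post):

filter {
    grok {
        # Hypothetical test: only the leading fields of the combined log format.
        # The second sample line fails here because %{USER} cannot match the "@".
        match => { "message" => "%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\]" }
    }
}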

I am using this configuration file:

input {     
    file {      
        path => "/home/dev/node/training-hub/logs/access_log"       
        start_position => "beginning"       
        sincedb_path => "/dev/null"
        ignore_older => 0
        type => "logs"  
    }
}

filter {
    if [type] == "logs" {
        mutate {
            # strip the IPv6-mapped prefix so clientip is a plain IPv4 address
            gsub => ["message", "::ffff:", ""]
        }
        grok {
            match => [
                "message", "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}",
                "message", "%{COMMONAPACHELOG}+%{GREEDYDATA:extra_fields}"
            ]
            overwrite => [ "message" ]
        }

        mutate {
            convert => ["response", "integer"]
            convert => ["bytes", "integer"]
            convert => ["responsetime", "float"]
        }

        geoip {
            source => "clientip"
            target => "geoip"
            database => "/etc/logstash/GeoLite2-City.mmdb"
            # store longitude then latitude as a coordinates array
            add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
            add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
        }
        mutate {
            convert => [ "[geoip][coordinates]", "float" ]
        }

        date {
            # use the nginx timestamp as the event @timestamp
            match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
            remove_field => [ "timestamp" ]
        }

        useragent {
            source => "agent"
        }
    }
}

output {
    elasticsearch {
        hosts => "localhost:9200"
    }
}
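One way to reproduce the failure without touching Elasticsearch is to run the same filter block against the two sample lines pasted on stdin and inspect the tags in the output. A sketch (file name and layout are assumptions, not from the original post):

# test.conf -- hypothetical debugging pipeline: paste the sample log lines on stdin
input  { stdin { type => "logs" } }
# ... same filter { ... } block as above ...
output { stdout { codec => rubydebug } }

Running bin/logstash -f test.conf and pasting the two lines should show the _grokparsefailure tag only on the e-mail line.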

1 Answer:

Answer 0: (score: 0)

After testing the output with several values, I realized that Logstash cannot parse log lines containing this kind of $remote_user because an e-mail address is not a valid user name for the pattern, so I added a mutate gsub filter to remove the "@" and the rest of the mail address, which makes it a valid $remote_user:

  

gsub => ["message", "@(?:(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-zA-Z0-9-]*[a-zA-Z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\]) \[", " ["]
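The pattern is essentially the domain part of an RFC 5322-style e-mail regex. If the goal is only to drop everything from the "@" up to the space before the timestamp bracket, a much simpler variant of the same idea is possible (a sketch; this exact pattern is an assumption, not part of the original answer):

mutate {
    # Sketch: "jemlifathi@gmail.com [17/Feb/..." becomes "jemlifathi [17/Feb/..."
    gsub => [ "message", "@[^ ]+ \[", " [" ]
}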

Now it works fine.
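As a design note, rewriting the message can be avoided altogether by loosening the grok pattern for the user field. A sketch of that alternative (not the approach taken in this answer; %{NOTSPACE} simply accepts any token, including an e-mail address, in the $remote_user position):

grok {
    # Same fields as COMBINEDAPACHELOG, but ident and auth accept any
    # non-space token, so "jemlifathi@gmail.com" parses without a gsub.
    match => [ "message", '%{IPORHOST:clientip} %{NOTSPACE:ident} %{NOTSPACE:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent}' ]
}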