Logstash无法从URL

时间:2016-06-30 14:48:34

标签: elastic-stack

我正在尝试从URL中提取查询参数。我正在解析的日志文件中令人不安的一行看起来像这样:

127.0.0.1 - - [09/May/2016:09:32:19 +0200] "GET /ps?attrib[vendor][]=GOK&attrib[vendor][0]=GOK HTTP/1.1" 200 12049 "-" "-"  

attrib的第一次出现产生一个哈希(如预期的那样)。然而,第二次出现导致异常:

IndexError: string not matched
            []= at org/jruby/RubyString.java:3910
            set at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-event-2.3.3-java/lib/logstash/util/accessors.rb:64
            []= at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-event-2.3.3-java/lib/logstash/event.rb:136
         filter at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-kv-2.1.0/lib/logstash/filters/kv.rb:287
           each at org/jruby/RubyHash.java:1342
         filter at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-filter-kv-2.1.0/lib/logstash/filters/kv.rb:287
   multi_filter at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.3-java/lib/logstash/filters/base.rb:151
           each at org/jruby/RubyArray.java:1613
   multi_filter at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.3-java/lib/logstash/filters/base.rb:148
    filter_func at (eval):189
   filter_batch at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.3-java/lib/logstash/pipeline.rb:267
           each at org/jruby/RubyArray.java:1613
         inject at org/jruby/RubyEnumerable.java:852
   filter_batch at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.3-java/lib/logstash/pipeline.rb:265
    worker_loop at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.3-java/lib/logstash/pipeline.rb:223
  start_workers at /opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.3-java/lib/logstash/pipeline.rb:201

我想这是因为logstash将URL中的数组索引解释为字符串,而索引实际上是整数。
经过几天的谷歌搜索和尝试不同的配置,我走到了尽头。知道如何使这项工作吗?

出于调试目的:

  

logstash config

input {
  file {
    path => "/var/log/apache2/some.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}

filter {
  grok {
    match => {
      "message" => '%{IPORHOST:clientip} %{USER:ident} %{USER:auth}\s?(%{NUMBER:seconds:int}\/%{NUMBER:microseconds:int})? \[%{HTTPDATE:timestamp}\] "%{WORD:verb} (%{WORD:schema}:)?[\S]+/(%{DATA:endpoint})\?%{DATA:query_string} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:agent}(\s{1}(?:%{HOSTNAME:backend_used}|-) (?:%{NUMBER:backend_time_seconds:float}|-)s)?'
    }
  }

  urldecode {
    field => "query_string"
    charset => "ISO-8859-1"
  }

  kv {
    field_split => "&"
    source => "query_string"
    recursive => true
    allow_duplicate_values => false
  }

  date {
    match => [ "timestamp", "dd/MMM/YYYY:HH:mm:ss Z" ]
    locale => en
  }

  geoip {
    source => "clientip"
  }

  useragent {
    source => "agent"
    target => "useragent"
  }
}

output {
  stdout {
    codec => json
  }
}  
  

自定义动态模板

{
  "template": "apache_elk_example",
  "settings": {
     "index.refresh_interval": "5s"
  },
  "mappings": {
     "_default_": {
        "numeric_detection" : true,
        "dynamic_templates": [
           {
              "message_field": {
                 "mapping": {
                    "index": "analyzed",
                    "omit_norms": true,
                    "type": "string"
                 },
                 "match_mapping_type": "string",
                 "match": "message"
              }
           },
           {
              "string_fields": {
                 "mapping": {
                    "index": "analyzed",
                    "omit_norms": true,
                    "type": "string",
                    "dynamic": true,
                    "fields": {
                       "raw": {
                          "index": "not_analyzed",
                          "ignore_above": 256,
                          "type": "string"
                       }
                    }
                 },
                 "match_mapping_type": "string",
                 "match": "*"
              }
           }
        ],
        "properties": {
           "geoip": {
              "dynamic": true,
              "properties": {
                 "location": {
                    "type": "geo_point"
                 }
              },
              "type": "object"
           },
           "@version": {
              "index": "not_analyzed",
              "type": "string"
           }
        },
        "_all": {
           "enabled": true
        }
     }
  }
}

1 个答案:

答案 0 :(得分:2)

由于attrib[vendor][]=GOKattrib[vendor][0]=GOK在PHP语义上非常相同,您是否考虑过只删除kv过滤器之前的数字索引?类似的东西:

# Remove numeric indices from arrays, otherwise the kv filter will choke, eg:
# attrib[vendor][0]=GOK becomes attrib[vendor][]=GOK
mutate {
  gsub => [
    "query_string", "\[\d+\]", "[]"
  ]
}

kv {
  ...
}