Logstash grok is failing

Time: 2017-03-02 16:43:32

Tags: elasticsearch logstash logstash-grok

I am trying to grok a message, but it fails with _grokparsefailure and the log does not actually say why it failed. The grok pattern works at https://grokdebug.herokuapp.com/

input {
  file {
    type => "apache-access"
    path => "C:/prdLogs/sent/*"
  }
}

filter {
  grok {
    match => ['message', '%{IP:clientip} - - \[%{GREEDYDATA:raw_timestamp}   \] "%{WORD:httpmethod} %{NOTSPACE:referrer} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} "-" "%{NOTSPACE:request}" %{QS:UserAgent} %{WORD:httpmethodO} - - HTTP/%{NUMBER:httpversion2} "%{WORD:session}:%{WORD:httpmed}" "-" %{NUMBER:duration}' ]
  }
  date {
    match => [ "raw_timestamp", 'dd/MMM/yyyy:HH:mm:ss Z' ]
    target => '@timestamp'
  }
}

output {
  elasticsearch { hosts => ["111.44.44.44:9200"] }
}

The data looks like this:

199.77.22.22 - - [26/Feb/2017:10:18:45 +0800] "GET /myapp/app/i18n/key/parent.selector.label.select.item/?locale=en_GB&dojo.preventCache=1488075524942 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3203
199.77.22.22 - - [26/Feb/2017:10:18:45 +0800] "GET /myapp/app/i18n/key/parent.selector.label.no.recently.used/?locale=en_GB&dojo.preventCache=1488075525483 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3159
199.77.22.22 - - [26/Feb/2017:10:18:46 +0800] "GET /myapp/app/i18n/key/selector.label.selected/?locale=en_GB&dojo.preventCache=1488075525843 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3600
199.77.22.22 - - [26/Feb/2017:10:18:46 +0800] "GET /myapp/app/i18n/key/actor.selector.label.remove.all/?locale=en_GB&dojo.preventCache=1488075526305 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3224
199.77.22.22 - - [26/Feb/2017:10:18:46 +0800] "GET /myapp/app/i18n/key/com.label.filter.objects/?locale=en_GB&dojo.preventCache=1488075526711 HTTP/1.1" 200 "-" "https://mywebsite.here.com:31000/myApp/home.do" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/7.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E; Tablet PC 2.0)" GET - - HTTP/1.1 "0000bKOk4n4SSBHuyJJKed085D6:1ap8u8p8j" "-" 3299

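This is actually an Apache access log, but I can't use COMBINEDAPACHELOG or COMMONAPACHELOG either; they fail with the same error!

All entries in Elasticsearch are tagged "_grokparsefailure". I ran Logstash with log.level set to debug, but I did not see any errors in the log.

I am using the latest version of Logstash.

Please advise.

R2 D2, thanks. I tried this, but no joy :(

I created a patterns file and pasted your patterns into it. I simply changed the payload to "130.39.22.22 - - [23/Feb/2015:10:18:45 +0800]". Here is my filter: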

filter {
  grok {
    patterns_dir => ["c:/logstashconfig/patterns"]
    match => ['message', '%{IP:clientip} - - /[%{DATE_CUSTOM:timestamp}/]']
  }
  date {
    match => [ "timestamp", 'dd/MMM/yyyy:HH:mm:ss Z' ]
    target => '@timestamp'
  }
}

Debug output from Logstash:

{
          "path" => "C:/prdLogs/sent/test",
    "@timestamp" => 2017-03-03T00:06:15.269Z,
      "@version" => "1",
          "host" => "hkw20012125",
       "message" => "130.39.22.22 - -     [23/Feb/2015:10:18:45 +0800]\r",
          "type" => "apache-access",
          "tags" => [
        [0] "_grokparsefailure"
    ]
}

Any ideas? Is it the +0800 at the end of the data? Thanks.

2 answers:

Answer 0 (score: 0)

When you have to build your own pattern, start from the left, go slowly, and use the debugger.

If you test this pattern:

%{IP:clientip} - - \[

it works, but this one:

%{IP:clientip} - - \[%{GREEDYDATA:raw_timestamp}   \]

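doesn't. Comparing the pattern with the input shows that the input has no spaces between the timestamp and the closing bracket, while the pattern expects three.

Changing that part of the pattern to: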

%{IP:clientip} - - \[%{GREEDYDATA:raw_timestamp}\]

works.
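
Applied to the full filter from the question, the fix could look like the sketch below. The stdin input and the rubydebug stdout output are assumptions added here so that a single log line can be pasted in and inspected locally; they are not part of the asker's setup.

```conf
input { stdin { } }

filter {
  grok {
    # Same pattern as in the question, minus the three spaces before \]
    match => ['message', '%{IP:clientip} - - \[%{GREEDYDATA:raw_timestamp}\] "%{WORD:httpmethod} %{NOTSPACE:referrer} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} "-" "%{NOTSPACE:request}" %{QS:UserAgent} %{WORD:httpmethodO} - - HTTP/%{NUMBER:httpversion2} "%{WORD:session}:%{WORD:httpmed}" "-" %{NUMBER:duration}' ]
  }
  date {
    # raw_timestamp now holds e.g. 26/Feb/2017:10:18:45 +0800
    match => [ "raw_timestamp", 'dd/MMM/yyyy:HH:mm:ss Z' ]
    target => '@timestamp'
  }
}

# Print the parsed event so the extracted fields and any failure tags are visible
output { stdout { codec => rubydebug } }
```

If an event still comes back tagged _grokparsefailure, trimming the pattern from the right until it matches again shows which piece is at fault.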

Answer 1 (score: 0)

I think that once you have GREEDYDATA in your pattern, it takes in whatever remains of the log line.

The GREEDYDATA pattern is defined as:

GREEDYDATA .* <-- means to capture the entire line
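
Strictly speaking, `.*` is greedy but still backtracks: placed in the middle of a pattern, GREEDYDATA gives characters back until whatever follows it can match, so it swallows the rest of the line only when nothing comes after it. A small sketch of the difference, using the hypothetical field names `inside` and `rest`:

```conf
filter {
  grok {
    # The first GREEDYDATA is followed by a literal "] ", so it stops at the
    # last closing bracket that still lets the remainder match.
    # The second GREEDYDATA has nothing after it and takes the rest of the line.
    match => ['message', '%{IPV4:clientip} - - \[%{GREEDYDATA:inside}\] %{GREEDYDATA:rest}']
  }
}
```

For the sample lines in the question, which contain a single `]`, `inside` would hold only the bracketed timestamp and `rest` everything after the bracket and the following space.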

If I'm not mistaken, your grok match should look like this:

grok {
  match => ['message', '%{IPV4:clientip} - - %{GREEDYDATA:data}']
}

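Unless you need to extract the values individually, the grok above should do the job. I think the way you are matching the timestamp is wrong. To handle your timestamp, you need to include the following patterns in your patterns file: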

MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
MONTH \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
YEAR (?>\d\d){1,2}
TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
DATE_CUSTOM %{MONTHDAY}[/]%{MONTH}[/]%{YEAR}:%{TIME}

Then you can use it in your grok match:

grok {
    match => ['message', '%{IPV4:clientip} - - \[%{DATE_CUSTOM:timestamp} %{GREEDYDATA:data}']
}

Now you can match the timestamp with:

date {
    match => [ "timestamp" , 'dd/MMM/yyyy:HH:mm:ss Z' ]
    target => '@timestamp'
}
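
Two caveats seem worth noting. DATE_CUSTOM as defined stops at the seconds, so the captured timestamp carries no zone offset and the trailing `Z` in the date format would likely have nothing to parse. Also, a literal bracket in a grok pattern is escaped with a backslash (`\[`), not with a forward slash. A sketch that sidesteps the custom patterns file by using the stock HTTPDATE pattern, which covers timestamps such as `26/Feb/2017:10:18:45 +0800` including the offset; the field names follow the answer, and the rest is an assumption about what the asker needs:

```conf
filter {
  grok {
    # HTTPDATE ships with the default grok patterns, so no patterns_dir is needed
    match => ['message', '%{IPV4:clientip} - - \[%{HTTPDATE:timestamp}\] %{GREEDYDATA:data}']
  }
  date {
    # The captured timestamp now ends in the zone offset that Z expects
    match => [ "timestamp", 'dd/MMM/yyyy:HH:mm:ss Z' ]
    target => '@timestamp'
  }
}
```

If the payload really contains several spaces before the opening bracket, as the debug output in the question suggests, the single space before `\[` would need to become `\s+`.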

Hope this helps!