grok - how do you find a quoted string

Time: 2014-02-15 00:21:36

Tags: regex logstash grok

I am trying to take the output from an nginx log file and send it to logstash.

10.1.10.20 - bob [14/Feb/2014:18:57:05 +0000] “POST /main/foo.git/git-upload-pack HTTP/1.1” 200 3653189 “-” “git/1.8.3.4 (Apple Git–47)” 

Grok is able to match the first three fields

10.1.10.20 - bob [14/Feb/2014:18:57:05 +0000]

%{IPV4:user_ip} - %{USERNAME:user_name} \[%{HTTPDATE:time_local}\]

Grok is able to match the third and fourth fields

[14/Feb/2014:18:57:05 +0000] “POST /main/foo.git/git-upload-pack HTTP/1.1”

\[%{HTTPDATE:time_local}\] %{QUOTEDSTRING:request}

But when I combine them and try to match all four, grok reports no matches (tested with http://grokdebug.herokuapp.com/)

10.1.10.20 - bob [14/Feb/2014:18:57:05 +0000] “POST /main/foo.git/git-upload-pack HTTP/1.1” 

%{IPV4:user_ip} - %{USERNAME:user_name} \[%{HTTPDATE:time_local}\]  %{QUOTEDSTRING:request}
#not found

Does anyone know how to capture the quoted string in the example above?

I am new to grok, so maybe I am not approaching this correctly.

Update

Interestingly, if I start from the following log line and type the request in by hand, it works

 bob 14/Feb/2014:18:57:05 +0000 "herp"
 #Once herp works, replace herp, with POST
 bob 14/Feb/2014:18:57:05 +0000 "POST"
 #Once POST works, keep expounding until the whole thing is in place
 autobuild 14/Feb/2014:18:57:05 +0000 "POST /main/builder.git/git-upload-pack HTTP/1.1"
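
The build-it-up-piece-by-piece approach in the update can be automated. Below is a minimal sketch in Python, assuming plain regular expressions as rough stand-ins for the grok patterns; the helper name `first_failing_fragment` and the fragments themselves are hypothetical and are not part of grok or logstash.

```python
import re

def first_failing_fragment(fragments, line):
    """Append fragments one at a time and return the index of the first
    fragment that makes the match fail, or None if all of them match."""
    pattern = ""
    for i, fragment in enumerate(fragments):
        pattern += fragment
        if re.match(pattern, line) is None:
            return i
    return None

# Hypothetical regex stand-ins for the fields in the simplified log line.
FRAGMENTS = [
    r"(?P<user_name>\w+)",               # user name
    r" (?P<time_local>\S+ [+-]\d{4})",   # timestamp with UTC offset
    r' "',                               # opening ASCII double quote
    r"(?P<verb>\w+)",                    # HTTP verb
]

typed = 'bob 14/Feb/2014:18:57:05 +0000 "POST"'
pasted = "bob 14/Feb/2014:18:57:05 +0000 \u201cPOST\u201d"

print(first_failing_fragment(FRAGMENTS, typed))
print(first_failing_fragment(FRAGMENTS, pasted))
```

The index returned for a failing line points at the fragment to inspect first.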

3 Answers:

Answer 0: (score: 3)

For the request string

"POST /main/builder.git/git-upload-pack HTTP/1.1"

use the pattern

"%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}"

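Answer 1: (score: 0)

The act of posting to Stack Overflow revealed the problem.

If you look closely, the two double quotes are not the same character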

"POST 

VS

“POST

Typing the double quotes by hand fixes the problem
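
One way to catch this before the line ever reaches grok is to look for typographic quotes and replace them with the ASCII double quote. The following is a small Python sketch, not part of logstash; the helper names `find_smart_quotes` and `normalize_quotes` are made up for illustration, and it only handles the two double-quote characters seen in this question.

```python
import unicodedata

# Typographic double quotes mapped to the ASCII double quote.
SMART_QUOTES = {
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
}
_TABLE = str.maketrans(SMART_QUOTES)

def find_smart_quotes(text):
    """Return (index, Unicode name) for every typographic double quote."""
    return [(i, unicodedata.name(ch))
            for i, ch in enumerate(text) if ch in SMART_QUOTES]

def normalize_quotes(text):
    """Replace typographic double quotes with ASCII double quotes."""
    return text.translate(_TABLE)

line = "\u201cPOST /main/foo.git/git-upload-pack HTTP/1.1\u201d"
print(find_smart_quotes(line))
print(normalize_quotes(line))
```

Printing the Unicode names makes the difference visible even when both characters look alike in the editor.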

Answer 2: (score: 0)

In addition, you can use this expression for cases where the log format varies:

"%{WORD:verb}(?:| %{URIPATHPARAM:request})(?:| HTTP/%{NUMBER:httpversion})"

It matches:

"POST /main/builder.git/git-upload-pack HTTP/1.1"

"POST /main/builder.git/git-upload-pack"

"POST"

Give it a try ;)
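
To see how the optional groups behave outside of grok, here is a rough Python approximation. It is only a sketch: `\w+`, `\S+` and `\d+(?:\.\d+)?` are simplified stand-ins assumed here for WORD, URIPATHPARAM and NUMBER, not the real grok definitions, and `parse_request` is a hypothetical helper name.

```python
import re

# Simplified stand-ins (assumptions, not the actual grok patterns):
#   WORD -> \w+    URIPATHPARAM -> \S+    NUMBER -> \d+(?:\.\d+)?
REQUEST = re.compile(
    r'"(?P<verb>\w+)'
    r'(?: (?P<request>\S+))?'                     # optional path
    r'(?: HTTP/(?P<httpversion>\d+(?:\.\d+)?))?"'  # optional HTTP version
)

def parse_request(quoted):
    """Return the captured fields as a dict, or None if nothing matches."""
    m = REQUEST.fullmatch(quoted)
    return m.groupdict() if m else None

for sample in (
    '"POST /main/builder.git/git-upload-pack HTTP/1.1"',
    '"POST /main/builder.git/git-upload-pack"',
    '"POST"',
):
    print(parse_request(sample))
```

Fields that are absent from the input come back as `None`, and a string wrapped in typographic quotes does not match at all.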