Identifying fields after multiline filtering in Logstash

Date: 2015-07-13 23:19:28

Tags: logstash multiline

I am trying to extract fields from a log file that contains groups of lines like this:
=================================
BEGIN of purge log
=================================

INF: Verification du lancement du start
INF: Purge du contenu du repertoire des logs archivees a 15j - /users/wtp00/log/archive
INF: Purge du contenu du repertoire tmp a 8j - /users/wtp00/tmp
INF: Purge du contenu du repertoire histo a 8j - /users/wtp00/histo

=================================
END of purge log
=================================

I have managed to group the INF lines into a single message using multiline, with the following filter:

filter {
    # Exclude lines with no relevant data
    if ([message] !~ "(^\s*INF:|^\s*$)")  {
        drop {}
    }
    # Treat consecutive lines beginning with INF: as a group
    multiline {
        pattern => "^INF: "
        what => "previous"
    }
    # Delete messages with blank lines
    if ([message] == "")  {
        drop {}
    }
    # Delete \n from messages
    mutate
    {
       gsub => ["message", "\n", ""]
    }

}

This gives the following result:

{
       "message" => "INF: Verification du lancement du startINF: Purge du contenu du repertoire des logs archivees a 15j - /users/wtp00/log/archiveINF: Purge du contenu du repertoire tmp a 8j - /users/wtp00/tmpINF: Purge du contenu du repertoire histo a 8j - /users/wtp00/histo",
      "@version" => "1",
    "@timestamp" => "2015-07-13T15:01:49.442Z",
          "host" => "suse",
          "tags" => [
        [0] "multiline"
    ]
}

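Now, within that message, I want to identify the fields of each line (the string before the " - " and the path after it). This should be straightforward, given that INF: marks the start of each line.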

For this example, the field search on the message should produce something like:

warning[0] = "Verification du lancement du start"
warning[1] = "Purge du contenu du repertoire des logs archivees a 15j"
warning[2] = "Purge du contenu du repertoire tmp a 8j"
warning[3] = "Purge du contenu du repertoire histo a 8j"

path[0] = ""
path[1] = "/users/wtp00/log/archive"
path[2] = "/users/wtp00/tmp"
path[3] = "/users/wtp00/histo"

I have tried several approaches and will keep working on it, but so far I do not know how to do this. Any help would be greatly appreciated.

Regards.

1 Answer:

Answer 0: (score: 0)

The key is to identify the fields with grok, using several match patterns, before the multiline filter runs.

The solution is as follows:

filter {
    # Exclude lines with no relevant data
    if ([message] !~ "(^\s*INF:|^\s*$)")  {
        drop {}
    }
    # Search warning message and path in messages
    grok {
        match => [ "message", "INF: %{GREEDYDATA:warning} - %{GREEDYDATA:logpath}" ]
        match => [ "message", "INF: %{GREEDYDATA:warning}" ]
        match => [ "message", "^\s*$" ]
    }
    # Add empty logpath field to purge message if not present
    if ![logpath] {
        if ([message] != "") {
            mutate {
                add_field => { "logpath" => "" }
            }
        }
    }
    # Treat consecutive lines beginning with INF: as a group
    multiline {
        pattern => "^INF: "
        what => "previous"
    }
    # Delete messages with blank lines
    if ([message] == "")  {
        drop {}
    }

}

Two important things:

  • Do not use "path" as a field name
  • Do not add fields to blank lines; dropping those lines afterwards does not work
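
To see what the three grok patterns in the filter above extract from each line, here is a minimal Python sketch that imitates the grok step only. It rests on assumptions: %{GREEDYDATA:name} is approximated by a named group matching ".*", the patterns are tried in order and the first match wins, and the names PATTERNS, extract_fields and collect are made up for this illustration. Collecting the values into lists is only a rough stand-in for the merging done by the multiline filter, whose exact behaviour is not reproduced here.

```python
import re

# Python stand-ins for the three grok patterns, tried in the same order.
# Assumption: %{GREEDYDATA:name} behaves like a named group matching ".*".
PATTERNS = [
    re.compile(r"INF: (?P<warning>.*) - (?P<logpath>.*)"),
    re.compile(r"INF: (?P<warning>.*)"),
    re.compile(r"^\s*$"),
]


def extract_fields(line):
    """Return the named captures of the first matching pattern, or None."""
    for pattern in PATTERNS:
        match = pattern.search(line)
        if match:
            return match.groupdict()
    return None


def collect(lines):
    """Gather warning/logpath values line by line (illustration only)."""
    warnings, logpaths = [], []
    for line in lines:
        fields = extract_fields(line)
        if not fields:
            # No pattern matched, or a blank line that yields no captures.
            continue
        warnings.append(fields["warning"])
        # Mirror the filter: use an empty logpath when none was captured.
        logpaths.append(fields.get("logpath", ""))
    return warnings, logpaths


if __name__ == "__main__":
    sample = [
        "INF: Verification du lancement du start",
        "INF: Purge du contenu du repertoire tmp a 8j - /users/wtp00/tmp",
        "",
    ]
    warnings, logpaths = collect(sample)
    print(warnings)
    print(logpaths)
```

Putting the most specific pattern first matters here: if the plain "INF: ..." pattern were tried first, it would match every INF line and the path would end up inside the warning text.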