Using Logstash with HTML logs

Date: 2015-01-29 12:33:23

Tags: logstash logstash-grok

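I'm new to Logstash and am trying to use it to parse an HTML log file. I only need to output the log lines, i.e. ignore the JS, CSS, and HTML that precede them in the file. A log line in the file looks like this: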

<tr bgcolor="tomato"><td>Jan 28<br>13:52:25.692</td><td>Jan 28<br>13:52:23.950</td><td>qtp114615276-1648 [POST] [call_id:-8009072655119858507]</td><td>REST</td><td>sa</td><td>0.0.0.0</td><td>ERR</td><td>ProjectValidator.validate(36)</td><td>Project does not exist</td></tr>

Getting all the lines is not a problem, but I want the output to contain only the relevant data, without the HTML markup, looking like this:

{
  "db_timestamp": "2015-01-28 13:52:25.692",
  "server_timestamp": "2015-01-28 13:52:23.950",
  "node": "qtp114615276-1648 [POST] [call_id:-8009072655119858507]",
  "thread": "REST",
  "user": "sa",
  "ip": "0.0.0.0",
  "level": "ERR",
  "method": "ProjectValidator.validate(36)",
  "message": "Project does not exist"
}

My Logstash configuration is:

input {
  file {
    type => "request"
    path => "<some path>/*.log"
    start_position => "beginning"
  }
  file {
    type => "log"
    path => "<some path>/*.html"
    start_position => "beginning"
  }
}
filter {
  if [type] == "log" {
    grok {
        match => [ WHAT SHOULD I PUT HERE??? ]  
    }
  }
}
output {
  stdout {}
  if [type] == "request" {
    http {
        http_method => "post"
        url => "http://<some url>"
        mapping =>  ["type", "request", "host" ,"%{host}", "timestamp", "%{@timestamp}", "message", "%{message}"]
    }
  }
  if [type] == "log" {
    http {
        http_method => "post"
        url => "http://<some url>"
        mapping =>  [ ALSO WHAT SHOULD I PUT HERE??? ]
    }
  }
}

Is there a way to do this? So far I haven't found any relevant documentation or samples.

Thanks!

2 Answers:

Answer 0: (score: 0)

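I finally found the answer.

I'm not sure this is the best or most elegant solution, but it works.

I changed the http output format to "message", which let me override the whole message and format it as JSON instead of using mapping. I also learned how to name captures in the grok filter and use them in the output.

Here is the new Logstash configuration file: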

input {
  file {
    type => "request"
    path => "<some path>/*.log"
    start_position => "beginning"
  }
  file {
    type => "log"
    path => "<some path>/*.html"
    start_position => "beginning"
  }
}

filter {
  if [type] == "log" {
    grok {
        # db_date and alm_date are each captured three times (month, day, time),
        # so grok collects all three values into the same field
        match => { "message" => "<tr bgcolor=.*><td>%{MONTH:db_date}%{SPACE}%{MONTHDAY:db_date}<br>%{TIME:db_date}</td><td>%{MONTH:alm_date}%{SPACE}%{MONTHDAY:alm_date}<br>%{TIME:alm_date}</td><td>%{DATA:thread}</td><td>%{DATA:req_type}</td><td>%{DATA:username}</td><td>%{IP:ip}</td><td>%{DATA:level}</td><td>%{DATA:method}</td><td>%{DATA:err_message}</td></tr>" }
    }
  }
}

output {
  stdout { codec => rubydebug }
  if [type] == "request" {
    http {
        http_method => "post"
        url => "http://<some URL>"
        mapping =>  ["type", "request", "host" ,"%{host}", "timestamp", "%{@timestamp}", "message", "%{message}"]
    }
  }
  if [type] == "log" {
    http {
        format => "message"
        content_type => "application/json"
        http_method => "post"
        url => "http://<some URL>"
        message => '{
            "db_date": "%{db_date}",
            "alm_date": "%{alm_date}",
            "thread": "%{thread}",
            "req_type": "%{req_type}",
            "username": "%{username}",
            "ip": "%{ip}",
            "level": "%{level}",
            "method": "%{method}",
            "message": "%{err_message}"
        }'
    }
  }
}

Note the single quotes around the http message block and the double quotes around the parameters inside it.
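
The question also asked to ignore the JS, CSS, and HTML lines that precede the log rows, which the configuration above does not do explicitly. One possible approach, not part of the original answer, is to drop every event that grok could not match. This sketch assumes grok's default behavior of tagging unmatched events with `_grokparsefailure`; the extra filter block is meant to be placed after the existing one.

```
filter {
  # Assumption: this block comes after the grok block for type "log".
  # grok adds the "_grokparsefailure" tag (its default tag_on_failure value)
  # to any event whose message does not match the pattern.
  if [type] == "log" and "_grokparsefailure" in [tags] {
    # Discard non-matching lines (JS, CSS, surrounding HTML) so they never reach the outputs
    drop { }
  }
}
```

With this in place, only lines matching the `<tr ...>` pattern should reach the stdout and http outputs.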

Answer 1: (score: 0)

For anyone parsing HP ALM logs, the following Logstash filter does the job:

    grok {
        break_on_match => true
        # Month, day, and time are captured separately; the remaining columns use non-greedy named groups
        match => [ "message", "<tr bgcolor=.*><td>%{MONTH:db_date_mon}%{SPACE}%{MONTHDAY:db_date_day}<br>%{TIME:db_date_time}<\/td><td>%{MONTH:alm_date_mon}%{SPACE}%{MONTHDAY:alm_date_day}<br>%{TIME:alm_date_time}<\/td><td>(?<thread_col1>.*?)<\/td><td>(?<request_type>.*?)<\/td><td>(?<login>.*?)<\/td><td>(?<ip>.*?)<\/td><td>(?<level>.*?)<\/td><td>(?<method>.*?)<\/td><td>(?m:(?<log_message>.*?))</td></tr>" ]
    }
    mutate {
        # Recombine month and day, then remove the intermediate fields
        add_field => ["db_date", "%{db_date_mon} %{db_date_day}"]
        add_field => ["alm_date", "%{alm_date_mon} %{alm_date_day}"]
        remove_field => [ "db_date_mon", "db_date_day", "alm_date_mon", "alm_date_day" ]
        # Replace <br> with a literal line break (the quoted string below intentionally spans two lines)
        gsub => [
           "log_message", "<br>", "
           "
           ]
        # Replace <p> with spaces
        gsub => [
           "log_message", "<p>", "   "
           ]
    }

Tested with Logstash 2.4.0 and works correctly.
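
The question asked for a full timestamp such as `2015-01-28 13:52:25.692`, while the filter above leaves the date and time in separate fields (`db_date` and `db_date_time`). A possible follow-up, not part of the original answer, is to join them and pass the result to the date filter. The field name `db_timestamp` is taken from the question's desired output. The format strings are assumptions about how the columns look, and because the log rows carry no year, the year in the parsed value is filled in by the date filter rather than read from the log.

```
filter {
  # Assumption: this block comes after the grok and mutate blocks above,
  # so db_date ("Jan 28") and db_date_time ("13:52:25.692") already exist.
  mutate {
    add_field => { "db_timestamp" => "%{db_date} %{db_date_time}" }
  }
  date {
    # Two formats cover one- and two-digit days; both assume millisecond precision
    match => [ "db_timestamp", "MMM d HH:mm:ss.SSS", "MMM dd HH:mm:ss.SSS" ]
    # Store the parsed value back in db_timestamp instead of overwriting @timestamp
    target => "db_timestamp"
  }
}
```

The parsed value is a Logstash timestamp, so it is normally serialized in ISO 8601 form rather than in the exact layout shown in the question. The same two steps can be repeated for `alm_date` and `alm_date_time`.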