Is there a way to resume parsing a file in Logstash from the last parsed position?

Asked: 2017-05-14 11:25:07

Tags: elasticsearch logstash elastic-stack logstash-grok

My Logstash configuration, shown below, is adapted from the answer to "How to write grok pattern in logstash".

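These are the problems I am facing:

Slowness: my file is 50 MB, and Logstash takes a very long time to parse it. Is something in my configuration causing this, is there another reason, or is Logstash simply slow on files of this size?

Resuming: I want parsing to start from the position where the previous run stopped, because I am sending the parsed events to ELK.

Trailing multiline event: I need a multiline event to be handled when it is the last entry in the log.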

input {
  file {
    path => "/u/bansalp/activemq_primary_plugin.stats.log.0"
    ### For testing and repeated processing of the same file; remove these two settings before production
    start_position => "beginning"
    sincedb_path => "/dev/null"
    ### Let's read the log file and recombine multiline entries
    codec => multiline {
      # Grok pattern names are valid! :)
      pattern => "^\[%{YEAR}%{MONTHNUM}%{MONTHDAY}\s*%{TIME}"
      negate => true
      what => "previous"
    }
  }
}
filter {
    ### Let's get some high level data before we split the line (note: anything you grab before the split gets copied)
    if [message] =~ "logPerDestinationStats" {
        ### \( and \) match the literal parentheses around the thread name, e.g. (StatsCollector-1)
        grok {
            match => {
                "message" => "^\[%{YEAR:yr}%{MONTHNUM:mnt}%{MONTHDAY:daynum}\s*%{TIME:time}\s*%{TZ:timezone}\s*\(%{DATA:thread_name}\)\s*%{JAVACLASS:javaclass}#%{WORD:method}\s*%{LOGLEVEL}\]\s*%{DATA}:%{DATA:msg}"
            }
        }
        ### Split the lines back out to being a single line now. (this may be a \r or \n, test which one)
        split {
            field => "message"
        }
        ### Ok, the lines should now be independent, lets add another grok here to get the patterns as dictated by your example [fieldA: str | field2: 0...] etc.
        ### Note: you should look to change the grok pattern to better suit your requirements, I used DATA here to quickly capture your content
        if [message] =~ "^\[destName" {
            grok {
                break_on_match => false
                match => { "message" => "^\[%{DATA}:\s*%{DATA:destName}\s*\|\s*%{DATA}:\s*%{NUMBER:enqueueCount}\s*\|\s*%{DATA}:\s*%{NUMBER:dequeueCount}\s*\|\s*%{DATA}:\s*%{NUMBER:dispatchCount}\s*\|\s*%{DATA}:\s*%{NUMBER:expiredCount}\s*\|\s*%{DATA}:\s*%{NUMBER:inflightCount}\s*\|\s*%{DATA}:\s*%{NUMBER:msgsHeld}\s*\|\s*%{DATA}:\s*%{NUMBER:msgsCached}\s*\|\s*%{DATA}:\s*%{NUMBER:memoryPercentUsage}\s*\|\s*%{DATA}:\s*%{NUMBER:memoryUsage}\s*\|\s*%{DATA}:\s*%{NUMBER:memoryLimit}\s*\|\s*%{DATA}:\s*%{NUMBER:avgEnqueueTimeMs}\s*\|\s*%{DATA}:\s*%{NUMBER:maxEnqueueTimeMs}\s*\|\s*%{DATA}:\s*%{NUMBER:minEnqueueTimeMs}\s*\|\s*%{DATA}:\s*%{NUMBER:currentConsumers}\s*\|\s*%{DATA}:\s*%{NUMBER:currentProducers}\s*\|\s*%{DATA}:\s*%{NUMBER:blockedSendsCount}\s*\|\s*%{DATA}:\s*%{NUMBER:blockedSendsTimeMs}\s*\|\s*%{DATA}:\s*%{NUMBER:minMsgSize}\s*\|\s*%{DATA}:\s*%{NUMBER:maxMsgSize}\s*\|\s*%{DATA}:\s*%{NUMBER:avgMsgSize}\s*\|\s*%{DATA}:\s*%{NUMBER:totalMsgSize}\]$" }
            }
        }
        mutate {
            convert => { "message" => "string" }
            add_field => {
                "session_timestamp" => "%{yr}-%{mnt}-%{daynum} %{time} %{timezone}"
                "load_timestamp" => "%{@timestamp}"
            }
            remove_field => ["yr","mnt", "daynum", "time", "timezone"]
        }
    }
}
output {
  stdout { codec => rubydebug }
}
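
On resuming from the last parsed position: the configuration above points `sincedb_path` at `/dev/null`, so the file input never saves its read offset, and together with `start_position => "beginning"` every restart re-reads the whole file. A minimal sketch of the input with a persistent sincedb follows. The path `/var/lib/logstash/sincedb_activemq_stats` is a hypothetical location chosen for illustration, not something from the original post.

```conf
input {
  file {
    path => "/u/bansalp/activemq_primary_plugin.stats.log.0"
    # Only takes effect the first time a file is seen (no sincedb entry yet)
    start_position => "beginning"
    # Hypothetical path (an assumption); any file the Logstash user can write to should work
    sincedb_path => "/var/lib/logstash/sincedb_activemq_stats"
    codec => multiline {
      pattern => "^\[%{YEAR}%{MONTHNUM}%{MONTHDAY}\s*%{TIME}"
      negate => true
      what => "previous"
    }
  }
}
```

With the offset persisted, a restarted Logstash should continue from where the previous run stopped instead of emitting the already-shipped events again.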

Sample log for this configuration:

[20170513 06:08:29.734 EDT (StatsCollector-1) bansalp.tools.jms.ActiveMQLoggingPlugin$ActiveMQDestinationStatsCollector#logPerDestinationStats INFO] ActiveMQ Destination Stats (97 destinations):
[destName: topic://topic1 | enqueueCount: 1 | dequeueCount: 1 | dispatchCount: 1 | expiredCount: 0 | inflightCount: 0 | msgsHeld: 0 | msgsCached: 0 | memoryPercentUsage: 0 | memoryUsage: 0 | memoryLimit: 536870912 | avgEnqueueTimeMs: 0.0 | maxEnqueueTimeMs: 0 | minEnqueueTimeMs: 0 | currentConsumers: 1 | currentProducers: 0 | blockedSendsCount: 0 | blockedSendsTimeMs: 0 | minMsgSize: 2392 | maxMsgSize: 2392 | avgMsgSize: 2392.0 | totalMsgSize: 2392]
[destName: topic://topic2 | enqueueCount: 0 | dequeueCount: 0 | dispatchCount: 0 | expiredCount: 0 | inflightCount: 0 | msgsHeld: 0 | msgsCached: 0 | memoryPercentUsage: 0 | memoryUsage: 0 | memoryLimit: 536870912 | avgEnqueueTimeMs: 0.0 | maxEnqueueTimeMs: 0 | minEnqueueTimeMs: 0 | currentConsumers: 3 | currentProducers: 0 | blockedSendsCount: 0 | blockedSendsTimeMs: 0 | minMsgSize: 0 | maxMsgSize: 0 | avgMsgSize: 0.0 | totalMsgSize: 0]
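
On the trailing multiline event: in the sample above, the final `[destName: ...]` line stays buffered, because with `negate => true` and `what => "previous"` the multiline codec only emits an event once it sees the next line that matches `pattern`. The codec has an `auto_flush_interval` option, in seconds, that emits a pending event after that long without new lines. A sketch of the codec block, assuming a plugin version that supports the option; the 5-second value is an arbitrary choice for illustration.

```conf
codec => multiline {
  pattern => "^\[%{YEAR}%{MONTHNUM}%{MONTHDAY}\s*%{TIME}"
  negate => true
  what => "previous"
  # Emit the buffered event if no new line arrives within 5 seconds (value is an assumption)
  auto_flush_interval => 5
}
```

A short interval can split an entry whose lines are written slowly, so the value probably needs tuning against how the stats collector writes its output.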

0 answers:

No answers