How to validate the data format in Logstash after parsing with the KV filter

Time: 2017-01-06 22:13:43

Tags: logstash logstash-grok logstash-configuration

I have the following messages:

2017-01-06 19:27:53,893 INFO [[ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)'] com.symantec.olp.sas.log.SummaryLogAspect - {country=JP, svid=182, typ=sassum, vid=1120, xffmode=0, initallocdate=2014-12-15T01:08:24Z, xffip=222.230.107.165, seatcnt=1, plgid=2, api=941753644, oslocale=JPN, fng=CJ6FRE1208VMNRQG, req=/228228131/28RXAAPB1DqJj/RSLHL940/EMBXtu+/f+/Zeb/KV1Q/DTXZBFC94ZE5AOmz/mDCqB7zJOARDQO/166180202502162557303662649078783407201612&D09DEEFB7E78065D?NAM=SFlBUy1WQTE2&MFN=VkFJTyBDb3Jwb3JhdGlvbg==&MFM=VkpQMTEx&OLA=JPN&OLO=JPN, llmv=470, oslang=JPN, ctok=166180202502162557303662649078783407201612, resptime=119, epid=70D3B811A994477F957A90985109BE9D, campnid=0, remip=222.230.107.165, lictype=SOS, dbepid=70D3B811A994477F957A90985109BE9D, cid=nav1sasapppex02.msp.symantec.com1481215212435, status=10002, siid=240, skum=21356539, skup=01001230, psn=O749UPCN8KSY, cip=84.100.138.144, mname=VAIO Corporation, puid=1199, skuf=01100470, st=1481765738387, prbid=5967, mmodel=VJP111, clang=EN, pnfi=1120, cprbid=745, cpmv=7428, euip=222.230.107.165, prcdline=2, dvnm=HYAS-VA16, remdays=0, seatid=ah00s8CIdqUQyW2V, sasvid=106, xlsid=3730, baseactkey=186635290403122706518307794, coupon=651218, translogid=75033f05-9cf2-48e2-b924-fc2441d11d33}  
2017-01-06 19:28:03,894 INFO [[ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)'] com.symantec.olp.sas.log.SummaryLogAspect - {country=JP, svid=182, typ=sassum, vid=1120, xffmode=0, initallocdate=2014-12-15T01:08:24Z, xffip=222.230.107.165, seatcnt=1, plgid=2, api=228228131, oslocale=JPN, fng=1TA6U8RVL0JQXA0N, req=/228228131/28RXAAPB1DqJj/RSLHL940/EMBXtu+/f+/Zeb/KV1Q/DTXZBFC94ZE5AOmz/mDCqB7zJOARDQO/166180202502162557303662649078783407201612&D09DEEFB7E78065D?NAM=SFlBUy1WQTE2&MFN=VkFJTyBDb3Jwb3JhdGlvbg==&MFM=VkpQMTEx&OLA=JPN&OLO=JPN, lpmv=470, oslang=JPN, ctok=166180202502162557303662649078783407201612, resptime=119, epid=70D3B811A994477F957A90985109BE9D, campnid=0, remip=222.230.107.165, lictype=SOS, dbepid=70D3B811A994477F957A90985109BE9D, cid=nav1sasapppex02.msp.symantec.com1481215212435, status=0000, siid=240, skum=21356539, skup=01001230, psn=28MHHH2VPR4T, cip=222.230.107.165, mname=VAIO Corporation, puid=1199, skuf=01100470, st=1481765738387, prbid=5967, mmodel=VJP111, clang=EN, pnfi=1120, cprbid=745, cpmv=1027, euip=222.230.107.165, prcdline=2, dvnm=HYAS-VA16, remdays=0, seatid=StrlisGXA4yAt1ad, sasvid=130, xlsid=2820, baseactkey=028200017462383754273799438, coupon=123456, translogid=72df4536-6038-4d1c-b213-d0ff5c3c20fb}

I use the grok pattern below to match them:

(?m)%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:severity} \[%{GREEDYDATA:thread}\] %{JAVACLASS:className} - \{%{GREEDYDATA:logmsg}\}

After that, I use the KV filter to split out the fields contained in the logmsg field, keeping only the fields I am interested in. My question is: how do I validate the format of those fields? One thing I should mention: the logs contain a varying number of fields inside logmsg, which is why I used GREEDYDATA.

My logstash.conf is as follows:

input {   
  kafka {
    bootstrap_servers => "brokers_list"
    topics => ["transaction-log"]
    codec => "json"   
  } 
}

filter {
        grok {
            match => [ "message", "(?m)%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:severity} \[%{GREEDYDATA:thread}\] %{JAVACLASS:className} - \{%{GREEDYDATA:logmsg}\}" ]
            #overwrite => [ "message" ]
        }

        if "_grokparsefailure" not in [tags] {
           kv {
              field_split => ", "
              source => "logmsg"
              include_keys => ["api", "fng", "status", "cip", "cpmv", "translogid", "coupon", "baseactkey", "xlsid", "sasvid", "seatid", "srcHostname", "serverId" ]
              allow_duplicate_values => false
              remove_field => [ "message", "kafka.*", "logmsg"]
           }
        }

        if [api] != "228228131" {
           mutate { add_tag => "_grokparsefailure" }
        }

        date { # use timestamp from the log
          match => [ "timestamp", "YYYY-MM-dd HH:mm:ss,SSS" ]
          target => "@timestamp"
        }

        mutate {
          remove_field => [ "timestamp" ]  # remove unused stuff
        } 
  }

output {   
  if "_grokparsefailure" in [tags] {
    kafka {
        topic_id => "invalid topic"
        bootstrap_servers => "brokers_list"
        codec => json {}      
    }    
  } else {    
    kafka {
        topic_id => "valid topic"
        bootstrap_servers => "brokers_list"
        codec => json { }     
    }    
  } 
}

After parsing with the KV filter, I check the value of the api field; if it is not equal to 228228131, I add the _grokparsefailure tag to the event and do no further processing on it.

I want to be able to validate the format of the fields listed in include_keys, for example: is cip a client IP? How do I validate the data format of these fields? Because my logs contain a varying number of fields, I cannot validate this at the grok level; only after KV parsing do I have these fields available to validate. By validate I mean check that they conform to the types defined in the ES index, because if they do not, I want to send them to the invalid topic in Kafka.

Should I use a ruby filter for the validation? If so, could you give me a sample? Or should I rebuild the event after KV parsing and run grok again on the newly created event?

Some samples showing how to do this would be much appreciated.

1 Answer:

Answer 0 (score: 0):

A concrete example would help, but you can check a lot of things with a regular expression:

if [myField] =~ /^[0-9]+$/ {
     # it contains only digits
}

Or something like:

if [myField] =~ /^[a-z]+$/ {
    # it contains only lowercase letters
}
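
Applied to the fields from the question's include_keys, a minimal sketch (untested; the _validationfailure tag name and the exact patterns are my own assumptions, not something the posted config defines) could sit right after the kv filter and tag events whose values do not look right, so the existing output conditional can route them to the invalid topic:

if [cip] and [cip] !~ /^\d{1,3}(\.\d{1,3}){3}$/ {
    # cip is present but does not look like an IPv4 address
    mutate { add_tag => [ "_validationfailure" ] }
}
if [api] and [api] !~ /^[0-9]+$/ {
    # api should be purely numeric
    mutate { add_tag => [ "_validationfailure" ] }
}
if [translogid] and [translogid] !~ /^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$/ {
    # translogid should look like a UUID
    mutate { add_tag => [ "_validationfailure" ] }
}

In the output block you would then test for "_validationfailure" in [tags] alongside "_grokparsefailure".

For checks that are awkward to express as conditionals, a ruby filter is another option. A rough sketch (assumes the Logstash 5.x event API; the field and tag names are again assumptions):

ruby {
    code => "
        require 'ipaddr'
        begin
            # Raises ArgumentError (or a subclass) if cip is missing or not a valid IP
            IPAddr.new(event.get('cip').to_s)
        rescue ArgumentError
            tags = event.get('tags') || []
            event.set('tags', tags | ['_validationfailure'])
        end
    "
}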