I have the following messages:
2017-01-06 19:27:53,893 INFO [[ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)'] com.symantec.olp.sas.log.SummaryLogAspect - {country=JP, svid=182, typ=sassum, vid=1120, xffmode=0, initallocdate=2014-12-15T01:08:24Z, xffip=222.230.107.165, seatcnt=1, plgid=2, api=941753644, oslocale=JPN, fng=CJ6FRE1208VMNRQG, req=/228228131/28RXAAPB1DqJj/RSLHL940/EMBXtu+/f+/Zeb/KV1Q/DTXZBFC94ZE5AOmz/mDCqB7zJOARDQO/166180202502162557303662649078783407201612&D09DEEFB7E78065D?NAM=SFlBUy1WQTE2&MFN=VkFJTyBDb3Jwb3JhdGlvbg==&MFM=VkpQMTEx&OLA=JPN&OLO=JPN, llmv=470, oslang=JPN, ctok=166180202502162557303662649078783407201612, resptime=119, epid=70D3B811A994477F957A90985109BE9D, campnid=0, remip=222.230.107.165, lictype=SOS, dbepid=70D3B811A994477F957A90985109BE9D, cid=nav1sasapppex02.msp.symantec.com1481215212435, status=10002, siid=240, skum=21356539, skup=01001230, psn=O749UPCN8KSY, cip=84.100.138.144, mname=VAIO Corporation, puid=1199, skuf=01100470, st=1481765738387, prbid=5967, mmodel=VJP111, clang=EN, pnfi=1120, cprbid=745, cpmv=7428, euip=222.230.107.165, prcdline=2, dvnm=HYAS-VA16, remdays=0, seatid=ah00s8CIdqUQyW2V, sasvid=106, xlsid=3730, baseactkey=186635290403122706518307794, coupon=651218, translogid=75033f05-9cf2-48e2-b924-fc2441d11d33}
2017-01-06 19:28:03,894 INFO [[ACTIVE] ExecuteThread: '10' for queue: 'weblogic.kernel.Default (self-tuning)'] com.symantec.olp.sas.log.SummaryLogAspect - {country=JP, svid=182, typ=sassum, vid=1120, xffmode=0, initallocdate=2014-12-15T01:08:24Z, xffip=222.230.107.165, seatcnt=1, plgid=2, api=228228131, oslocale=JPN, fng=1TA6U8RVL0JQXA0N, req=/228228131/28RXAAPB1DqJj/RSLHL940/EMBXtu+/f+/Zeb/KV1Q/DTXZBFC94ZE5AOmz/mDCqB7zJOARDQO/166180202502162557303662649078783407201612&D09DEEFB7E78065D?NAM=SFlBUy1WQTE2&MFN=VkFJTyBDb3Jwb3JhdGlvbg==&MFM=VkpQMTEx&OLA=JPN&OLO=JPN, lpmv=470, oslang=JPN, ctok=166180202502162557303662649078783407201612, resptime=119, epid=70D3B811A994477F957A90985109BE9D, campnid=0, remip=222.230.107.165, lictype=SOS, dbepid=70D3B811A994477F957A90985109BE9D, cid=nav1sasapppex02.msp.symantec.com1481215212435, status=0000, siid=240, skum=21356539, skup=01001230, psn=28MHHH2VPR4T, cip=222.230.107.165, mname=VAIO Corporation, puid=1199, skuf=01100470, st=1481765738387, prbid=5967, mmodel=VJP111, clang=EN, pnfi=1120, cprbid=745, cpmv=1027, euip=222.230.107.165, prcdline=2, dvnm=HYAS-VA16, remdays=0, seatid=StrlisGXA4yAt1ad, sasvid=130, xlsid=2820, baseactkey=028200017462383754273799438, coupon=123456, translogid=72df4536-6038-4d1c-b213-d0ff5c3c20fb}
I use the following grok pattern to match them:
(?m)%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:severity} \[%{GREEDYDATA:thread}\] %{JAVACLASS:className} - \{%{GREEDYDATA:logmsg}\}
After that, I use a kv filter to split the fields inside the logmsg field, keeping only the fields I'm interested in. My problem is: how do I validate the format of those fields? One thing I should mention: the logs contain a varying number of fields in logmsg, which is why I used GREEDYDATA.
My logstash.conf is as follows:
input {
  kafka {
    bootstrap_servers => "brokers_list"
    topics => ["transaction-log"]
    codec => "json"
  }
}
filter {
  grok {
    match => [ "message", "(?m)%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:severity} \[%{GREEDYDATA:thread}\] %{JAVACLASS:className} - \{%{GREEDYDATA:logmsg}\}" ]
    #overwrite => [ "message" ]
  }
  if "_grokparsefailure" not in [tags] {
    kv {
      field_split => ", "
      source => "logmsg"
      include_keys => ["api", "fng", "status", "cip", "cpmv", "translogid", "coupon", "baseactkey", "xlsid", "sasvid", "seatid", "srcHostname", "serverId" ]
      allow_duplicate_values => false
      remove_field => [ "message", "kafka.*", "logmsg" ]
    }
  }
  if [api] != "228228131" {
    mutate { add_tag => "_grokparsefailure" }
  }
  date { # use timestamp from the log
    match => [ "timestamp", "YYYY-MM-dd HH:mm:ss,SSS" ]
    target => "@timestamp"
  }
  mutate {
    remove_field => [ "timestamp" ] # remove unused stuff
  }
}
output {
  if "_grokparsefailure" not in [tags] {
    kafka {
      topic_id => "valid topic"
      bootstrap_servers => "brokers_list"
      codec => json {}
    }
  } else {
    kafka {
      topic_id => "invalid topic"
      bootstrap_servers => "brokers_list"
      codec => json {}
    }
  }
}
After parsing with the kv filter, I check the value of the api field; if it isn't equal to 228228131, I add the _grokparsefailure tag to the event and do no further processing on it.
I want to be able to validate the format of the fields listed in include_keys (for example, is cip a client IP?). How do I validate the data format of these fields? Since my logs contain a varying number of fields, I can't validate at the grok level; only after kv parsing do I have the fields available to validate. By validation I mean checking that they conform to the types defined in my ES index, because if they don't conform I want to send them to the invalid topic in Kafka.
Should I use a ruby filter for the validation? If so, could you give me a sample? Or should I rebuild the event after kv parsing and run grok again on the newly created event?
Some samples showing how to do this would be much appreciated.
Answer (score: 0)
A concrete example would help, but you can check a lot of things with regular expressions:
if [myField] =~ /^[0-9]+$/ {
# it contains only digits
}
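Applied to the fields from your include_keys, a minimal sketch might look like this (the _validationfailure tag and the exact patterns are assumptions on my part; adjust them to whatever your ES mapping expects, and route on the tag in your output block alongside _grokparsefailure):

filter {
  # hypothetical checks, run after the kv filter has populated the fields
  if [cip] and [cip] !~ /^\d{1,3}(\.\d{1,3}){3}$/ {
    mutate { add_tag => "_validationfailure" }  # cip should be an IPv4 address
  }
  if [coupon] and [coupon] !~ /^[0-9]+$/ {
    mutate { add_tag => "_validationfailure" }  # coupon should be numeric
  }
}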
Or something like this:
if [myField] =~ /^[a-z]+$/ {
# it contains only lowercase letters
}
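If a regex isn't strict enough (for example, ^\d{1,3}(\.\d{1,3}){3}$ accepts 999.999.999.999), a ruby filter can do real IP parsing. Here is a rough sketch, assuming the Logstash 5.x event.get/event.set API; the _validationfailure tag is again my own naming:

filter {
  ruby {
    init => "require 'ipaddr'"
    code => "
      cip = event.get('cip')
      begin
        IPAddr.new(cip) unless cip.nil?  # raises ArgumentError on a bad address
      rescue ArgumentError
        tags = event.get('tags') || []
        tags << '_validationfailure' unless tags.include?('_validationfailure')
        event.set('tags', tags)
      end
    "
  }
}

Events carrying that tag can then be sent to the invalid topic by your output conditional, the same way you already route on _grokparsefailure.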