Logstash stops processing logs after a few hours. While it is stalled, the logstash service consumes a large amount of CPU (about 25 of 32 cores); when it is working normally it uses about 4-5 cores in total. The pipeline produces about 50,000 events per minute.

Logstash settings (non-default values):
pipeline.workers: 15
pipeline.batch.size: 100

JVM configuration:
-Xms15g -Xmx15g
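For reference, a sketch of where these non-default settings would typically live, assuming a standard package install (the file paths are common defaults and an assumption, not stated in the question):

```
# /etc/logstash/logstash.yml -- pipeline tuning (values from the question)
pipeline.workers: 15
pipeline.batch.size: 100

# /etc/logstash/jvm.options -- heap settings (values from the question)
-Xms15g
-Xmx15g
```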
input {
  tcp {
    port => 5044
    type => syslog
  }
  udp {
    port => 5044
    type => syslog
  }
}
filter {
  if [type] == "syslog" {
    grok {
      match => [ "message", "%{SYSLOG5424PRI}%{NOTSPACE:syslog_timestamp} %{NOTSPACE:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" ]
    }
    kv {
      id => "logs_kv"
      source => "syslog_message"
      trim_key => " "
      trim_value => " "
      value_split => "="
      field_split => " "
    }
    mutate {
      remove_field => [ "syslog_message", "syslog_timestamp" ]
    }
    # now check if the source IP is a private IP; if so, tag it
    cidr {
      address => [ "%{srcip}" ]
      add_tag => [ "src_internalIP" ]
      network => [ "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16" ]
    }
    # don't run geoip if it's an internal IP, otherwise look up the GeoIP location
    if "src_internalIP" not in [tags] {
      geoip {
        add_tag => [ "src_geoip" ]
        source => "srcip"
        database => "/usr/share/elasticsearch/modules/ingest-geoip/GeoLite2-City.mmdb"
      }
      geoip {
        source => "srcip"
        database => "/usr/share/elasticsearch/modules/ingest-geoip/GeoLite2-ASN.mmdb"
      }
    } else {
      # check the DST IP now; if it is a private IP, tag it
      cidr {
        add_tag => [ "dst_internalIP" ]
        address => [ "%{dstip}" ]
        network => [ "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16" ]
      }
      # don't run geoip if it's an internal IP, otherwise look up the GeoIP location
      if "dst_internalIP" not in [tags] {
        geoip {
          add_tag => [ "dst_geoip" ]
          source => "dstip"
          database => "/usr/share/elasticsearch/modules/ingest-geoip/GeoLite2-City.mmdb"
        }
        geoip {
          source => "dstip"
          database => "/usr/share/elasticsearch/modules/ingest-geoip/GeoLite2-ASN.mmdb"
        }
      }
    }
  }
}
output {
  if [type] == "syslog" {
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
      index => "sysl-%{syslog_hostname}-%{+YYYY.MM.dd}"
    }
    # stdout { codec => rubydebug }
  }
}
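One incidental hardening of the config above (a sketch of an assumption, not something from the question): when the kv filter fails to extract `srcip`, the cidr filter receives the literal unresolved string `%{srcip}`. Wrapping the IP-dependent filters in a field-existence check keeps them from running on malformed events:

```
filter {
  # Hypothetical guard: only run the cidr/geoip chain when a srcip field
  # was actually extracted; otherwise "%{srcip}" stays a literal string.
  if [srcip] {
    cidr {
      address => [ "%{srcip}" ]
      add_tag => [ "src_internalIP" ]
      network => [ "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16" ]
    }
  }
}
```

The same guard would apply to the `dstip` branch.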
When logstash stops processing I don't see any errors in the log file (log level: trace), only these messages:
[2019-04-19T00:00:12,004][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ConcurrentMarkSweep"}
[2019-04-19T00:00:17,011][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ParNew"}
[2019-04-19T00:00:17,012][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ConcurrentMarkSweep"}
[2019-04-19T00:00:22,015][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ParNew"}
[2019-04-19T00:00:22,015][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ConcurrentMarkSweep"}
[2019-04-19T00:00:27,023][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ParNew"}
[2019-04-19T00:00:27,024][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ConcurrentMarkSweep"}
[2019-04-19T00:00:32,030][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ParNew"}
[2019-04-19T00:00:32,030][DEBUG][logstash.instrument.periodicpoller.jvm] collector name {:name=>"ConcurrentMarkSweep"}
Event format:
[2019-04-22T13:04:27,190][DEBUG][logstash.pipeline ] filter received {"event"=>{"type"=>"syslog", "@version"=>"1", "@timestamp"=>2019-04-22T10:04:27.159Z, "port"=>50892, "message"=>"<30>2019:04:22-13:05:08 msk ulogd[18998]: id=\"2002\" severity=\"info\" sys=\"SecureNet\" sub=\"packetfilter\" name=\"Packet accepted\" action=\"accept\" fwrule=\"6\" initf=\"eth2\" outitf=\"eth1\" srcmac=\"70:79:b3:ab:e0:e8\" dstmac=\"00:1a:8c:f0:89:02\" srcip=\"10.0.134.138\" dstip=\"10.0.131.134\" proto=\"17\" length=\"66\" tos=\"0x00\" prec=\"0x00\" ttl=\"126\" srcport=\"63936\" dstport=\"53\" ", "host"=>"10.0.130.235"}}
Please help me debug this issue.
Answer 0 (score: 0)
According to the internet, the ParNew garbage collector is "stop-the-world". If each collection takes 5 seconds to recover from and you get a GC every 5 seconds, you get no throughput at all, because logstash is blocked the entire time.
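If CMS/ParNew pauses really are the bottleneck, one commonly suggested experiment (an assumption to test, not verified against this setup) is switching the heap to the G1 collector, which targets shorter pauses, via `jvm.options`:

```
# jvm.options -- hypothetical G1 experiment; remove the CMS flags first
# -XX:+UseConcMarkSweepGC      <- comment out or delete
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
```

Enabling GC logging (`-Xlog:gc*` on Java 9+, or `-XX:+PrintGCDetails` on Java 8) would confirm whether pause times actually correlate with the stalls before committing to the change.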
Answer 1 (score: 0)
Solved. The problem was in the kv filter: it stalled logstash when it tried to parse unstructured data coming through the pipeline.
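A sketch of how the kv filter could be constrained so unstructured messages don't stall the pipeline. The conditional, the regex, and the tag name are illustrative assumptions; newer versions of the kv filter also expose a `timeout_millis` option as a safety valve for exactly this failure mode:

```
filter {
  # Only attempt key=value parsing when the message actually looks like
  # key="value" pairs (the regex is an illustrative assumption).
  if [syslog_message] =~ /\w+="[^"]*"/ {
    kv {
      id => "logs_kv"
      source => "syslog_message"
      trim_key => " "
      trim_value => " "
      value_split => "="
      field_split => " "
      # Available in newer kv filter versions: give up on a single
      # pathological event after this many milliseconds.
      timeout_millis => 10000
    }
  } else {
    mutate { add_tag => [ "kv_skipped" ] }
  }
}
```

Tagging skipped events makes it easy to count, in Kibana or the stats API, how much unstructured traffic was hitting the filter.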