I am using the elapsed filter. I read the elapsed filter guide for Logstash, then made a sample config file and CSV to test how the filter works, but it does not seem to work: the data uploaded to ES is unchanged. I have attached the CSV file and the config code. Could you give some examples of how to use the elapsed filter?
Here is my config file:
input {
  file {
    path => "/home/paulsteven/log_cars/aggreagate.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ","
    quote_char => "%"
    columns => ["state","city","haps","ads","num_id","serial"]
  }
  elapsed {
    start_tag => "taskStarted"
    end_tag => "taskEnded"
    unique_id_field => "num_id"
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "el03"
    document_type => "details"
  }
  stdout {}
}
Output in ES:
{
    "city" => "tirunelveli",
    "path" => "/home/paulsteven/log_cars/aggreagate.csv",
    "num_id" => "2345-1002-4501",
    "message" => "tamil nadu,tirunelveli,hap0,ad1,2345-1002-4501,1",
    "@version" => "1",
    "serial" => "1",
    "haps" => "hap0",
    "state" => "tamil nadu",
    "host" => "smackcoders",
    "ads" => "ad1",
    "@timestamp" => 2019-05-06T10:03:51.443Z
}
{
    "city" => "chennai",
    "path" => "/home/paulsteven/log_cars/aggreagate.csv",
    "num_id" => "2345-1002-4501",
    "message" => "tamil nadu,chennai,hap0,ad1,2345-1002-4501,5",
    "@version" => "1",
    "serial" => "5",
    "haps" => "hap0",
    "state" => "tamil nadu",
    "host" => "smackcoders",
    "ads" => "ad1",
    "@timestamp" => 2019-05-06T10:03:51.447Z
}
{
    "city" => "kottayam",
    "path" => "/home/paulsteven/log_cars/aggreagate.csv",
    "num_id" => "2345-1002-4501",
    "message" => "kerala,kottayam,hap1,ad2,2345-1002-4501,9",
    "@version" => "1",
    "serial" => "9",
    "haps" => "hap1",
    "state" => "kerala",
    "host" => "smackcoders",
    "ads" => "ad2",
    "@timestamp" => 2019-05-06T10:03:51.449Z
}
{
    "city" => "Jalna",
    "path" => "/home/paulsteven/log_cars/aggreagate.csv",
    "num_id" => "2345-1002-4501",
    "message" => "mumbai,Jalna,hap2,ad3,2345-1002-4501,13",
    "@version" => "1",
    "serial" => "13",
    "haps" => "hap2",
    "state" => "mumbai",
    "host" => "smackcoders",
    "ads" => "ad3",
    "@timestamp" => 2019-05-06T10:03:51.452Z
}
Answer 0 (score: 1)
You have to tag the events so that Logstash can find the start/end tags. Basically, you have to know when an event is considered a start event and when it is considered an end event.
The elapsed filter plugin works only with pairs of events (for example, a request event and a response event, in order to measure the latency between them). Both events need to carry an ID field that uniquely identifies that particular task; the name of this field is stored in unique_id_field.
For your example, you have to identify a pattern for the start and end events. Let's say your CSV has a column type (see the code below): if it contains "START", the row is treated as a start event, and if it contains "END", it is an end event. Simple enough. There is also a column id that stores the unique identifier.
filter {
  csv {
    separator => ","
    quote_char => "%"
    columns => ["state","city","haps","ads","num_id","serial","type","id"]
  }
  grok {
    match => { "type" => ".*START.*" }
    add_tag => [ "taskStarted" ]
  }
  grok {
    match => { "type" => ".*END.*" }
    add_tag => [ "taskTerminated" ]
  }
  elapsed {
    start_tag => "taskStarted"
    end_tag => "taskTerminated"
    unique_id_field => "id"
  }
}
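To make this concrete, here is a minimal sketch of input rows that would exercise the filter above. The values, the task id "task-001", and the START/END markers are hypothetical, invented only to illustrate the pairing; your real CSV will differ. The columns are state, city, haps, ads, num_id, serial, type, id:

tamil nadu,tirunelveli,hap0,ad1,2345-1002-4501,1,START,task-001
tamil nadu,chennai,hap0,ad1,2345-1002-4501,5,END,task-001

With a pair like this, grok tags the first row taskStarted and the second taskTerminated, and the elapsed filter matches them on id, adding an elapsed_time field (in seconds) to the end event.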
That said, I feel your need is a bit different. If, for example, you want to aggregate more than two events, say all events with the same value in the state column, check out the aggregate filter plugin.
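As a rough sketch of that alternative, assuming you want to sum the serial column per state: the map field name serial_sum and the 10-second timeout below are illustrative choices, not taken from your setup.

filter {
  aggregate {
    # group all events that share the same "state" value
    task_id => "%{state}"
    # accumulate the numeric "serial" column into the in-flight map
    code => "map['serial_sum'] ||= 0; map['serial_sum'] += event.get('serial').to_i"
    # once no row for this state arrives for 10s, emit the map as a new event
    push_map_as_event_on_timeout => true
    timeout_task_id_field => "state"
    timeout => 10
  }
}

Unlike elapsed, aggregate is not limited to two events per task, so it fits a "collect everything with the same state" requirement.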