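I'm trying to feed data from CSV files into Elasticsearch through Logstash. The first row of each CSV file contains the column names. Is there a specific way to skip that row when parsing the file? Is there a condition or filter I can use so that, if an exception occurs, Logstash just moves on to the next line?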
My config file is as follows:
input {
  file {
    path => "/home/sagnik/work/logstash-1.4.2/bin/promosms_dec15.csv"
    type => "promosms_dec15"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    columns => ["Comm_Plan","Queue_Booking","Order_Reference","Generation_Date"]
    separator => ","
  }
  ruby {
    code => "event['Generation_Date'] = Date.parse(event['Generation_Date']);"
  }
}
output {
  elasticsearch {
    action => "index"
    host => "localhost"
    index => "promosms-%{+dd.MM.YYYY}"
    workers => 1
  }
}
The first few lines of my CSV file look like this:
"Comm_Plan","Queue_Booking","Order_Reference","Generation_Date"
"","No","FMN1191MVHV","31/03/2014"
"","No","FMN1191N64G","31/03/2014"
"","No","FMN1192OPMY","31/03/2014"
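Is there any way I can skip the first line? Also, if my CSV file ends with an empty line, I get an error as well. How can I skip such empty lines, whether they are at the end of the file or between two rows?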
Answer 0 (score: 11)
One simple way is to add the following to the filter section (after csv, before ruby):
if [Comm_Plan] == "Comm_Plan" {
  drop { }
}
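Assuming that field never normally holds the same value as its column header, this should work as expected. You can, however, make the check more specific: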
if [Comm_Plan] == "Comm_Plan" and [Queue_Booking] == "Queue_Booking" and [Order_Reference] == "Order_Reference" and [Generation_Date] == "Generation_Date" {
  drop { }
}
All this does is check whether the fields contain those specific values and, if they do, drop the event.
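For reference, here is how the filter section from the question might look with the stricter header check dropped in between csv and ruby. This is an untested sketch assembled from the question's own config and this answer; the ruby line is kept exactly as the asker wrote it for Logstash 1.4.2.

```
filter {
  csv {
    columns => ["Comm_Plan","Queue_Booking","Order_Reference","Generation_Date"]
    separator => ","
  }
  # The header row parses into fields whose values equal their own names;
  # drop that event before ruby tries to Date.parse "Generation_Date".
  if [Comm_Plan] == "Comm_Plan" and [Queue_Booking] == "Queue_Booking" and [Order_Reference] == "Order_Reference" and [Generation_Date] == "Generation_Date" {
    drop { }
  }
  ruby {
    code => "event['Generation_Date'] = Date.parse(event['Generation_Date']);"
  }
}
```

The order matters: if the conditional came after ruby, the header row would reach Date.parse and fail there. Later releases of the csv filter plugin also offer a `skip_header => true` option that drops rows exactly matching `columns`, but as far as I know it is not available in the csv filter bundled with Logstash 1.4.2.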
Answer 1 (score: 0)
Try this:
mutate {
  gsub => ["message","\r\n",""]
}
mutate {
  gsub => ["message","\r",""]
}
mutate {
  gsub => ["message","\n",""]
}
if ![message] {
  drop { }
}
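A variant of the same idea, sketched here under the assumption that the goal is simply to discard blank lines before they reach the csv filter: instead of stripping line endings and then testing the field, a single regex conditional drops any event whose message is empty or whitespace-only (`\s` also covers a stray `\r` from Windows line endings). It is combined with the header check from the other answer; this is an untested sketch, not taken from either answer verbatim.

```
filter {
  # Drop empty or whitespace-only lines (e.g. a trailing newline at end of file)
  # before csv parses them.
  if [message] =~ /^\s*$/ {
    drop { }
  }
  csv {
    columns => ["Comm_Plan","Queue_Booking","Order_Reference","Generation_Date"]
    separator => ","
  }
  # Drop the header row (each field equals its own column name).
  if [Comm_Plan] == "Comm_Plan" and [Queue_Booking] == "Queue_Booking" and [Order_Reference] == "Order_Reference" and [Generation_Date] == "Generation_Date" {
    drop { }
  }
  ruby {
    code => "event['Generation_Date'] = Date.parse(event['Generation_Date']);"
  }
}
```

Placing the blank-line check first means csv never sees an empty message, which avoids relying on how an empty-string field evaluates in a `![message]` test.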