I have successfully configured Logstash to process CSV files from the file system and push them into Elasticsearch for further analysis. However, our ELK stack is quite separated from the original source of the CSV files, so I am considering sending the CSV files to Logstash over HTTP instead of going through the file system.
The problem: if I use the http input, the entire file is taken in and processed as one big lump, and the csv filter only recognizes the first line. As mentioned, the same file works fine through the file input.
The Logstash configuration looks like this:
input {
  # http {
  #   host => "localhost"
  #   port => 8080
  # }
  file {
    path => "/media/sample_files/debit_201606.csv"
    type => "items"
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => ["Created", "Direction", "Member", "Point Value", "Type", "Sub Type"]
    separator => " "
    convert => { "Point Value" => "integer" }
  }
  date {
    match => [ "Created", "YYYY-MM-dd HH:mm:ss" ]
    timezone => "UTC"
  }
}
output {
  # elasticsearch {
  #   action => "index"
  #   hosts => ["localhost"]
  #   index => "logstash-%{+YYYY.MM.dd}"
  #   workers => 1
  # }
  stdout {
    codec => rubydebug
  }
}
My goal is to deliver the CSV via curl. So I switch to the commented-out section in the input area above and then pass the file with curl: curl http://localhost:8080/ -T /media/samples/debit_201606.csv
What do I need to do to get Logstash to process the CSV line by line?
Answer 0 (score: 0)
I tried this out, and I think what you need to do is split your input. Here is how you do it:
My configuration:
input {
  http {
    port => 8787
  }
}
filter {
  split {}
  csv {}
}
output {
  stdout { codec => rubydebug }
}
For my test, I created a CSV file that looks like this:
artur@pandaadb:~/tmp/logstash$ cat test.csv
a,b,c
d,e,f
g,h,i
Now for the test:
artur@pandaadb:~/dev/logstash/conf3$ curl localhost:8787 -T ~/tmp/logstash/test.csv
The output:
{
    "message" => "a,b,c",
    "@version" => "1",
    "@timestamp" => "2016-08-01T15:27:17.477Z",
    "host" => "127.0.0.1",
    "headers" => {
        "request_method" => "PUT",
        "request_path" => "/test.csv",
        "request_uri" => "/test.csv",
        "http_version" => "HTTP/1.1",
        "http_host" => "localhost:8787",
        "http_user_agent" => "curl/7.47.0",
        "http_accept" => "*/*",
        "content_length" => "18",
        "http_expect" => "100-continue"
    },
    "column1" => "a",
    "column2" => "b",
    "column3" => "c"
}
{
    "message" => "d,e,f",
    "@version" => "1",
    "@timestamp" => "2016-08-01T15:27:17.477Z",
    "host" => "127.0.0.1",
    "headers" => {
        "request_method" => "PUT",
        "request_path" => "/test.csv",
        "request_uri" => "/test.csv",
        "http_version" => "HTTP/1.1",
        "http_host" => "localhost:8787",
        "http_user_agent" => "curl/7.47.0",
        "http_accept" => "*/*",
        "content_length" => "18",
        "http_expect" => "100-continue"
    },
    "column1" => "d",
    "column2" => "e",
    "column3" => "f"
}
{
    "message" => "g,h,i",
    "@version" => "1",
    "@timestamp" => "2016-08-01T15:27:17.477Z",
    "host" => "127.0.0.1",
    "headers" => {
        "request_method" => "PUT",
        "request_path" => "/test.csv",
        "request_uri" => "/test.csv",
        "http_version" => "HTTP/1.1",
        "http_host" => "localhost:8787",
        "http_user_agent" => "curl/7.47.0",
        "http_accept" => "*/*",
        "content_length" => "18",
        "http_expect" => "100-continue"
    },
    "column1" => "g",
    "column2" => "h",
    "column3" => "i"
}
What the split filter does:
It takes your input message (one string containing newlines) and splits it on the configured value (a newline by default). It then cancels the original event and re-submits the split events to Logstash. It is important that the split executes before the csv filter.
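Applied to the original pipeline from the question, a sketch of the combined config might look like the following. The columns, separator, date pattern, and elasticsearch settings are copied unchanged from the question (untested against the real data); the split filter, placed before csv, is the only addition:

input {
  http {
    host => "localhost"
    port => 8080
  }
}
filter {
  # split must run first: one event per line of the HTTP body
  split {}
  csv {
    columns => ["Created", "Direction", "Member", "Point Value", "Type", "Sub Type"]
    separator => " "
    convert => { "Point Value" => "integer" }
  }
  date {
    match => [ "Created", "YYYY-MM-dd HH:mm:ss" ]
    timezone => "UTC"
  }
}
output {
  elasticsearch {
    action => "index"
    hosts => ["localhost"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
}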
I hope that answers your question!
Artur