How to process a file posted via http line by line in logstash?

Date: 2016-08-01 14:57:27

Tags: logstash elastic-stack

I have successfully configured logstash to process csv files from the file system and push them into Elastic for further analysis. However, our ELK stack is strongly separated from the original source of the csv files, so I am considering sending the csv files to logstash over http instead of via the file system.

The problem: if I use the "http" input, the whole file is ingested and processed as one big lump, and the csv filter only recognizes the first line. As mentioned above, the same file works fine through the "file" input.

My logstash configuration is as follows:

input {
#  http {
#    host => "localhost" 
#    port => 8080
#  }
  file {
    path => "/media/sample_files/debit_201606.csv"
    type => "items"
    start_position => "beginning" 
  }
}

filter {  
    csv {
        columns => ["Created", "Direction", "Member", "Point Value", "Type", "Sub Type"]
        separator => "  "
        convert => { "Point Value" => "integer" }
    }
    date {
        match => [ "Created", "YYYY-MM-dd HH:mm:ss" ]
        timezone => "UTC"
    }
}

output {  
#    elasticsearch {
#        action => "index"
#        hosts => ["localhost"]
#        index => "logstash-%{+YYYY.MM.dd}"
#        workers => 1
#    }
     stdout {
         codec => rubydebug
     }
}
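For reference, the filter section above is roughly equivalent to the following Python sketch applied to one line of input. This is only an illustration: the separator, the sample line, and all field values are assumptions invented here, not taken from the real data.

```python
from datetime import datetime, timezone

# Column names from the csv filter configuration above.
COLUMNS = ["Created", "Direction", "Member", "Point Value", "Type", "Sub Type"]

def parse_line(line, separator="\t"):
    """Rough equivalent of the csv + date filters (separator assumed)."""
    # csv { columns => [...] }: zip column names with split values
    event = dict(zip(COLUMNS, line.split(separator)))
    # convert => { "Point Value" => "integer" }
    event["Point Value"] = int(event["Point Value"])
    # date { match => ["Created", "YYYY-MM-dd HH:mm:ss"], timezone => "UTC" }
    event["@timestamp"] = datetime.strptime(
        event["Created"], "%Y-%m-%d %H:%M:%S"
    ).replace(tzinfo=timezone.utc)
    return event

# Hypothetical sample line (values invented purely for illustration):
sample = "2016-06-01 10:30:00\tdebit\tmember-1\t250\tpurchase\tonline"
event = parse_line(sample)
```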

My goal is to pass the csv via curl. So I would switch to the commented-out part of the input section above and pass the file with curl: curl http://localhost:8080/ -T /media/samples/debit_201606.csv

What do I need to do to make logstash process the csv line by line?

1 Answer:

Answer 0 (score: 0)

I tried this out, and I think what you need to do is split your input. Here is how you do that:

My configuration:

input {
  http {
      port => 8787
  }
}

filter {
  split {}
  csv {}
}

output {
  stdout { codec => rubydebug }
}

For my test, I created a csv file that looks like this:

artur@pandaadb:~/tmp/logstash$ cat test.csv 
a,b,c
d,e,f
g,h,i

Now for the test:

artur@pandaadb:~/dev/logstash/conf3$ curl localhost:8787 -T ~/tmp/logstash/test.csv

Output:

{
       "message" => "a,b,c",
      "@version" => "1",
    "@timestamp" => "2016-08-01T15:27:17.477Z",
          "host" => "127.0.0.1",
       "headers" => {
         "request_method" => "PUT",
           "request_path" => "/test.csv",
            "request_uri" => "/test.csv",
           "http_version" => "HTTP/1.1",
              "http_host" => "localhost:8787",
        "http_user_agent" => "curl/7.47.0",
            "http_accept" => "*/*",
         "content_length" => "18",
            "http_expect" => "100-continue"
    },
       "column1" => "a",
       "column2" => "b",
       "column3" => "c"
}
{
       "message" => "d,e,f",
      "@version" => "1",
    "@timestamp" => "2016-08-01T15:27:17.477Z",
          "host" => "127.0.0.1",
       "headers" => {
         "request_method" => "PUT",
           "request_path" => "/test.csv",
            "request_uri" => "/test.csv",
           "http_version" => "HTTP/1.1",
              "http_host" => "localhost:8787",
        "http_user_agent" => "curl/7.47.0",
            "http_accept" => "*/*",
         "content_length" => "18",
            "http_expect" => "100-continue"
    },
       "column1" => "d",
       "column2" => "e",
       "column3" => "f"
}
{
       "message" => "g,h,i",
      "@version" => "1",
    "@timestamp" => "2016-08-01T15:27:17.477Z",
          "host" => "127.0.0.1",
       "headers" => {
         "request_method" => "PUT",
           "request_path" => "/test.csv",
            "request_uri" => "/test.csv",
           "http_version" => "HTTP/1.1",
              "http_host" => "localhost:8787",
        "http_user_agent" => "curl/7.47.0",
            "http_accept" => "*/*",
         "content_length" => "18",
            "http_expect" => "100-continue"
    },
       "column1" => "g",
       "column2" => "h",
       "column3" => "i"
}

What the split filter does:

It takes your input message (one string containing newlines) and splits it on the configured value (newline by default). It then cancels the original event and re-submits the split events to Logstash. It is important that the split runs before the csv filter is executed.
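In rough Python terms, the split + csv combination behaves like the sketch below. The function names and event dictionaries here are illustrative only, not the Logstash API; a bare csv filter is assumed to name fields column1..columnN, as seen in the output above.

```python
import csv
import io

def split_filter(event, field="message", terminator="\n"):
    """Mimic the split filter: emit one clone per line; the original
    multi-line event is cancelled (never emitted)."""
    clones = []
    for line in event.get(field, "").split(terminator):
        if not line:
            continue  # skip empty lines, e.g. a trailing newline
        clone = dict(event)  # all other fields (host, headers, ...) carry over
        clone[field] = line
        clones.append(clone)
    return clones

def csv_filter(event, field="message"):
    """Mimic a bare csv {} filter: default column names column1..columnN."""
    row = next(csv.reader(io.StringIO(event[field])))
    for i, value in enumerate(row, start=1):
        event["column%d" % i] = value
    return event

# One HTTP PUT body arrives as a single event:
incoming = {"message": "a,b,c\nd,e,f\ng,h,i", "host": "127.0.0.1"}
events = [csv_filter(e) for e in split_filter(incoming)]
```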

I hope that answers your question!

Artur