Logstash not responding when trying to index a CSV file

Date: 2016-10-21 15:40:43

Tags: csv elasticsearch logstash

I have a CSV file with the following structure:

col1, col2, col3 
1|E|D
2|A|F
3|E|F
... 

I'm trying to index it into Elasticsearch using Logstash, so I created the following Logstash configuration file:

input {
  file {
    path => "/path/to/data"
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => "|"
    columns => ["col1","col2","col3"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "myindex"
    document_type => "mydoctype"
  }
  stdout {}
}

But Logstash just hangs, with no messages other than the following:

$ /opt/logstash/bin/logstash -f logstash.conf
Settings: Default pipeline workers: 8
Pipeline main started

Increasing the verbosity gives the messages below (which don't contain any specific error):

$ /opt/logstash/bin/logstash -v -f logstash.conf
starting agent {:level=>:info}
starting pipeline {:id=>"main", :level=>:info}
Settings: Default pipeline workers: 8
Registering file input {:path=>["/path/to/data"], :level=>:info}
No sincedb_path set, generating one based on the file path {:sincedb_path=>"/home/username/.sincedb_55b24c6ff18079626c5977ba5741584a", :path=>["/path/to/data"], :level=>:info}
Using mapping template from {:path=>nil, :level=>:info}
Attempting to install template {:manage_template=>{"template"=>"logstash-*", "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true, "omit_norms"=>true}, "dynamic_templates"=>[{"message_field"=>{"match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"analyzed", "omit_norms"=>true, "fielddata"=>{"format"=>"disabled"}}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"analyzed", "omit_norms"=>true, "fielddata"=>{"format"=>"disabled"}, "fields"=>{"raw"=>{"type"=>"string", "index"=>"not_analyzed", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"string", "index"=>"not_analyzed"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"float"}, "longitude"=>{"type"=>"float"}}}}}}}, :level=>:info}
New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["localhost:9200"], :level=>:info}
Starting pipeline {:id=>"main", :pipeline_workers=>8, :batch_size=>125, :batch_delay=>5, :max_inflight=>1000, :level=>:info}
Pipeline main started

Any suggestions on how to index this CSV file?

2 answers:

Answer 0 (score: 1)

If you already processed this file during earlier testing, Logstash has recorded it (inode and byte offset) in the sincedb file referenced in the output above, so it won't read it again. You can delete that sincedb file (if you don't need it), or set sincedb_path in your file{} input.
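A minimal sketch of the file input with sincedb_path overridden, assuming the same path as in the original config; pointing it at /dev/null means Logstash never remembers its read position, so the whole file is replayed on every restart (convenient while testing, not something you'd want in production):

input {
  file {
    path => "/path/to/data"
    start_position => "beginning"
    # /dev/null discards the recorded inode/offset, so the file is
    # re-read from the beginning each time Logstash starts.
    sincedb_path => "/dev/null"
  }
}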

Answer 1 (score: 1)

Since Logstash tries not to replay old file lines, you could instead use the tcp input and netcat the file to the open port.

The input section would look like this:

input {
  tcp {
    port => 12345
  }
}

Then, once Logstash is running and listening on that port, you can send the data with:

cat /path/to/data | nc localhost 12345
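
Whichever input you use, a quick document count against the index (assuming Elasticsearch is still on localhost:9200 and the index name myindex from the original config) confirms whether anything was actually indexed:

curl 'localhost:9200/myindex/_count?pretty'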