I am using logstash to process several Zanox CSV exports and ship them to Elasticsearch. However, for some reason logstash usually only processes some of the input files.
The input files definitely exist in the given directories. To work around the Logstash inode issue, I set sincedb_path to /dev/null, stop logstash every day before the new files are downloaded, and start it again once the download has finished.
Logstash and Elasticsearch currently run on the same server.
The first file (1) is the only one that is always imported. It is a fairly large feed, about half a gigabyte. In addition, the Zanox CSVs have a small glitch: the first line starts with a dot, which makes that line invalid CSV.
input {
  file {
    path => ["/var/app/1/*.csv"]
    sincedb_path => "/dev/null"
    start_position => "beginning"
    type => "1"
  }
  file {
    path => ["/var/app/2/*.csv"]
    sincedb_path => "/dev/null"
    start_position => "beginning"
    type => "2"
  }
  file {
    path => ["/var/app/3/*.csv"]
    sincedb_path => "/dev/null"
    start_position => "beginning"
    type => "3"
  }
  file {
    path => ["/var/app/4/*.csv"]
    sincedb_path => "/dev/null"
    start_position => "beginning"
    type => "4"
  }
  file {
    path => ["/var/app/5/*.csv"]
    sincedb_path => "/dev/null"
    start_position => "beginning"
    type => "5"
  }
  file {
    path => ["/var/app/6/*.csv"]
    sincedb_path => "/dev/null"
    start_position => "beginning"
    type => "6"
  }
}
filter {
  if [type] == "1" {
    csv {
      columns => [ "title", "price", "image", "deeplink", "color" ]
      separator => ";"
    }
  } else {
    csv {
      columns => [ "title", "price", "image", "deeplink" ]
      separator => ";"
    }
  }
  mutate {
    convert => ["price", "float"]
    add_field => {"source" => "%{type}"}
  }
  if ![title] {
    drop { }
  }
}
output {
  elasticsearch {
    index => "products"
    index_type => "products"
    host => "localhost"
    document_id => "%{deeplink}"
    flush_size => 5000
  }
}
What could be the reason that logstash does not process all of the files?
Edit: I got rid of the CSV parsing errors by doing some pre-processing. Now I see the following warnings in the logstash log:
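For reference, the pre-processing is nothing more than stripping the invalid first line before logstash picks the files up. A minimal sketch of that idea, assuming the feeds live under the same /var/app/*/ directories as in the config (the pattern and the in-place rewrite are my own choices, not necessarily how it is done in production):

import glob
import os
import tempfile

# Hypothetical pre-processing sketch: drop the leading dot line that breaks the CSV.
for path in glob.glob("/var/app/*/*.csv"):  # placeholder pattern matching the input config
    with open(path, "r", encoding="utf-8") as src:
        first = src.readline()
        if not first.startswith("."):
            continue  # file is already clean, leave it untouched
        # Stream the rest of the file into a temp file, skipping the invalid first line,
        # so the half-gigabyte feed is never loaded into memory at once.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
        with os.fdopen(fd, "w", encoding="utf-8") as dst:
            for line in src:
                dst.write(line)
    os.replace(tmp, path)  # atomically swap the cleaned file into place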
log4j, [2014-09-11T03:41:50.075] WARN: org.elasticsearch.monitor.jvm:
[logstash-17675126.onlinehome-server.info-13551-4016] [gc][young][1345505][23190]
duration [2.4s], collections [1]/[2.8s], total [2.4s]/[10.3m],
memory [298.8mb]->[122.3mb]/[483.3mb],
all_pools {[young] [1.9mb]->[1.2mb]/[133.3mb]}{[survivor] [16.6mb]->[0b]/[16.6mb]}{[old] [280.2mb]->[121.1mb]/[333.3mb]}