How do I add a numeric ID to Elasticsearch documents when reading from a CSV file with Logstash?

Date: 2017-04-26 04:12:57

Tags: elasticsearch logstash

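After importing documents from a CSV file into Elasticsearch with Logstash, each document's ID is a long alphanumeric string. How can I set each document ID to a numeric value instead?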

This is roughly what my Logstash configuration looks like:

input {
    file {
        path => "/path/to/movies.csv"
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}

filter {
    csv {
        columns => ["title","director","year","country"]
        separator => ","
    }
    mutate {
        convert => {
            "year" => "integer"
        }
    }
}

output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "movie"
        document_type => "movie"
    }
    stdout {}
}

1 answer:

Answer 0 (score: 1)

The first and simplest option is to add a new ID column to the CSV and use that field as the document ID.
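As a sketch of that first option, assuming movies.csv has no header row and that 1-based IDs in file order are acceptable, the hypothetical Ruby helper add_id_column below prepends a numeric ID column using Ruby's standard csv library:

```ruby
require "csv"

# Hypothetical helper: prepends a numeric ID (starting at `start`) to every row.
# Assumes the input has no header row; with a header, skip the first row instead.
def add_id_column(csv_text, start: 1)
  rows = CSV.parse(csv_text)
  CSV.generate do |out|
    rows.each_with_index do |row, i|
      out << [start + i, *row] # ID first, then the original columns
    end
  end
end

if __FILE__ == $PROGRAM_NAME
  # Illustrative sample rows, not real data.
  sample = "Title A,Director A,1999,US\nTitle B,Director B,2001,FR\n"
  puts add_id_column(sample)
end
```

After regenerating the file, you would add "id" as the first entry in the csv filter's columns list and set document_id => "%{id}" in the elasticsearch output. Because the IDs are stored in the file itself, rerunning the pipeline reuses the same IDs as long as the file is not regenerated.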

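Another option is to use a ruby filter that adds a dynamic ID to each event. The drawback of this solution is that if your CSV changes and you rerun the pipeline, a given document may not get the same ID as before. Another drawback is that you must run the pipeline with a single worker (i.e. -w 1), because the id_seq counter cannot be shared between pipeline workers.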

filter {
    csv {
        columns => ["title","director","year","country"]
        separator => ","
    }
    mutate {
        convert => {
            "year" => "integer"
        }
    }
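    # create a sequential numeric ID; @id_seq is an instance variable,
    # so it keeps its value from one event to the next
    ruby {
        init => "@id_seq = 0"
        code => "
            event.set('id', @id_seq)
            @id_seq += 1
        "
    }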
}
output {
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "movie"
        document_type => "movie"
        document_id => "%{id}"
    }
    stdout {}
}
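To illustrate the first drawback, here is a plain-Ruby sketch (not Logstash code) that mimics the counter in the ruby filter above, assuming a single worker so that events arrive in file order. The function name assign_sequential_ids and the sample titles are hypothetical:

```ruby
# Mimics the ruby filter: a counter starts at 0 and increases by one per event,
# so each row's ID depends only on its position in the file.
def assign_sequential_ids(titles)
  id_seq = 0
  titles.map do |title|
    id = id_seq
    id_seq += 1
    [title, id]
  end
end

first_run  = assign_sequential_ids(["Title A", "Title B", "Title C"])
# Second run after "Title A" has been removed from the CSV.
second_run = assign_sequential_ids(["Title B", "Title C"])

p first_run.to_h["Title B"]  # ID of "Title B" in the first run
p second_run.to_h["Title B"] # ID of "Title B" after the edit
```

Removing the first row shifts every later row down by one, so with document_id => "%{id}" the second run would overwrite the document that previously held "Title A" with "Title B", and the old document with ID 2 would be left behind. To keep the counter itself consistent, run the pipeline with one worker, for example bin/logstash -f movies.conf -w 1 (the config file name is an assumption).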