I am using Logstash to import data from CSV files into our Elasticsearch.
During the import I would like to create a new field that contains the values of other fields. Here is my import snippet:
input {
  file {
    path => "/data/xyz/*.csv"
    start_position => "beginning"
    ignore_older => 0
    sincedb_path => "/dev/null"
  }
}
filter {
  if [path] =~ "csv1" {
    csv {
      separator => ";"
      columns => [
        "name1",
        "name2",
        "name3",
        "ID"
      ]
    }
    mutate {
      add_field => {
        "searchfield" => "%{name1} %{name2} %{name3}"
      }
    }
  }
}
output {
  if [path] =~ "csv1" {
    elasticsearch {
      hosts => "localhost"
      index => "my_index"
      document_id => "%{ID}"
    }
  }
}
This works as intended, but on rows where, for example, name3 is empty, Logstash writes the literal %{name3} into the new field. Is there a way to add a value only when it is not empty?
Answer 0 (score: 2)
I don't think there is any way around checking whether name3 exists and building your searchfield based on that:
if [name3] {
  mutate {
    id => "with-name3"
    add_field => { "searchfield" => "%{name1} %{name2} %{name3}" }
  }
} else {
  mutate {
    id => "without-name3"
    add_field => { "searchfield" => "%{name1} %{name2}" }
  }
}
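If more than one of the name fields can be empty, the if/else combinations get verbose quickly. A ruby filter can instead build the field from whichever values are actually present; this is only a minimal sketch, assuming the same field names as in your config (the id is just an illustrative label):

ruby {
  id => "build-searchfield"
  code => '
    # Collect the name fields that are present and non-empty,
    # then join them with single spaces into searchfield.
    parts = ["name1", "name2", "name3"].map { |f| event.get(f) }
    parts = parts.reject { |v| v.nil? || v.to_s.strip.empty? }
    event.set("searchfield", parts.join(" ")) unless parts.empty?
  '
}

Used in place of the mutate/add_field blocks above, this ensures searchfield never ends up containing a literal %{name3}.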
Alternatively, if I understand your problem correctly, you want to send this data to Elasticsearch and have one searchable field. To avoid duplicating the data in the source, you can build the search field with a copy_to statement in the mapping. Your mapping would look like this:
{
  "mappings": {
    "doc": {
      "properties": {
        "name1": {
          "type": "text",
          "copy_to": "searchfield"
        },
        "name2": {
          "type": "text",
          "copy_to": "searchfield"
        },
        "name3": {
          "type": "text",
          "copy_to": "searchfield"
        },
        "searchfield": {
          "type": "text"
        }
      }
    }
  }
}
You can then run queries against that field without any duplication in the _source.
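A quick way to see this (the document id here is made up):

GET /my_index/doc/42

The returned _source contains only the original fields such as name1, name2 and name3; searchfield never shows up in it, yet a match query against searchfield still finds the document, because copy_to copies the values into the index at indexing time rather than into the stored document.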
Update: basically, your logstash.conf would look like this:
input {
  file {
    path => "/data/xyz/*.csv"
    start_position => "beginning"
    ignore_older => 0
    sincedb_path => "/dev/null"
  }
}
filter {
  if [path] =~ "csv1" {
    csv {
      separator => ";"
      columns => ["name1", "name2", "name3", "ID"]
    }
  }
}
output {
  if [path] =~ "csv1" {
    elasticsearch {
      hosts => "localhost"
      index => "my_index"
      document_id => "%{ID}"
    }
  }
}
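While testing it can help to temporarily print the parsed events to the console, so you can verify that the columns are split as expected before anything reaches Elasticsearch; an optional addition inside the output block:

stdout {
  codec => rubydebug   # print each parsed event in a readable form
}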
Then create the Elasticsearch index with the following:
PUT /my_index/
{
  "mappings": {
    "doc": {
      "properties": {
        "name1": {
          "type": "text",
          "copy_to": "searchfield"
        },
        "name2": {
          "type": "text",
          "copy_to": "searchfield"
        },
        "name3": {
          "type": "text",
          "copy_to": "searchfield"
        },
        "searchfield": {
          "type": "text"
        }
      }
    }
  }
}
Then you can run a search as follows:
GET /my_index/_search
{
  "query": {
    "match": {
      "searchfield": {
        "query": "your text"
      }
    }
  }
}