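I am trying to insert a comma-separated string (the result of GROUP_CONCAT) into Elasticsearch as an array datatype. As input I use JDBC, and the output of the SQL query looks like this: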
+---------+-----------+------------+--------------------------+-------------+---------------------+---------+------------+----------+---------------------+-------------+---------+----------------------------------------+
| network | post_dbid | host_dbid | host_netid | post_netid | published | n_likes | n_comments | language | indexed | n_harvested | country | vrt |
+---------+-----------+------------+--------------------------+-------------+---------------------+---------+------------+----------+---------------------+-------------+---------+----------------------------------------+
| xxx | 2_xxx | 60480_xxx | xxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxx | 2017-12-28 08:11:58 | 5 | 0 | en | 2018-05-30 00:00:00 | 0 | ID | Fitness,Well-being |
| xxx | 5_xxx | 98458_xxx | xxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxx | 2016-09-01 11:59:14 | 2275 | 242 | ar | 2018-05-30 00:00:00 | 0 | SA | SmartPhones_Gadgets |
| xxx | 15_xxx | 50884_xxx | xxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxx | 2018-04-23 16:36:10 | 0 | 0 | en | 2018-05-30 00:00:00 | 0 | EG | Fashion_Beauty |
| xxx | 21_xxx | 64118_xxx | xxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxx | 2015-07-01 22:50:54 | 295 | 8 | pt | 2018-05-30 00:00:00 | 0 | BR | Nutrition |
| xxx | 24_xxx | 9767_xxx | xxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxx | 2017-05-30 02:35:29 | 10 | 1 | en | 2018-06-18 15:32:57 | 0 | US | Health |
| xxx | 87_xxx | 44473_xxx | xxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxx | 2017-01-08 23:02:52 | 7 | 0 | en | 2018-05-30 00:00:00 | 0 | US | Beverages |
| xxx | 99_xxx | 120198_xxx | xxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxx | 2018-02-17 02:57:58 | 8 | 0 | en | 2018-05-30 00:00:00 | 0 | US | Food |
| xxx | 126_xxx | 50258_xxx | xxxxxxxxxxxxxxxxxxxxxxxx | xxxxxxxxxxx | 2018-03-22 09:16:25 | 1 | 0 | en | 2018-05-30 00:00:00 | 0 | IN | Health |
+---------+-----------+------------+--------------------------+-------------+---------------------+---------+------------+----------+---------------------+-------------+---------+----------------------------------------+
I used split from the mutate plugin:
filter {
  mutate {
    split => { "vrt" => "," }
  }
}
Despite this, the field is still inserted as a comma-separated string:
GET xxx/_search
{
  "query": {
    "terms": {
      "_id": ["2_xxx"]
    }
  }
}
Response:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "xxx",
        "_type": "doc",
        "_id": "2_xxx",
        "_score": 1,
        "_source": {
          "post_dbid": "2_xxx",
          "host_dbid": "60480_xxx",
          "host_netid": "xxxxxxxxxxxxxxxxxxxxxxxx",
          "n_likes": 5,
          "n_comments": 0,
          "country": "ID",
          "network": "xxx",
          "indexed": "2018-05-30T00:00:00.000Z",
          "n_harvested": 0,
          "vrt": "Fitness,Well-being",
          "@version": "1",
          "post_netid": "xxxxxxxxxxx",
          "@timestamp": "2018-06-27T15:47:24.370Z",
          "language": "en",
          "published": "2017-12-28T08:11:58.000Z"
        }
      }
    ]
  }
}
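My end goal is to insert vrt as an array field and build visualizations in Kibana. For example, I want to create a counter in Kibana that counts how many documents have "Fitness" in the vrt field.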
ELK version: 6.2.4
Answer 0 (score: 0)
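You can use a ruby filter. Here is my approach: I wrote a Ruby method that splits a comma-separated string, trims whitespace from each element, rejects empty elements, and removes duplicates. You can then apply that method to any comma-separated field, like this: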
filter {
  ruby {
    code => "
      # method to split the supplied string by comma, trim whitespace and return an array
      def mapStringToArray(strFieldValue)
        # if string is not nil, return array
        if (strFieldValue != nil)
          fieldArr = strFieldValue.split(',').map(&:strip).reject(&:empty?).uniq
          return fieldArr
        end
        return [] # return empty array if string is nil
      end

      vrtArr = mapStringToArray(event.get('vrt'))
      if vrtArr.length > 0
        event.set('vrt', vrtArr)
      end
    "
  }
}
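To check what the helper returns without running Logstash, the same chain can be tried in plain Ruby. This is only a minimal sketch: the snake_case name map_string_to_array and the sample inputs are illustrative and not part of the config above.

```ruby
# Standalone sketch of the helper used in the ruby filter above.
# The method name and the sample inputs are illustrative assumptions.
def map_string_to_array(str_field_value)
  # A nil field yields an empty array, so callers can always check the length.
  return [] if str_field_value.nil?

  # Split on commas, trim each piece, drop blanks, remove duplicates.
  str_field_value.split(',').map(&:strip).reject(&:empty?).uniq
end

p map_string_to_array('Fitness,Well-being')     # prints ["Fitness", "Well-being"]
p map_string_to_array(' Health, ,Health,Food ') # prints ["Health", "Food"]
p map_string_to_array(nil)                      # prints []
```

Elasticsearch has no dedicated array type: a field that receives a list is indexed as a multi-valued field. So once the event carries vrt as a Ruby array, each element should be stored as a separate value of vrt.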