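In my data I have a field that contains a string representation of a year. The field can contain other characters, and sometimes several years. Examples: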
1995-2000
[2000]
cop. 1865
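I would like (in Elasticsearch) to extract these years and parse them into a numeric (multi-valued) field, so that I can run a histogram aggregation on them.
I have tried the following configuration, which gives me only the numeric parts of the string as tokens, but I cannot figure out how to take the last step and have these tokens interpreted as integers/shorts.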
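To make the goal concrete, here is a minimal sketch of the transformation I am after, done outside Elasticsearch. It is plain Python using only the standard `re` module; the function name `extract_years` is hypothetical and only for illustration, and the four-digit pattern is the same one used in my configuration. It shows the output I want per document, not a way to get it inside Elasticsearch.

```python
import re

# Same four-digit pattern as in the analyzer configuration.
YEAR_PATTERN = re.compile(r"[0-9]{4}")


def extract_years(raw):
    """Return every four-digit run in `raw` as a list of integers.

    `extract_years` is a hypothetical helper, not part of any
    Elasticsearch client or API.
    """
    return [int(match) for match in YEAR_PATTERN.findall(raw)]


if __name__ == "__main__":
    for sample in ["1995-2000", "[2000]", "cop. 1865"]:
        print(sample, "->", extract_years(sample))
```

For the three examples above, the values I would like to end up with in the numeric field are `[1995, 2000]`, `[2000]` and `[1865]`.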
{
  "analysis": {
    "analyzer": {
      "numeric_extractor": {
        "filter": [
          "numeric_keeper"
        ],
        "tokenizer": "numeric_keeper_tokenizer"
      }
    },
    "char_filter": {
      "non_numeric_remover": {
        "type": "pattern_replace",
        "pattern": "[^0-9]+",
        "replacement": " "
      }
    },
    "tokenizer": {
      "numeric_keeper_tokenizer": {
        "type": "pattern",
        "group": 1,
        "pattern": "([0-9]{4})"
      }
    },
    "filter": {
      "numeric_keeper": {
        "type": "pattern_capture",
        "preserve_original": 0,
        "patterns": [
          "([0-9]{4})"
        ]
      }
    }
  },
  "properties": {
    "date": {
      "fields": {
        "date": {
          "analyzer": "numeric_extractor",
          "index": "analyzed",
          "type": "string"
        }
      },
      "type": "multi_field"
    }
  }
}
Elasticsearch version 2.4.