Make Elasticsearch process only the numeric part of a string field and parse/copy it into a numeric field

Time: 2017-04-27 08:38:17

Tags: elasticsearch elasticsearch-2.0

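In my data I have a field that contains a string representation of a year. The field can contain other characters, and sometimes it contains several years. Examples: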

1995-2000
[2000]
cop. 1865

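I want to extract these years (within Elasticsearch) and parse them into a numeric, multi-valued field so that I can run a histogram aggregation on them.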
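To make the desired result concrete, here is a minimal sketch of the same extraction done outside Elasticsearch, in Python. It is only an illustration of the target output, not an Elasticsearch feature: the function name `extract_years` is hypothetical, and the sketch assumes a year is a standalone run of exactly four digits, which is slightly stricter than a bare `[0-9]{4}` pattern because it skips longer digit runs.

```python
import re

# Assumption: a "year" is a run of exactly four digits that is not part
# of a longer digit sequence (so "123456" yields nothing).
YEAR_PATTERN = re.compile(r"(?<![0-9])[0-9]{4}(?![0-9])")


def extract_years(raw):
    """Return every four-digit year found in `raw` as a list of ints."""
    return [int(match) for match in YEAR_PATTERN.findall(raw)]


if __name__ == "__main__":
    # The three sample values from the question.
    for sample in ["1995-2000", "[2000]", "cop. 1865"]:
        print(sample, "->", extract_years(sample))
```

For the three sample values this yields `[1995, 2000]`, `[2000]` and `[1865]`. A list like this is the shape a multi-valued numeric field would need to hold for a histogram aggregation to work on it.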

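I tried the following configuration. It produces only the numeric parts of the string as tokens, but I can't figure out how to take the last step and have those tokens interpreted as integers/shorts.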

{
    "analysis": {
        "analyzer": {
            "numeric_extractor": {
                "filter": [
                    "numeric_keeper"
                ],
                "tokenizer": "numeric_keeper_tokenizer"
            }
        },
        "char_filter": {
            "non_numeric_remover": {
                "type": "pattern_replace",
                "pattern": "[^0-9]+",
                "replacement": " "
            }
        },
        "tokenizer": {
            "numeric_keeper_tokenizer": {
                "type": "pattern",
                "group": 1,
                "pattern": "([0-9]{4})"
            }
        },
        "filter": {
            "numeric_keeper": {
                "type": "pattern_capture",
                "preserve_original": 0,
                "patterns": [
                    "([0-9]{4})"
                ]
            }
        }
    },
    "properties": {
        "date": {
            "fields": {
                "date": {
                    "analyzer": "numeric_extractor",
                    "index": "analyzed",
                    "type": "string"
                }
            },
            "type": "multi_field"
        }
    }
}

Elasticsearch version 2.4.

0 answers:

No answers