ElasticSearch autocomplete

Date: 2015-10-27 17:17:59

Tags: elasticsearch elasticsearch-py

I want to search product codes, which are a mix of letters and digits (e.g. A210/444, Alexx 1982 X, ...). (By the way: has anyone managed to search this kind of data?)

My index defines an index_analyzer and a search_analyzer:

{
    "settings": {
        "analysis": {
            "analyzer": {
                "index_analyzer": {
                    "tokenizer": "standard",
                    "filter": [
                        "standard",
                        "lowercase",
                        "asciifolding",
                        "custom_word_delimiter",
                        "custom_edgengram"
                    ]
                },
                "search_analyzer": {
                    "tokenizer": "standard",
                    "filter": [
                        "standard",
                        "asciifolding",
                        "custom_word_delimiter",
                        "lowercase"
                    ]
                }
            },
            "filter": {
                "custom_word_delimiter": {
                    "type": "word_delimiter",
                    "preserve_original": "true"
                },
                "custom_edgengram": {
                    "type": "edgeNGram",
                    "min_gram": "2",
                    "max_gram": "30"
                }
            }
        }
    }
}
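To make the difference between the two analyzers explicit, the analyzer definitions above can be written as a Python dict and their filter chains compared. This is only an illustrative sketch: the names ANALYZERS, only_at_index and only_at_search are made up here, and the dict copies just the "analyzer" part of the settings.

```python
# Analyzer definitions copied from the index settings above.
ANALYZERS = {
    "index_analyzer": {
        "tokenizer": "standard",
        "filter": ["standard", "lowercase", "asciifolding",
                   "custom_word_delimiter", "custom_edgengram"],
    },
    "search_analyzer": {
        "tokenizer": "standard",
        "filter": ["standard", "asciifolding",
                   "custom_word_delimiter", "lowercase"],
    },
}

index_filters = ANALYZERS["index_analyzer"]["filter"]
search_filters = ANALYZERS["search_analyzer"]["filter"]

# Filters applied only at index time, and only at search time.
only_at_index = [f for f in index_filters if f not in search_filters]
only_at_search = [f for f in search_filters if f not in index_filters]

print("only at index time:", only_at_index)
print("only at search time:", only_at_search)
```

Both analyzers use the same tokenizer. Apart from lowercase running at a different point in the chain, the only filter that differs is custom_edgengram, which runs at index time only.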

The problem is with autocomplete. The index_analyzer is fine: all tokens have type word.

curl -XGET 'http://localhost:9200/myindex/_analyze?analyzer=index_analyzer&pretty' -d 'Alexx 1982 X' | elasticat.rb

+---+------------+------+------+
| 1 | al         | 0–5  | word |
| 1 | ale        | 0–5  | word |
| 1 | alex       | 0–5  | word |
| 1 | alexx      | 0–5  | word |
| 2 | 19         | 6–10 | word |
| 2 | 198        | 6–10 | word |
| 2 | 1982       | 6–10 | word |
+---+------------+------+------+
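The token list above is what an edge n-gram filter with min_gram 2 and max_gram 30 produces: every prefix of each token, from 2 up to 30 characters long. The sketch below imitates only that step in plain Python. Splitting on whitespace stands in for the standard tokenizer and the word_delimiter filter, which is a simplification. It also shows why the single-character token x is missing from the table: it is shorter than min_gram.

```python
def edge_ngrams(token, min_gram=2, max_gram=30):
    """Return the prefixes of `token` whose length is between min_gram and max_gram."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


# Simplified stand-in for the standard tokenizer + lowercase filter.
for token in "Alexx 1982 X".lower().split():
    print(token, edge_ngrams(token))
```

Here alexx yields al, ale, alex, alexx and 1982 yields 19, 198, 1982, matching the table. The token x yields nothing at all.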

But the search_analyzer (without edgeNGram)...

curl -XGET 'http://localhost:9200/myindex/_analyze?analyzer=search_analyzer&pretty' -d 'Alexx 1982 X' | elasticat.rb
+---+------------+-------+------------+
| 1 | alexx      | 0–5   | <ALPHANUM> |
| 2 | 1982       | 6–10  | <NUM>      |
| 3 | x          | 11–12 | <ALPHANUM> |
+---+------------+-------+------------+

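...recognizes 1982 as a number (<NUM>), which causes problems when searching (I search the _all field). When I search for 1982 on its own, it has no effect on the results.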
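One rough way to compare the two _analyze outputs is to check which search-time tokens also appear among the index-time tokens. The sketch below does this with the token text copied from the two tables. It assumes a query token matches when the same text was indexed, and it deliberately ignores the type column (<NUM>, <ALPHANUM>, word). That is an assumption about how matching works, not something the tables themselves show.

```python
# Token text copied from the two _analyze outputs above.
index_tokens = {"al", "ale", "alex", "alexx", "19", "198", "1982"}
search_tokens = ["alexx", "1982", "x"]

# Assumption: a query token matches if identical text was indexed;
# token types are not compared in this simplified model.
matched = [t for t in search_tokens if t in index_tokens]
unmatched = [t for t in search_tokens if t not in index_tokens]

print("matched:", matched)
print("unmatched:", unmatched)
```

Under this simplified model, alexx and 1982 both have indexed counterparts. Only x has none, because the index-time edge n-gram filter dropped it.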

Is there a way to force all tokens to a single (string) type?

Thanks for any ideas!

Martin

0 answers