Elasticsearch - Mapping for fuzzy search

Time: 2017-02-27 04:53:17

Tags: elasticsearch

Below are the settings I use for fuzzy search across institutes:

{
    "analysis": {
        "filter": {
            "edgeNGramFilter": {
                "type": "nGram",
                "min_gram": 1,
                "max_gram": 20
            },
            "institutes_stopwords": {
                "type": "stop",
                "stopwords": ["College", "University", "Engineering", "of", "Institute", "Technology"]
            },
            "word_joiner": {
                "type": "word_delimiter",
                "catenate_all": true
            },
            "specialchars_remover": {
                "type": "pattern_replace",
                "pattern": "[^A-Za-z0-9]",
                "replacement": " "
            }
        },
        "analyzer": {
            "whitespaceAnalyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": [
                    "lowercase",
                    "institutes_stopwords"
                ]
            },
            "edgeNGramAnalyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": [
                    "lowercase",
                    "edgeNGramFilter",
                    "institutes_stopwords",
                    "word_joiner",
                    "specialchars_remover"
                ]
            }
        }
    }
}
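
For reference, the filter named `edgeNGramFilter` above is declared with `"type": "nGram"`, which emits every substring of a token rather than only its prefixes. The following is a minimal Python sketch of that difference, not Elasticsearch code: `ngrams` and `edge_ngrams` are hypothetical helpers that only imitate the two filter types, using the `min_gram` and `max_gram` values from the settings.

```python
def ngrams(token, min_gram=1, max_gram=20):
    """All substrings of length min_gram..max_gram (imitates an nGram-style filter)."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break
            grams.append(token[start:start + size])
    return grams


def edge_ngrams(token, min_gram=1, max_gram=20):
    """Only the prefixes of length min_gram..max_gram (imitates an edge n-gram filter)."""
    longest = min(max_gram, len(token))
    return [token[:size] for size in range(min_gram, longest + 1)]


mit_all, vit_all = set(ngrams("mit")), set(ngrams("vit"))
mit_edge, vit_edge = set(edge_ngrams("mit")), set(edge_ngrams("vit"))

print(sorted(mit_all & vit_all))    # ['i', 'it', 't']
print(sorted(mit_edge & vit_edge))  # []
```

With full n-grams, `mit` and `vit` share the grams `i`, `it` and `t`; with edge n-grams they share none. Whether this matters in practice depends on which field a query actually targets.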

The mapping is:

    "list": {
        "properties": {
            "id": {
                "type":"string",
                "index":"not_analyzed"
            },
            "s_no": {
                "type":"string",
                "index":"not_analyzed"
            },
            "institute": {
                "type": "multi_field",
                "fields": {
                    "institute": {
                        "type": "string",
                        "analyzer": "standard",
                        "index_analyzer": "standard",
                        "search_analyzer": "standard",
                        "filter": "word_joiner",
                        "boost": 10.0
                    },
                    "partial": {
                        "type": "string",
                        "analyzer": "edgeNGramAnalyzer",
                        "index_analyzer": "standard",
                        "filter": "word_joiner",
                        "search_analyzer": "edgeNGramAnalyzer",
                        "boost": 1.0
                    }
                }
            }
        }
    }

When I search for a college name with the following query:

{
    "query": {
        "match": {
            "institute": {
                "query": "A V C College of Engg",
                "fuzziness": 3,
                "minimum_should_match": "-40%",
                "boost": 5
            }
        }
    }
}

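The query works well for institutes whose names are completely different. For closely related names, however, there are false positives: when searching for MIT, for example, "VIT College" and similar entries come up as the top result.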
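
Fuzzy matching in a `match` query is based on the edit distance between terms, so short tokens that differ by a single letter are easy to confuse. The sketch below is plain Python, not an Elasticsearch API: `edit_distance` is a hypothetical helper that computes the Levenshtein distance for a few tokens taken from the names in this question. Note also that, as far as I know, the documented values for `fuzziness` are 0, 1, 2 and AUTO, so a value of 3 may be rejected or may not behave as intended; this is worth verifying for the version in use.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            # Cost of turning a[:i] into b[:j] by substituting (or keeping) the last character.
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


for left, right in [("mit", "vit"), ("avc", "mvc"), ("engg", "engineering")]:
    print(left, right, edit_distance(left, right))
# mit vit 1
# avc mvc 1
# engg engineering 7
```

`mit` and `vit` are one edit apart, so any fuzziness of 1 or more lets them match, while `engg` is seven edits away from `engineering`, a gap that fuzziness alone cannot bridge.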

Other scenarios include:

* MVC Engineering College is the same as MVC Engg College
* MVC Engineering College is the same as M.V.C Engineering College
* MVC Engineering College is the same as M V C Engineering College
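
The three scenarios above come down to two normalizations: merging single-letter initials and expanding abbreviations. The following is a minimal Python sketch of that idea, applied to a name before it is indexed or searched. The names `ABBREVIATIONS` and `normalize_institute` are hypothetical, and the abbreviation table is only an assumption about the data; inside Elasticsearch a similar effect could probably be reached with a synonym token filter plus a pattern-based character filter.

```python
import re

# Hypothetical abbreviation table; extend it with whatever the data needs.
ABBREVIATIONS = {"engg": "engineering"}


def normalize_institute(name):
    """Lowercase, drop punctuation, expand abbreviations, merge single-letter initials."""
    words = re.sub(r"[^a-z0-9]+", " ", name.lower()).split()
    words = [ABBREVIATIONS.get(word, word) for word in words]
    merged, initials = [], ""
    for word in words:
        if len(word) == 1:
            initials += word  # collect runs such as "m", "v", "c"
            continue
        if initials:
            merged.append(initials)
            initials = ""
        merged.append(word)
    if initials:
        merged.append(initials)
    return " ".join(merged)


variants = [
    "MVC Engineering College",
    "MVC Engg College",
    "M.V.C Engineering College",
    "M V C Engineering College",
]
print({normalize_institute(v) for v in variants})  # {'mvc engineering college'}
```

All four spellings collapse to the same string, so an exact or lightly fuzzy match on the normalized form would treat them as one institute.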

Should I make any changes to the settings, or does the query need any corrections?

0 answers