Querying Elasticsearch where indexed words contain spaces

Date: 2015-08-14 19:56:05

Tags: elasticsearch

I recently started trying out Elasticsearch, but I'm struggling to query the following scenario. My index settings are as follows:

"analysis": {
    "index_analyzer": {
        "my_index_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["standard", "lowercase", "nGram"],
            "char-filter": ["my_pattern"]
        }
    },
    "search_analyzer": {
        "my_search_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["standard", "lowercase", "nGram"],
            "char-filter": ["my_pattern"]
        }
    },
    "filter": {
        "nGram": {
            "type": "nGram",
            "min_gram": 3,
            "max_gram": 40
        }
    },
    "char_filter" : {
        "my_pattern":{
            "type":"pattern_replace",
            "pattern":"\u0020",
            "replacement":""
        }
    }

The indexed documents are:

{
   name:'My self'
},
{
   name:'Hell o'
}

If I search for Myself, I expect it to return the first JSON object, but that doesn't happen.

I'm using this search (where term is just the string being searched for):

var query = {
    match: {
        location: term
    }
};
client.search({
    index: 'requests',
    analyzer: 'my_search_analyzer',
    body: {
        query: query
    }
});

I'd really appreciate some guidance on this!

Kind regards, JB

1 Answer:

Answer 0 (score: 2)

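You're almost there. Your index definition just has a few small problems and typos, which we'll fix:

  1. You don't need index_analyzer and search_analyzer; just define my_index_analyzer and my_search_analyzer directly under the analyzer element
  2. char-filter should read char_filter (with an underscore)
  3. Your space pattern needs an additional backslash ("\\u0020"), so that the regex engine receives the escape sequence \u0020 rather than an already JSON-decoded space

Here are the corrected settings/mappings I used: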

    {
      "settings": {
        "analysis": {
          "analyzer": {
            "my_index_analyzer": {         <--- 1. directly under analyzer
              "type": "custom",
              "tokenizer": "standard",
              "filter": [
                "standard",
                "lowercase",
                "nGram"
              ],
              "char_filter": [             <--- 2. underscore
                "my_pattern"
              ]
            },
            "my_search_analyzer": {        <--- 1. directly under analyzer
              "type": "custom",
              "tokenizer": "standard",
              "filter": [
                "standard",
                "lowercase",
                "nGram"
              ],
              "char_filter": [             <--- 2. underscore
                "my_pattern"
              ]
            }
          },
          "filter": {
            "nGram": {
              "type": "nGram",
              "min_gram": 3,
              "max_gram": 40
            }
          },
          "char_filter": {
            "my_pattern": {
              "type": "pattern_replace",
              "pattern": "\\u0020",        <--- 3. additional backslash
              "replacement": ""
            }
          }
        }
      },
      "mappings": {
        "request": {
          "properties": {
            "location": {
              "type": "string",
              "index_analyzer": "my_index_analyzer"
            }
          }
        }
      }
    }
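To see why the corrected chain lets "Myself" find "My self", here is a rough JavaScript simulation of my_index_analyzer. It is a sketch under simplifying assumptions, not Elasticsearch's actual implementation: the standard tokenizer is approximated as "split on anything that is not a letter or digit", the standard token filter is skipped (the Elasticsearch documentation of that era describes it as currently doing nothing), and the helper names removeSpaces, tokenize, nGrams and analyzeForIndex are hypothetical.

```javascript
// Rough, hypothetical simulation of "my_index_analyzer"; not Elasticsearch code.

// char_filter "my_pattern": pattern_replace of U+0020 (space) with ""
function removeSpaces(text) {
  return text.replace(/\u0020/g, "");
}

// Approximation of the standard tokenizer: split on non-letters/non-digits
function tokenize(text) {
  return text.split(/[^\p{L}\p{N}]+/u).filter((t) => t.length > 0);
}

// nGram token filter: every substring with min_gram <= length <= max_gram
function nGrams(token, minGram, maxGram) {
  const grams = [];
  for (let size = minGram; size <= Math.min(maxGram, token.length); size++) {
    for (let start = 0; start + size <= token.length; start++) {
      grams.push(token.slice(start, start + size));
    }
  }
  return grams;
}

// char_filter -> tokenizer -> lowercase -> nGram, using min_gram 3 / max_gram 40
function analyzeForIndex(text, minGram = 3, maxGram = 40) {
  return tokenize(removeSpaces(text))
    .map((t) => t.toLowerCase())
    .flatMap((t) => nGrams(t, minGram, maxGram));
}

console.log(analyzeForIndex("My self").includes("myself")); // true
console.log(analyzeForIndex("Hell o").includes("myself"));  // false
```

Because the space is stripped before tokenization, "My self" becomes the single token myself, and its grams include myself itself, which is what a lowercased query for "Myself" can match. "Hell o" only yields grams of hello.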
    

Then you can index the two sample documents:

    curl -XPUT localhost:9200/requests/request/1 -d '{"location":"My self"}'
    curl -XPUT localhost:9200/requests/request/2 -d '{"location":"Hell o"}'
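Since the question uses the JavaScript client, the two curl commands above could also be expressed as parameter objects for client.index() in the legacy elasticsearch npm client (assuming that is the client behind the question's client.search call). Note that the documents go into the location field, which is the field the question's match query targets; the documents in the question used name instead. The helper buildIndexParams is hypothetical, and no request is sent here.

```javascript
// Hypothetical helper: mirrors `curl -XPUT localhost:9200/requests/request/<id>`
// as a parameter object for the legacy "elasticsearch" npm client's client.index().
function buildIndexParams(id, location) {
  return {
    index: "requests",  // index name from the curl URL
    type: "request",    // mapping type used in the answer's mappings
    id: String(id),     // document id from the curl URL
    body: { location }, // the field that the match query searches
  };
}

const params = [
  buildIndexParams(1, "My self"),
  buildIndexParams(2, "Hell o"),
];

// With a client created elsewhere (assumed), each document could be sent with:
// params.forEach((p) => client.index(p));
console.log(JSON.stringify(params[0]));
// prints {"index":"requests","type":"request","id":"1","body":{"location":"My self"}}
```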
    

And you'll get what you expect:

    curl -XPOST localhost:9200/requests/request/_search -d '{
      "query": {
        "match": {
          "location": "Myself"
        }
      }
    }'
    

This search returns the document containing My self.
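For comparison with the question's code, the same search in JavaScript might look like the sketch below, again assuming the legacy elasticsearch npm client. The top-level analyzer parameter from the question is left out: as far as I know it only applies to a q query-string search, while a match query analyzes its input with the field's search-time analyzer. buildSearchParams is a hypothetical helper, and no request is sent.

```javascript
// Hypothetical helper: builds client.search() parameters that mirror the
// curl search above (a match query on the "location" field).
function buildSearchParams(term) {
  return {
    index: "requests",
    type: "request",
    body: {
      query: {
        match: { location: term }, // analyzed with the field's search-time analyzer
      },
    },
  };
}

const searchParams = buildSearchParams("Myself");

// With a client created elsewhere (assumed):
// client.search(searchParams).then((resp) => console.log(resp.hits.hits));
console.log(JSON.stringify(searchParams.body));
// prints {"query":{"match":{"location":"Myself"}}}
```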