Elasticsearch does not work with the special character '^' (caret)

Asked: 2016-06-14 13:47:51

Tags: elasticsearch

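The problem is that any character sequence containing the boost operator "^" (caret) returns no search results.

However, according to the Elasticsearch documentation below

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_reserved_characters

    • The characters && || ! ( ) { } [ ] ^ " ~ * ? : \ can be escaped with the \ symbol.

The requirement is to perform a contains search in Elasticsearch using an n-gram analyzer.

Below is the mapping structure for the sample use case: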
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "nGram_analyzer": {
            "filter": [
              "lowercase",
              "asciifolding"
            ],
            "type": "custom",
            "tokenizer": "ngram_tokenizer"
          },
          "whitespace_analyzer": {
            "filter": [
              "lowercase",
              "asciifolding"
            ],
            "type": "custom",
            "tokenizer": "whitespace"
          }
        },
        "tokenizer": {
          "ngram_tokenizer": {
            "token_chars": [
              "letter",
              "digit",
              "punctuation",
              "symbol"
            ],
            "min_gram": "2",
            "type": "nGram",
            "max_gram": "20"
          }
        }
      }
    }
  },
  "mappings": {
    "employee": {
      "properties": {
        "employeeName": {
          "type": "string",
          "analyzer": "nGram_analyzer",
          "search_analyzer": "whitespace_analyzer"
        }
      }
    }
  }
}

There is an employee name containing special characters, as shown here: XYZ%^&*

The sample query used for the contains search is as follows:

GET
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "employeeName": {
              "query": "xyz%^",
              "type": "boolean",
              "operator": "or"
            }
          }
        }
      ]
    }
  }
}

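Even if we try escaping the caret, as in "query": "xyz%\^", the request fails with an error. As a result, we cannot search for anything that contains "^" (caret).

Any help is much appreciated.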
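One possible explanation for the error on the escaped query, not confirmed anywhere in this thread, is that \^ is not a valid JSON escape sequence, so the request body may be rejected before Elasticsearch ever looks at the query. The sketch below uses Python's standard json module as a stand-in for the server-side parser (an assumption; Elasticsearch uses its own JSON parser) and the helper name parses is made up for illustration.

```python
import json

def parses(body: str) -> bool:
    """Return True if body is valid JSON, False otherwise."""
    try:
        json.loads(body)
        return True
    except json.JSONDecodeError:
        return False

# Raw strings so the backslashes below are exactly what would be sent on the wire.
plain = r'{"query": "xyz%^"}'            # caret left unescaped
single_escape = r'{"query": "xyz%\^"}'   # \^ is not a legal JSON escape
double_escape = r'{"query": "xyz%\\^"}'  # \\ is a legal escape for one backslash

if __name__ == "__main__":
    for body in (plain, single_escape, double_escape):
        print(body, "->", "valid JSON" if parses(body) else "invalid JSON")
```

Note that a match query does not parse query_string syntax, so the caret should not need escaping there in the first place; a request body that is valid JSON only removes the parse error and does not by itself make the caret searchable.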

1 Answer:

Answer 0 (score: 2)

There is a bug in the ngram tokenizer, described in a related issue.

Essentially, ^ is not considered Symbol | Letter | Punctuation by the ngram tokenizer. As a result, it tokenizes the input on ^.

Example (the text parameter is the URL-encoded form of xyz%^):

    GET <index_name>/_analyze?tokenizer=ngram_tokenizer&text=xyz%25%5E

The result of the analyze API call above shows that there is no ^ in the response below:

    {
      "tokens": [
        { "token": "xy",   "start_offset": 0, "end_offset": 2, "type": "word", "position": 0 },
        { "token": "xyz",  "start_offset": 0, "end_offset": 3, "type": "word", "position": 1 },
        { "token": "xyz%", "start_offset": 0, "end_offset": 4, "type": "word", "position": 2 },
        { "token": "yz",   "start_offset": 1, "end_offset": 3, "type": "word", "position": 3 },
        { "token": "yz%",  "start_offset": 1, "end_offset": 4, "type": "word", "position": 4 },
        { "token": "z%",   "start_offset": 2, "end_offset": 4, "type": "word", "position": 5 }
      ]
    }


Since '^' is not indexed, there are no matches.
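The behaviour described in this answer can be reproduced offline with a small simulation. This is only a sketch under stated assumptions, not the actual Lucene or Elasticsearch code: it assumes the tokenizer's "symbol" class covers the Unicode categories Sm, Sc and So but not Sk (modifier symbol), which is the category of '^'. The names TOKEN_CHAR_CATEGORIES, is_token_char and ngrams are hypothetical.

```python
import unicodedata

# Assumed mapping from token_chars classes to Unicode general categories.
# The "symbol" entry deliberately omits "Sk", the category of '^'.
TOKEN_CHAR_CATEGORIES = {
    "letter": {"Lu", "Ll", "Lt", "Lm", "Lo"},
    "digit": {"Nd"},
    "punctuation": {"Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po"},
    "symbol": {"Sm", "Sc", "So"},
}

def is_token_char(ch, classes):
    """True if ch belongs to any of the configured token_chars classes."""
    category = unicodedata.category(ch)
    return any(category in TOKEN_CHAR_CATEGORIES[name] for name in classes)

def ngrams(text, classes=("letter", "digit", "punctuation", "symbol"),
           min_gram=2, max_gram=20):
    """Split text on non-token characters, then emit n-grams for each run."""
    runs, current = [], []
    for ch in text:
        if is_token_char(ch, classes):
            current.append(ch)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))

    grams = []
    for run in runs:
        for start in range(len(run)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(run):
                    break
                grams.append(run[start:start + size])
    return grams

if __name__ == "__main__":
    print(unicodedata.category("^"))  # category of the caret
    print(ngrams("xyz%^"))            # n-grams produced under the assumption above
```

Under that assumption the simulation yields the same six tokens as the analyze response for xyz%^, none of which contains the caret, so a search term that includes '^' has nothing in the index to match.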