Passing "-" in an Elasticsearch query

Time: 2015-11-18 02:17:21

Tags: elasticsearch lucene

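When we pass a query that contains special characters, Elasticsearch splits the text. For example, if we pass "test-test" in the query, how can we make Elasticsearch treat it as a single word instead of splitting it?

The analyzer used on the field we are searching: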

"text_search_filter": {
        "type":     "edge_ngram",
        "min_gram": 1,
        "max_gram": 15
     },
     "standard_stop_filter": {
       "type":       "stop",
       "stopwords":  "_english_"
     }
   },

   "analyzer": {

     "text_search_analyzer": {
        "type": "custom",
        "tokenizer": "whitespace",
        "filter": [
           "lowercase",
           "asciifolding",
           "text_search_filter"
        ]
     }

}

Search query:

"query": {
    "multi_match": {
      "query": "test-test",
      "type": "cross_fields",
      "fields": [
        "FIELD_NAME"
      ]
    }
  }

Analyzer output for the input 'test-test' (single quotes included):


{
"tokens": [
    {
        "token": "'",
        "start_offset": 0,
        "end_offset": 11,
        "type": "word",
        "position": 1
    },
    {
        "token": "'t",
        "start_offset": 0,
        "end_offset": 11,
        "type": "word",
        "position": 1
    },
    {
        "token": "'te",
        "start_offset": 0,
        "end_offset": 11,
        "type": "word",
        "position": 1
    },
    {
        "token": "'tes",
        "start_offset": 0,
        "end_offset": 11,
        "type": "word",
        "position": 1
    },
    {
        "token": "'test",
        "start_offset": 0,
        "end_offset": 11,
        "type": "word",
        "position": 1
    },
    {
        "token": "'test-",
        "start_offset": 0,
        "end_offset": 11,
        "type": "word",
        "position": 1
    },
    {
        "token": "'test-t",
        "start_offset": 0,
        "end_offset": 11,
        "type": "word",
        "position": 1
    },
    {
        "token": "'test-te",
        "start_offset": 0,
        "end_offset": 11,
        "type": "word",
        "position": 1
    },
    {
        "token": "'test-tes",
        "start_offset": 0,
        "end_offset": 11,
        "type": "word",
        "position": 1
    },
    {
        "token": "'test-test",
        "start_offset": 0,
        "end_offset": 11,
        "type": "word",
        "position": 1
    },
    {
        "token": "'test-test'",
        "start_offset": 0,
        "end_offset": 11,
        "type": "word",
        "position": 1
    }
]

}
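
The token list above is consistent with a whitespace tokenizer followed by an edge n-gram filter: the input `'test-test'` stays one token, hyphen and quotes included, and only its prefixes are emitted. The following is a minimal plain-Java sketch of that behaviour, not Lucene's actual implementation. The class and method names are hypothetical, and the `asciifolding` step is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java approximation of the analysis chain; NOT Lucene's implementation.
class EdgeNgramSketch {

    // Split on whitespace only, so "-" and quote characters stay inside the token.
    static List<String> whitespaceTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                // Stands in for the "lowercase" filter.
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Emit every prefix of the token whose length lies between minGram and maxGram.
    static List<String> edgeNgrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, token.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // Same settings as text_search_filter: min_gram 1, max_gram 15.
        for (String token : whitespaceTokens("'test-test'")) {
            System.out.println(edgeNgrams(token, 1, 15));
        }
    }
}
```

For `'test-test'` this produces 11 prefixes, from `'` up to `'test-test'`, which matches the token list shown above.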

1 answer:

Answer 0 (score: 0)

In my code, I catch every word that contains "-" and wrap it in quotes.

Example: joe-doe -> "joe-doe"

Java code for this:

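import java.util.Arrays;
import java.util.stream.Collectors;

// Wraps every whitespace-separated word that contains "-" in double quotes,
// unless the word already starts with a quote.
static String placeWordsWithDashInQuote(String value) {
    return Arrays.stream(value.split("\\s"))
        .filter(v -> !v.isEmpty())
        .map(v -> v.contains("-") && !v.startsWith("\"") ? "\"" + v + "\"" : v)
        .collect(Collectors.joining(" "));
}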
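
A small usage sketch may help show what the helper does with mixed input. The helper is copied unchanged into a hypothetical class `DashQuoteDemo` so that the example runs on its own; the sample inputs are made up for illustration.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical wrapper class; only the helper itself comes from the answer.
class DashQuoteDemo {

    static String placeWordsWithDashInQuote(String value) {
        return Arrays.stream(value.split("\\s"))
            .filter(v -> !v.isEmpty())
            .map(v -> v.contains("-") && !v.startsWith("\"") ? "\"" + v + "\"" : v)
            .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // A hyphenated word is quoted, other words are left alone.
        System.out.println(placeWordsWithDashInQuote("joe-doe smith"));
        // A word that already starts with a quote is not quoted again.
        System.out.println(placeWordsWithDashInQuote("\"joe-doe\""));
        // Input without "-" passes through unchanged.
        System.out.println(placeWordsWithDashInQuote("joe doe"));
    }
}
```

Note that the helper also collapses repeated whitespace to single spaces, because empty fragments are filtered out before joining.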

After this transformation, the query looks like this:

{
"query": {
    "bool": {
        "must": [
            {
                "query_string": {
                    "fields": [
                        "lastName",
                        "firstName"
                    ],
                    "query": "\"joe-doe\"",
                    "default_operator": "AND"
                }
            }
        ]
    }
},
"sort": [],
"from": 0,
"size": 10 }