How do I run a contains/LIKE query with Elasticsearch?

Asked: 2018-10-23 14:30:58

Tags: elasticsearch

I want to implement the following T-SQL query in Elasticsearch:

    DECLARE @searchstring nvarchar(max)

    SET @searchstring = 'tn-241'

    SET @searchstring = replace(replace('%' + @searchstring + '%', '-', ''), ' ', '')

    SELECT *
    FROM [dbo].[Product]
    WHERE
        replace(replace(shortdescription, '-', ''), ' ', '') LIKE @searchstring OR
        replace(replace(name, '-', ''), ' ', '') LIKE @searchstring OR
        replace(replace(number, '-', ''), ' ', '') LIKE @searchstring
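
For readers less familiar with T-SQL, the matching rule above can be sketched in Python. This is only an illustration of the intended semantics, not part of the original setup: the function names and the sample row are hypothetical, and the `casefold()` calls assume the database uses a case-insensitive collation.

```python
def strip_separators(text: str) -> str:
    """Remove hyphens and spaces, like the nested REPLACE calls in the T-SQL."""
    return text.replace("-", "").replace(" ", "")


def product_matches(product: dict, search: str) -> bool:
    """Return True if any of the three columns contains the search string."""
    # casefold() assumes a case-insensitive collation; drop it otherwise.
    needle = strip_separators(search).casefold()
    columns = ("shortdescription", "name", "number")
    return any(
        needle in strip_separators(product.get(col) or "").casefold()
        for col in columns
    )


# Hypothetical sample row, invented for illustration only.
sample = {
    "name": "Original Brother TN-241C Toner Cyan",
    "number": "TN 241 C",
    "shortdescription": "",
}
print(product_matches(sample, "tn-241"))
```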

To do this, I created an analyzer that uses the keyword tokenizer and a word_delimiter filter with catenate_all, as follows:

"search_delimiter": {
  "split_on_numerics": "false",
  "generate_word_parts": "false",
  "preserve_original": "false",
  "generate_number_parts": "false",
  "catenate_all": "true",
  "split_on_case_change": "false",
  "type": "word_delimiter",
  "stem_english_possessive": "false"
}

"analyzer": {
  "searchanalyzer": {
    "filter": [
      "lowercase",
      "search_delimiter"
    ],
    "type": "custom",
    "tokenizer": "keyword"
  }
}

"Name": {
  "analyzer": "searchanalyzer",
  "type": "string",
  "fields": {
    "raw": {
      "analyzer": "searchanalyzer",
      "type": "string"
    }
  }
},
"Number": {
  "analyzer": "searchanalyzer",
  "type": "string",
  "fields": {
    "raw": {
      "analyzer": "searchanalyzer",
      "type": "string"
    }
  }
},
"ShortDescription": {
  "analyzer": "searchanalyzer",
  "type": "string",
  "fields": {
    "raw": {
      "analyzer": "searchanalyzer",
      "type": "string"
    }
  }
}

The result is:

curl -XGET "Index/_analyze?analyzer=searchanalyzer&pretty=true" -d "Original Brother TN-241C Toner Cyan"
{
  "tokens" : [ {
    "token" : "originalbrothertn241ctonercyan",
    "start_offset" : 0,
    "end_offset" : 35,
    "type" : "word",
    "position" : 0
  } ]
}
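
What the analyzer does to this input can be approximated in a few lines of Python. This is a rough emulation for ASCII text only, not the actual Lucene implementation (the real word_delimiter filter also handles Unicode letters and digits), and the function name is made up for the example.

```python
import re


def emulate_searchanalyzer(text: str) -> str:
    """Approximate keyword tokenizer + lowercase + word_delimiter(catenate_all)."""
    # The keyword tokenizer keeps the whole input as a single token.
    lowered = text.lower()
    # With only catenate_all enabled, the sub-words are glued back together,
    # which for ASCII input amounts to dropping every non-alphanumeric character.
    return re.sub(r"[^0-9a-z]", "", lowered)


print(emulate_searchanalyzer("Original Brother TN-241C Toner Cyan"))
```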

So I basically need to use the same analyzer and search with query_string or a wildcard query, which should behave like a substring ("instring") search.

So if I search like this:

"query": {
    "query_string" : {
        "fields" : ["Name", "Number", "ShortDescription"],
        "query" : "*TonerCyan*"           
    }
}

it works fine, but if I search with

  "query": {
        "query_string" : {
            "fields" : ["Name", "Number", "ShortDescription"],
            "query" : "*Toner Cyan*"           
        }
    }

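it returns no results. That suggests searchanalyzer is not applied before query_string runs, because I expected the second query to search for TonerCyan as well, rather than for Toner and Cyan separately. My first question is: why doesn't this work? My second is: what is the best way to implement the T-SQL query above? It needs to search multiple fields.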

1 answer:

Answer 0 (score: 1)

You can try putting the search string in double quotes like this, and it should work:

{
  "query": {
    "query_string": {
      "fields": [
        "Name",
        "Number",
        "ShortDescription"
      ],
      "query": "*\"Toner Cyan\"*"
    }
  }
}
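
A different option is to normalize the input on the client before building the query. The sketch below rests on two assumptions about query_string in Elasticsearch versions that still use the `string` type: the parser splits the query on whitespace before any analysis, and wildcard terms are lowercased but not run through the field's analyzer by default. If that holds, `*Toner Cyan*` becomes the two terms `*toner` and `cyan*`, and neither can match the single indexed token. `build_contains_query` is a hypothetical helper, and `fnmatchcase` only stands in for wildcard matching against that token.

```python
import re
from fnmatch import fnmatchcase
from typing import Dict, List

# The single token produced by searchanalyzer in the question's _analyze output.
INDEXED_TOKEN = "originalbrothertn241ctonercyan"


def build_contains_query(user_input: str, fields: List[str]) -> Dict:
    """Build a query_string body whose wildcard term mirrors the index-time analysis."""
    # Lowercase and drop separators on the client, since the wildcard term
    # is assumed not to pass through searchanalyzer on the server.
    needle = re.sub(r"[^0-9a-z]", "", user_input.lower())
    if not needle:
        raise ValueError("search string contains no letters or digits")
    return {"query": {"query_string": {"fields": fields, "query": f"*{needle}*"}}}


# The unquoted input splits into two wildcard terms; neither matches the token.
print(fnmatchcase(INDEXED_TOKEN, "*toner"), fnmatchcase(INDEXED_TOKEN, "cyan*"))

# The normalized input yields one wildcard term that does match.
body = build_contains_query("Toner Cyan", ["Name", "Number", "ShortDescription"])
print(fnmatchcase(INDEXED_TOKEN, body["query"]["query_string"]["query"]))
```

The resulting `body` dictionary can be serialized with `json.dumps` and sent as the search request body.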

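Also, be aware that searching with a leading wildcard can have a catastrophic performance impact, depending on how much data you have. For that reason, I still strongly believe you should index ngrams.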
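
To illustrate why ngrams help, here is a small Python sketch of the idea. It is not Elasticsearch configuration: in a real index the ngram token filter or tokenizer, with its `min_gram` and `max_gram` parameters, produces these terms at index time. The gram sizes 3 and 10 are arbitrary values chosen for the example.

```python
def ngrams(token: str, min_gram: int, max_gram: int) -> set:
    """Return every substring of token whose length is between min_gram and max_gram."""
    return {
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    }


# Hypothetical gram sizes; tune them to the shortest and longest expected search.
MIN_GRAM, MAX_GRAM = 3, 10

indexed_terms = ngrams("originalbrothertn241ctonercyan", MIN_GRAM, MAX_GRAM)

# A normalized search string becomes an exact term lookup, no wildcard needed.
print("tn241" in indexed_terms)
print("tonercyan" in indexed_terms)
```

The trade-off is a larger index, and a search string longer than `max_gram` will not match as a single term.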