How to rank exact matches higher in Azure Search

Time: 2017-01-10 21:49:40

Tags: azure-search

I have an index in Azure Search that consists of person-name data such as FirstName and LastName.

When I search for a three-letter last name with a query such as

/indexes/customers-index/docs?api-version=2016-09-01&search=rau&searchFields=LastName

the name Rau is found, but it appears quite far down the result list.

{
"@odata.context": "myurl/indexes('customers-index')/$metadata#docs(ID,FirstName,LastName)",
"value": [
    {
        "@search.score": 8.729204,
        "ID": "someid",
        "FirstName": "xxx",
        "LastName": "Liebetrau"
    },
    {
        "@search.score": 8.729204,
        "ID": "someid",
        "FirstName": "xxx",
        "LastName": "Damerau"
    },
    {
        "@search.score": 8.729204,
        "ID": "someid",
        "FirstName": "xxx",
        "LastName": "Rau"

Names such as "Liebetrau" and "Damerau" appear further up.

Is there a way to have the exact match at the top?

Edit

Using the REST API

Querying the index definition
GET https://myproduct.search.windows.net/indexes('customers-index')?api-version=2015-02-28-Preview

returns the following for LastName:

 "name": "LastName",
  "type": "Edm.String",
  "searchable": true,
  "filterable": true,
  "retrievable": true,
  "sortable": true,
  "facetable": true,
  "key": false,
  "indexAnalyzer": "prefix",
  "searchAnalyzer": "standard",
  "analyzer": null,
  "synonymMaps": []

Edit 1

Analyzer definition

      "scoringProfiles": [],
  "defaultScoringProfile": null,
  "corsOptions": null,
  "suggesters": [],
  "analyzers": [
    {
      "name": "prefix",
      "tokenizer": "standard",
      "tokenFilters": [
        "lowercase",
        "my_edgeNGram"
      ],
      "charFilters": []
    }
  ],
  "tokenizers": [],
  "tokenFilters": [
    {
      "name": "my_edgeNGram",
      "minGram": 2,
      "maxGram": 20,
      "side": "back"
    }
  ],
  "charFilters": []

Edit 2

In the end, specifying the scoring profile in my query did the trick:

{
  "name": "person-index",
  "fields": [
    {
      "name": "ID",
      "type": "Edm.String",
      "searchable": false,
      "filterable": true,
      "retrievable": true,
      "sortable": true,
      "facetable": true,
      "key": true,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null
    },
    {
      "name": "LastName",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "sortable": true,
      "facetable": true,
      "key": false,
      "analyzer": "my_standard"
    },
    {
      "name": "PartialLastName",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "sortable": true,
      "facetable": true,
      "key": false,
      "indexAnalyzer": "prefix",
      "searchAnalyzer": "standard",
      "analyzer": null
    }
  ],
  "analyzers": [
    {
      "name": "my_standard",
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "tokenizer": "standard_v2",
      "tokenFilters": [ "lowercase", "asciifolding" ]
    },
    {
      "name": "prefix",
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "tokenizer": "standard_v2",
      "tokenFilters": [ "lowercase", "my_edgeNGram" ]
    }
  ],
  "tokenFilters": [
    {
      "name": "my_edgeNGram",
      "@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
      "minGram": 2,
      "maxGram": 20,
      "side": "back"
    }
  ],
  "scoringProfiles": [
    {
      "name": "exactFirst",
      "text": {
        "weights": { "LastName": 2, "PartialLastName": 1 }
      }
    }
  ]
}
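
The post does not show the final query itself, so the following is only a sketch of what it might look like. It assumes the search runs over both LastName and PartialLastName and names the exactFirst profile through the scoringProfile query parameter; the service name myproduct is borrowed from the earlier URL, and the helper build_search_url is hypothetical. The idea is that "Rau" matches in both fields (including the higher-weighted LastName), while "Liebetrau" only matches through PartialLastName, so the exact match should score higher.

```python
from urllib.parse import urlencode


def build_search_url(service: str, index: str, term: str) -> str:
    """Build a search URL that applies the 'exactFirst' scoring profile."""
    params = {
        "api-version": "2016-09-01",
        # Assumption: the query searches the exact and the n-gram field together.
        "search": term,
        "searchFields": "LastName,PartialLastName",
        "scoringProfile": "exactFirst",
    }
    # safe="," keeps the comma in searchFields readable instead of %2C.
    query = urlencode(params, safe=",")
    return f"https://{service}.search.windows.net/indexes/{index}/docs?{query}"


url = build_search_url("myproduct", "person-index", "rau")
print(url)
```

The request would still need the api-key header when it is actually sent.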

1 answer:

Answer 0: (score: 1)

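The "prefix" analyzer set on the LastName field produces the following terms for the name Liebetrau: au, rau, trau, etrau, betrau, ebetrau, iebetrau, liebetrau. These are edge n-grams of length 2 to 20, taken from the back of the word, as defined by the my_edgeNGram token filter in the index definition. The analyzer processes the other names the same way.

When you search for rau, it matches every one of these names because they all end with those characters. That is why all documents in the result set have the same relevance score.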
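
To make this concrete, here is a small Python simulation of what a back-side edge n-gram filter does to the three names from the question. It is only an approximation of the analyzer (lowercasing plus suffix generation), not the service's actual implementation, and the function name back_edge_ngrams is made up for illustration.

```python
def back_edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 20) -> list[str]:
    """Return the suffixes of a lowercased token, like an edge n-gram filter with side 'back'."""
    token = token.lower()
    longest = min(max_gram, len(token))
    return [token[-n:] for n in range(min_gram, longest + 1)]


names = ["Liebetrau", "Damerau", "Rau"]
index_terms = {name: back_edge_ngrams(name) for name in names}

for name, terms in index_terms.items():
    print(name, terms)

# The search analyzer is "standard", so the query stays the single term "rau".
matches = [name for name, terms in index_terms.items() if "rau" in terms]
print(matches)
```

Every name contributes the term rau to the index, so the query cannot tell the exact match apart from the longer names.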

You can test your analyzer configuration with the Analyze API.
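
As a rough sketch, an Analyze request for the "prefix" analyzer could be built like this. The service and index names come from the question; the api-version is assumed to be the preview version used above to fetch the index definition, the API key is a placeholder, and build_analyze_request is a hypothetical helper. The request is only constructed here, not sent.

```python
import json
import urllib.request

SERVICE = "myproduct"               # service name from the question's URL
INDEX = "customers-index"           # index name from the question
API_VERSION = "2015-02-28-Preview"  # assumption: same version as the index query
API_KEY = "<your-admin-api-key>"    # placeholder, not a real key


def build_analyze_request(text: str, analyzer: str) -> urllib.request.Request:
    """Build a POST request asking the service how `analyzer` tokenizes `text`."""
    url = (
        f"https://{SERVICE}.search.windows.net/indexes/{INDEX}/analyze"
        f"?api-version={API_VERSION}"
    )
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": API_KEY}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_analyze_request("Liebetrau", "prefix")

# To actually send it (requires a valid key and network access):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The response should list the tokens the analyzer emits, which makes it easy to check whether a field produces the suffix terms described above.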

To learn more about custom analyzers, see the Azure Search documentation on custom analyzers.

Hope that helps.