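I have an index in Azure Search that consists of person-name data such as FirstName and LastName.
When I search for a three-letter last name with a query like
/indexes/customers-index/docs?api-version=2016-09-01&search=rau&searchFields=LastName
the name Rau is found, but it ends up quite far down the result list.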
{
"@odata.context": "myurl/indexes('customers-index')/$metadata#docs(ID,FirstName,LastName)",
"value": [
{
"@search.score": 8.729204,
"ID": "someid",
"FirstName": "xxx",
"LastName": "Liebetrau"
},
{
"@search.score": 8.729204,
"ID": "someid",
"FirstName": "xxx",
"LastName": "Damerau"
},
{
"@search.score": 8.729204,
"ID": "someid",
"FirstName": "xxx",
"LastName": "Rau"
Longer names such as "Liebetrau" and "Damerau" are listed ahead of it.
Is there a way to have the exact match at the top?
Edit
Querying the index definition through the REST API with
GET https://myproduct.search.windows.net/indexes('customers-index')?api-version=2015-02-28-Preview
returns the following for LastName:
"name": "LastName",
"type": "Edm.String",
"searchable": true,
"filterable": true,
"retrievable": true,
"sortable": true,
"facetable": true,
"key": false,
"indexAnalyzer": "prefix",
"searchAnalyzer": "standard",
"analyzer": null,
"synonymMaps": []
Edit 1
Analyzer definition:
"scoringProfiles": [],
"defaultScoringProfile": null,
"corsOptions": null,
"suggesters": [],
"analyzers": [
{
"name": "prefix",
"tokenizer": "standard",
"tokenFilters": [
"lowercase",
"my_edgeNGram"
],
"charFilters": []
}
],
"tokenizers": [],
"tokenFilters": [
{
"name": "my_edgeNGram",
"minGram": 2,
"maxGram": 20,
"side": "back"
}
],
"charFilters": []
Edit 2
In the end, specifying a scoring profile in my query did the trick. The index definition:
{
"name": "person-index",
"fields": [
{
"name": "ID",
"type": "Edm.String",
"searchable": false,
"filterable": true,
"retrievable": true,
"sortable": true,
"facetable": true,
"key": true,
"indexAnalyzer": null,
"searchAnalyzer": null,
"analyzer": null
}
,
{
"name": "LastName",
"type": "Edm.String",
"searchable": true,
"filterable": true,
"retrievable": true,
"sortable": true,
"facetable": true,
"key": false,
"analyzer": "my_standard"
},
{
"name": "PartialLastName",
"type": "Edm.String",
"searchable": true,
"filterable": true,
"retrievable": true,
"sortable": true,
"facetable": true,
"key": false,
"indexAnalyzer": "prefix",
"searchAnalyzer": "standard",
"analyzer": null
}
],
"analyzers":[
{
"name":"my_standard",
"@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer":"standard_v2",
"tokenFilters":[ "lowercase", "asciifolding" ]
},
{
"name":"prefix",
"@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer":"standard_v2",
"tokenFilters":[ "lowercase", "my_edgeNGram" ]
}
],
"tokenFilters":[
{
"name":"my_edgeNGram",
"@odata.type":"#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
"minGram":2,
"maxGram":20,
"side": "back"
}
],
"scoringProfiles":[
{
"name":"exactFirst",
"text":{
"weights":{ "LastName":2, "PartialLastName":1 }
}
}
]
}
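The definition above only declares the exactFirst profile; the query still has to ask for it. A minimal sketch of how such a query path could be assembled follows. The `scoringProfile` and `searchFields` query parameters are part of the Azure Search REST API, but searching both LastName and PartialLastName is an assumption here, since the final query is not shown in the question. The helper name `build_search_path` is hypothetical.

```python
from urllib.parse import urlencode


def build_search_path(index_name, term, api_version="2016-09-01"):
    """Build a search path that applies the exactFirst scoring profile.

    Assumption: both LastName and PartialLastName are searched, so that the
    weights in the profile (2 vs. 1) can favor a hit on the exact field.
    """
    params = {
        "api-version": api_version,
        "search": term,
        "searchFields": "LastName,PartialLastName",
        "scoringProfile": "exactFirst",
    }
    # safe="," keeps the field list readable instead of encoding it as %2C
    return f"/indexes/{index_name}/docs?" + urlencode(params, safe=",")


path = build_search_path("person-index", "rau")
print(path)
```

With this setup a document whose LastName is exactly "Rau" can match in both fields, while "Liebetrau" only matches through the n-grams in PartialLastName, which is why the weighted profile can lift the exact match.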
Answer 0 (score: 1)
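The "prefix" analyzer set on the LastName field produces the following terms for the name Liebetrau: au, rau, trau, etrau, betrau, ebetrau, iebetrau, liebetrau. These are edge n-grams of length 2 to 20, taken from the back of the word, as defined by the my_edgeNGram token filter in the index definition. The analyzer processes the other names in the same way.
When you search for the name rau, it matches all of these names because they all end with those characters. That is why every document in the result set has the same relevance score.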
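To make the term generation concrete, here is a small sketch that imitates what the lowercase filter plus a back-side edge n-gram filter (minGram 2, maxGram 20) would emit. It is a plain-Python approximation for illustration, not the service's actual implementation, and the function name `back_edge_ngrams` is made up for this example.

```python
def back_edge_ngrams(word, min_gram=2, max_gram=20):
    """Return the lowercased suffixes of `word` with lengths min_gram..max_gram."""
    token = word.lower()
    longest = min(max_gram, len(token))
    # token[-n:] is the last n characters, i.e. an n-gram anchored at the back
    return [token[-n:] for n in range(min_gram, longest + 1)]


names = ["Liebetrau", "Damerau", "Rau"]
index_terms = {name: back_edge_ngrams(name) for name in names}

# The search analyzer is "standard", so the query "rau" stays a single term.
query_term = "rau"
matches = [name for name, terms in index_terms.items() if query_term in terms]

print(index_terms["Liebetrau"])
print(matches)
```

All three names contain the indexed term "rau", so from the index's point of view each of them is a match on that single term for the query.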
You can test your analyzer configuration using the Analyze API.
To learn more about custom analyzers, go here and here.
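As a rough sketch, an Analyze API call is a POST to the index's `analyze` endpoint with the text and the analyzer name in the JSON body. The code below only builds the request and does not send it. The service URL and index name are taken from the question; the api-version is assumed to be the preview version used there, and the API key is a placeholder. `build_analyze_request` is a hypothetical helper name.

```python
import json
from urllib.request import Request


def build_analyze_request(service_url, index_name, api_key, text, analyzer,
                          api_version="2015-02-28-Preview"):
    """Prepare (but do not send) an Analyze API request for one piece of text."""
    url = f"{service_url}/indexes/{index_name}/analyze?api-version={api_version}"
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return Request(url, data=body, headers=headers, method="POST")


req = build_analyze_request(
    "https://myproduct.search.windows.net",  # service URL from the question
    "customers-index",
    "<admin-api-key>",                        # placeholder, not a real key
    "Liebetrau",
    "prefix",
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; the response should list the tokens the "prefix" analyzer emits for the given text, which can then be compared with the terms listed above.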
Hope that helps.