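I'm very new to Azure Cognitive Search, and I successfully configured an index with autocomplete (thanks to this article, hence the partial-search fields).
But now I have another use case, in which many files are stored in an Azure Blob container with metadata:
One of the metadata fields (on each file) is called partnumbers, and its value is a string of comma-separated product SKUs (e.g. "123456,78901,102938,09876"). I built the index so that this information is stored as an Edm.String, as follows: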
{
"name": "my-index",
"fields": [
{
"name": "partnumbers",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "metadata_storage_name",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": false,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "metadata_storage_content_type",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": false,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "metadata_storage_last_modified",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": false,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "metadata_storage_path",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": false,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "metadata_storage_size",
"type": "Edm.Int64",
"facetable": true,
"filterable": true,
"retrievable": false,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "key",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": true,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "partialPartnumbers",
"type": "Edm.String",
"facetable": false,
"filterable": false,
"key": false,
"retrievable": false,
"searchable": true,
"sortable": false,
"analyzer": null,
"indexAnalyzer": "prefixCmAnalyzer",
"searchAnalyzer": "standardCmAnalyzer",
"synonymMaps": [],
"fields": []
}
],
"suggesters": [
{
"name": "my-index_suggester",
"searchMode": "analyzingInfixMatching",
"sourceFields": [
"partnumbers"
]
}
],
"scoringProfiles": [
{
"name": "exactFirst",
"functions": [],
"functionAggregation": null,
"text": {
"weights": {
"partnumbers": 2,
"partialPartnumbers": 1
}
}
}
],
"defaultScoringProfile": "exactFirst",
"corsOptions": null,
"analyzers": [
{
"@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
"name": "standardCmAnalyzer",
"tokenizer": "standard_v2",
"tokenFilters": [
"lowercase",
"asciifolding"
],
"charFilters": []
},
{
"@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
"name": "prefixCmAnalyzer",
"tokenizer": "standard_v2",
"tokenFilters": [
"lowercase",
"asciifolding",
"edgeNGramCmTokenFilter"
],
"charFilters": []
}
],
"charFilters": [],
"tokenFilters": [
{
"@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
"name": "edgeNGramCmTokenFilter",
"minGram": 2,
"maxGram": 20,
"side": "front"
}
],
"tokenizers": [],
"@odata.etag": "\"0x8D8184F367A74XX\""
}
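Now I'm struggling to find a way (a specific syntax? an analyzer? a tokenizer?) to find all files whose partnumbers metadata field contains a single SKU, so that I can retrieve all the documents related to one product: I want to pass the SKU "102938" to Azure Search and get back every file whose partnumbers metadata field contains this SKU (possibly alongside other SKUs).
But I'm having a hard time finding examples on Google, and the documentation seems, for now, a bit over my head (I'm not really sure I understand what analyzers, tokenizers, etc. are and how they work! This is my first step into the "search" world...).
So I'd really appreciate the community's help. I'd love to read articles, beginner-friendly explanations, tutorials, or anything else that could help me move forward!
Thanks.
Answer 0 (score: 1)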
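OK, I just tried something that works: I set the pattern analyzer on the partnumbers field, and when I tested it with the Analyze Text API, it did split my SKUs into separate tokens. After that, I could search for a single SKU, and it returned all the files I wanted! Here is my index JSON definition: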
{
"name": "my-index",
"fields": [
{
"name": "partnumbers",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": "pattern",
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "metadata_storage_name",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "metadata_storage_content_type",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "metadata_storage_last_modified",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "metadata_storage_path",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "metadata_storage_size",
"type": "Edm.Int64",
"facetable": true,
"filterable": true,
"retrievable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "key",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": true,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "name",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "partialPartnumbers",
"type": "Edm.String",
"facetable": false,
"filterable": false,
"key": false,
"retrievable": false,
"searchable": true,
"sortable": false,
"analyzer": null,
"indexAnalyzer": "prefixCmAnalyzer",
"searchAnalyzer": "standardCmAnalyzer",
"synonymMaps": [],
"fields": []
},
{
"name": "partialName",
"type": "Edm.String",
"facetable": false,
"filterable": false,
"key": false,
"retrievable": false,
"searchable": true,
"sortable": false,
"analyzer": null,
"indexAnalyzer": "prefixCmAnalyzer",
"searchAnalyzer": "standardCmAnalyzer",
"synonymMaps": [],
"fields": []
}
],
"suggesters": [
{
"name": "conformity-certificates-index_suggester",
"searchMode": "analyzingInfixMatching",
"sourceFields": [
"name"
]
}
],
"scoringProfiles": [
{
"name": "exactFirst",
"functions": [],
"functionAggregation": null,
"text": {
"weights": {
"partnumbers": 4,
"partialPartnumbers": 3,
"name": 2,
"partialName": 1
}
}
}
],
"defaultScoringProfile": "exactFirst",
"corsOptions": null,
"analyzers": [
{
"@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
"name": "standardCmAnalyzer",
"tokenizer": "standard_v2",
"tokenFilters": [
"lowercase",
"asciifolding"
],
"charFilters": []
},
{
"@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
"name": "prefixCmAnalyzer",
"tokenizer": "standard_v2",
"tokenFilters": [
"lowercase",
"asciifolding",
"edgeNGramCmTokenFilter"
],
"charFilters": []
}
],
"charFilters": [],
"tokenFilters": [
{
"@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
"name": "edgeNGramCmTokenFilter",
"minGram": 2,
"maxGram": 20,
"side": "front"
}
],
"tokenizers": [],
"@odata.etag": "\"0x8D818EC80CXXXX\""
}
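Why this works, roughly: the built-in pattern analyzer lowercases the text and splits it on a regular expression, which by default is a run of non-word characters, so each comma becomes a token boundary. (As far as I understand the Unicode word-break rules behind standard_v2, a comma between two digits does not end a word, so the default analyzer probably kept the whole value as one token; the Analyze Text API is the reliable way to check.) The Python below is only a local approximation of the pattern analyzer for experimenting, not the service's implementation; `pattern_analyze` is a hypothetical helper name.

```python
import re


def pattern_analyze(text: str, pattern: str = r"\W+", lowercase: bool = True) -> list[str]:
    """Approximate the built-in "pattern" analyzer: lowercase, then split on a regex."""
    if lowercase:
        text = text.lower()
    # Split on every run of non-word characters (the commas here) and drop empty strings.
    return [token for token in re.split(pattern, text) if token]


tokens = pattern_analyze("123456,78901,102938,09876")
print(tokens)              # ['123456', '78901', '102938', '09876']
print("102938" in tokens)  # True
```

Because "102938" is now an indexed term of its own, a plain search for that SKU can match it.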
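Answer 1 (score: 1)
You can look up a part number with an ordinary filter:
$filter=search.in(partnumbers, '102938', ',')
You can find more examples in the documentation here: https://docs.microsoft.com/en-us/azure/search/search-query-odata-filter
Don't use wildcards or regular expressions for this use case. The part numbers in your example have different lengths, so a wildcard search for 102938* would also unintentionally match 1029381, 10293810, 102938123, and so on.
Your data already contains an explicit, exact list of part numbers. You can query that list directly.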
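One caveat, as I read the OData filter reference: search.in tests whether the field value equals one of the listed values; it does not look for a value inside a longer string. The local Python model below illustrates the difference (`search_in` is a hypothetical helper, not an SDK function). Against a single comma-separated Edm.String, the whole string is compared. If the part numbers were instead stored as separate values, for example in a Collection(Edm.String) field (an assumption, not the current schema), a filter such as partnumbers/any(p: search.in(p, '102938', ',')) would perform the per-SKU match.

```python
import re


def search_in(field_value: str, value_list: str, delimiters: str = " ,") -> bool:
    """Local model of search.in(field, 'v1,v2', delims): whole-value equality only."""
    # Every character in `delimiters` separates values in `value_list`.
    values = [v for v in re.split("[" + re.escape(delimiters) + "]", value_list) if v]
    return field_value in values


# Part numbers stored as ONE comma-separated string: the whole string is compared.
print(search_in("123456,78901,102938,09876", "102938", ","))  # False

# Part numbers stored as separate values (e.g. a hypothetical Collection(Edm.String) field),
# which is what partnumbers/any(p: search.in(p, '102938', ',')) would iterate over.
skus = ["123456", "78901", "102938", "09876"]
print(any(search_in(p, "102938", ",") for p in skus))  # True
```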
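The over-matching risk of a prefix wildcard can be shown with a few lines of Python. The term list is hypothetical, and the two functions only model how a prefix query and an exact term query select indexed terms; they do not call Azure Search.

```python
def prefix_query(indexed_terms: list[str], prefix: str) -> list[str]:
    # A prefix query such as 102938* matches every indexed term that starts with the prefix.
    return [term for term in indexed_terms if term.startswith(prefix)]


def exact_query(indexed_terms: list[str], value: str) -> list[str]:
    # An exact term match returns only the identical term.
    return [term for term in indexed_terms if term == value]


# Hypothetical SKUs of different lengths that share a prefix.
terms = ["102938", "1029381", "10293810", "102938123", "78901"]
print(prefix_query(terms, "102938"))  # ['102938', '1029381', '10293810', '102938123']
print(exact_query(terms, "102938"))   # ['102938']
```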
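Answer 2 (score: 0)
This should be achievable with regular expression and wildcard searches.
They work on any searchable field, provided the query is parsed with the full Lucene query parser (queryType=full).
"... The Full Lucene query language, invoked by setting queryType=full, extends the default Simple query language by adding support for more operators and query types, such as wildcard, fuzzy, regex, and field-scoped queries. For example, a regular expression sent in simple query syntax would be interpreted as a query string and not as an expression. The example requests in this article use the Full Lucene query language."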
fieldName:searchExpression
For example: queryType=full&searchFields=partnumbers&$select=partnumbers&search=partnumbers:102938*
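A minimal sketch of turning that example into a REST GET URL, assuming the documents search endpoint (/indexes/{index}/docs) and api-version 2020-06-30; the service name and index name are hypothetical placeholders. The request would also need an api-key header (query or admin key), which is not shown here.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: replace with your own service, index and API version.
SERVICE = "my-search-service"
INDEX = "my-index"
API_VERSION = "2020-06-30"


def build_search_url(sku: str) -> str:
    """Build a GET URL for a full-Lucene, field-scoped prefix search on partnumbers."""
    params = {
        "api-version": API_VERSION,
        "queryType": "full",  # required for the fieldName:searchExpression syntax
        "searchFields": "partnumbers",
        "$select": "partnumbers",
        "search": f"partnumbers:{sku}*",
    }
    # urlencode percent-encodes characters such as $, : and * in the query string.
    return f"https://{SERVICE}.search.windows.net/indexes/{INDEX}/docs?{urlencode(params)}"


url = build_search_url("102938")
print(url)
```

Keep in mind the caveat from the previous answer: a trailing * is a prefix match, so it can also return longer SKUs that start with the same digits.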
https://docs.microsoft.com/en-us/azure/search/query-lucene-syntax