Choosing the right analyzer for Azure Search

Time: 2018-05-10 15:44:40

Tags: azure-search

We created an index in the Azure Search service with the following custom analyzer:

"analyzers": [
{
    "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
    "name": "SWMLuceneAlongWithCustomHyphenAnalyser",
    "tokenizer": "keyword_v2",
    "tokenFilters": [
        "lowercase"
    ],
    "charFilters": []
}
]

This analyzer is assigned to a field named "lowerMachineTag". When we search with the query below, we get the expected results:

Query: search=lowerSystemID:/.*it\'s.*/lowerMachineTag:/.*it\'s.*/&$filter=(systemID%20ne%20null)%20and%20(ownerSalesforceRecordID%20eq%20'a0h5B000000gJKfQAM')&$count=true&$top=100&$skip=0

Results:

{
    "@odata.context": "https://abcd/indexes('orders-index')/$metadata#docs",
    "@odata.count": 4,
    "value": [
        {
            "@search.score": 0.1862714,
            "systemID": "*1QXEDL8E2V8MGBY",
            "machineTag": "It's me",
            "systemIDMachineTag": "*1QXEDL8E2V8MGBY|It's me",
            "machineTagSystemID": "It's me|*1QXEDL8E2V8MGBY",
            "lowerMachineTag": "it's me",
            "lowerSystemID": "*1qxedl8e2v8mgby",
            "ownerSalesforceRecordID": "a0h5B000000gJKfQAM",
            "parentSalesforceRecordID": "a0h5B000000gJKfQAM"
        },
        {
            "@search.score": 0.16417237,
            "systemID": "*1QXEDL8E2V8MGBY",
            "machineTag": "It's me",
            "systemIDMachineTag": "*1QXEDL8E2V8MGBY|It's me",
            "machineTagSystemID": "It's me|*1QXEDL8E2V8MGBY",
            "lowerMachineTag": "it's me",
            "lowerSystemID": "*1qxedl8e2v8mgby",
            "ownerSalesforceRecordID": "a0h5B000000gJKfQAM",
            "parentSalesforceRecordID": "a0h5B000000gJKfQAM"
        },
        {
            "@search.score": 0.16417237,
            "systemID": "*1QXEDL8E2V8MGBY",
            "machineTag": "It's me",
            "systemIDMachineTag": "*1QXEDL8E2V8MGBY|It's me",
            "machineTagSystemID": "It's me|*1QXEDL8E2V8MGBY",
            "lowerMachineTag": "it's me",
            "lowerSystemID": "*1qxedl8e2v8mgby",
            "ownerSalesforceRecordID": "a0h5B000000gJKfQAM",
            "parentSalesforceRecordID": "a0h5B000000gJKfQAM"
        },
        {
            "@search.score": 0.16417237,
            "systemID": "*1QXEDL8E2V8MGBY",
            "machineTag": "It's me",
            "systemIDMachineTag": "*1QXEDL8E2V8MGBY|It's me",
            "machineTagSystemID": "It's me|*1QXEDL8E2V8MGBY",
            "lowerMachineTag": "it's me",
            "lowerSystemID": "*1qxedl8e2v8mgby",
            "ownerSalesforceRecordID": "a0h5B000000gJKfQAM",
            "parentSalesforceRecordID": "a0h5B000000gJKfQAM"
        }
    ]
}

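But what is the general recommendation for the analyzer configuration if we also need to get results when searching with lowerMachineTag:/.it./, in addition to the behavior above?

1 answer:

Answer 0 (score: 2):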

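You appear to be using regular expressions in your search query. For that to work, you must also add "&queryType=full" to the query string. Otherwise, the entire search term ("lowerSystemID:/.*it\'s.*/lowerMachineTag:/.*it\'s.*/") is interpreted as a simple query, meaning it is analyzed with the standard analyzer and matched against every searchable field. With "&queryType=full" added, your regular expressions are matched only against the fields you specify.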
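As a minimal sketch of that change, the Python snippet below assembles the query string from the question with queryType=full added. The host name and the api-version value are placeholders assumed for illustration and do not come from the question; the index name is taken from the @odata.context in the results, and the remaining parameters are those of the original query.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the host name is a placeholder, not from the question.
BASE_URL = "https://example.search.windows.net/indexes/orders-index/docs"

def build_query_url(search, odata_filter):
    """Return a search URL that requests the full Lucene query syntax."""
    params = {
        "api-version": "2017-11-11",  # assumed; use the version your service supports
        "search": search,
        "queryType": "full",          # needed for fielded regular expression queries
        "$filter": odata_filter,
        "$count": "true",
        "$top": "100",
        "$skip": "0",
    }
    # urlencode percent-encodes the regex and filter so the URL stays valid.
    return BASE_URL + "?" + urlencode(params)

url = build_query_url(
    r"lowerSystemID:/.*it\'s.*/lowerMachineTag:/.*it\'s.*/",
    "(systemID ne null) and (ownerSalesforceRecordID eq 'a0h5B000000gJKfQAM')",
)
print(url)
```

Building the query string with urlencode rather than by hand avoids encoding mistakes in the regular expression and the filter expression.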

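As for your question: if "lowerMachineTag:/.it./" is specified, it will not match any of the four documents above, because the leading '.' in the regular expression has to match one character before "it", and in all four documents the value of "lowerMachineTag" starts with "it".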

If you remove the leading '.' and use just "lowerMachineTag:/it./", it still will not match, because a regular expression must match the entire token (adding a '*' would work: "lowerMachineTag:/it.*/").
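The whole-token rule can be illustrated locally. The sketch below is only an approximation: it uses Python's re.fullmatch as a stand-in for Lucene's regular expression matching (the two dialects differ in general, but agree for these simple patterns), and it assumes that the keyword_v2 tokenizer plus the lowercase filter turn "It's me" into the single token "it's me", which is consistent with the lowerMachineTag value in the results above.

```python
import re

# Assumed single token produced by keyword_v2 + lowercase for "It's me".
TOKEN = "it's me"

def matches_whole_token(pattern, token=TOKEN):
    """Return True only if the pattern matches the entire token."""
    return re.fullmatch(pattern, token) is not None

# The four patterns discussed in the question and the answer.
for pattern in [r".*it's.*", r".it.", r"it.", r"it.*"]:
    print(f"/{pattern}/ ->", matches_whole_token(pattern))
```

Only the patterns that can cover the whole token, /.*it's.*/ and /it.*/, report a match; /.it./ and /it./ are too short to cover "it's me".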

You can also change the analyzer definition to use an nGram_v2 token filter so that "/it./" works, as shown below:

"analyzers": [
{
    "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
    "name": "SWMLuceneAlongWithCustomHyphenAnalyser",
    "tokenizer": "keyword_v2",
    "tokenFilters": [
        "lowercase", "myNGramTokenFilter"
    ],
    "charFilters": []
}
],
"tokenFilters": [
{
    "name": "myNGramTokenFilter",
    "@odata.type": "#Microsoft.Azure.Search.NGramTokenFilterV2",
    "minGram": 1,
    "maxGram": 100
}
]

With this change, your original query (plus "queryType=full") still returns the same results, and "lowerMachineTag:/it./" returns results as well.
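To see why an n-gram filter lets "/it./" match, here is a rough sketch that emulates it in plain Python. The ngrams helper is hypothetical and only approximates what an n-gram token filter emits (every substring whose length lies between minGram and maxGram), and re.fullmatch is used as a stand-in for Lucene's regular expression matching.

```python
import re

def ngrams(token, min_gram=1, max_gram=100):
    """Return every substring of token with length in [min_gram, max_gram]."""
    grams = set()
    for start in range(len(token)):
        longest = min(max_gram, len(token) - start)
        for length in range(min_gram, longest + 1):
            grams.add(token[start:start + length])
    return grams

# Assumed lowercased keyword token for "It's me".
grams = ngrams("it's me")

# With n-grams indexed, /it./ only has to cover one of the grams.
hits = sorted(g for g in grams if re.fullmatch(r"it.", g))
print(hits)
```

The complete token "it's me" is itself one of the n-grams, which is why the original /.*it's.*/ query keeps matching. The trade-off is a much larger index, since every substring of each value is stored as a term.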

I hope this helps!