Question

We have created an index in Azure Search Service with the following analyzer:

"analyzers": [
{
    "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
    "name": "SWMLuceneAlongWithCustomHyphenAnalyser",
    "tokenizer": "keyword_v2",
    "tokenFilters": [
        "lowercase"
    ],
    "charFilters": []
}
]

This analyzer is assigned to a property named "lowerMachineTag". When we search with the query below, we get the expected results:

Query: search=lowerSystemID:/.*it\'s.*/lowerMachineTag:/.*it\'s.*/&$filter=(systemID%20ne%20null)%20and%20(ownerSalesforceRecordID%20eq%20'a0h5B000000gJKfQAM')&$count=true&$top=100&$skip=0

Result:

{
    "@odata.context": "https://abcd/indexes('orders-index')/$metadata#docs",
    "@odata.count": 4,
    "value": [
        {
            "@search.score": 0.1862714,
            "systemID": "*1QXEDL8E2V8MGBY",
            "machineTag": "It's me",
            "systemIDMachineTag": "*1QXEDL8E2V8MGBY|It's me",
            "machineTagSystemID": "It's me|*1QXEDL8E2V8MGBY",
            "lowerMachineTag": "it's me",
            "lowerSystemID": "*1qxedl8e2v8mgby",
            "ownerSalesforceRecordID": "a0h5B000000gJKfQAM",
            "parentSalesforceRecordID": "a0h5B000000gJKfQAM"
        },
        {
            "@search.score": 0.16417237,
            "systemID": "*1QXEDL8E2V8MGBY",
            "machineTag": "It's me",
            "systemIDMachineTag": "*1QXEDL8E2V8MGBY|It's me",
            "machineTagSystemID": "It's me|*1QXEDL8E2V8MGBY",
            "lowerMachineTag": "it's me",
            "lowerSystemID": "*1qxedl8e2v8mgby",
            "ownerSalesforceRecordID": "a0h5B000000gJKfQAM",
            "parentSalesforceRecordID": "a0h5B000000gJKfQAM"
        },
        {
            "@search.score": 0.16417237,
            "systemID": "*1QXEDL8E2V8MGBY",
            "machineTag": "It's me",
            "systemIDMachineTag": "*1QXEDL8E2V8MGBY|It's me",
            "machineTagSystemID": "It's me|*1QXEDL8E2V8MGBY",
            "lowerMachineTag": "it's me",
            "lowerSystemID": "*1qxedl8e2v8mgby",
            "ownerSalesforceRecordID": "a0h5B000000gJKfQAM",
            "parentSalesforceRecordID": "a0h5B000000gJKfQAM"
        },
        {
            "@search.score": 0.16417237,
            "systemID": "*1QXEDL8E2V8MGBY",
            "machineTag": "It's me",
            "systemIDMachineTag": "*1QXEDL8E2V8MGBY|It's me",
            "machineTagSystemID": "It's me|*1QXEDL8E2V8MGBY",
            "lowerMachineTag": "it's me",
            "lowerSystemID": "*1qxedl8e2v8mgby",
            "ownerSalesforceRecordID": "a0h5B000000gJKfQAM",
            "parentSalesforceRecordID": "a0h5B000000gJKfQAM"
        }
    ]
}

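But what is the general recommendation for the analyzer configuration if we should also return results when we search for lowerMachineTag:/.it./, in addition to the behavior above?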

Answer 1

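You appear to be using regular expressions in your search query. For that, you must also add "&queryType=full" to the query string. Otherwise the whole search term ("lowerSystemID:/.*it\'s.*/lowerMachineTag:/.*it\'s.*/") is interpreted as a simple query, meaning it is analyzed with the standard analyzer and matched against every searchable field. With "&queryType=full" added, each regular expression is matched only against the field it names.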
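As a minimal sketch of the point above, the following Python snippet assembles a query string that includes queryType=full. The helper name build_query_string is hypothetical, only the lowerMachineTag clause from the question is used, and the service URL and api-version are deliberately left out because they depend on your service.

```python
from urllib.parse import urlencode, quote, parse_qs


def build_query_string(search_expr: str, filter_expr: str) -> str:
    """Build the query-string part of a search request (hypothetical helper).

    queryType=full switches the service to the full Lucene query syntax,
    which is what makes field-scoped regular expressions work.
    """
    params = {
        "search": search_expr,
        "queryType": "full",
        "$filter": filter_expr,
        "$count": "true",
        "$top": "100",
        "$skip": "0",
    }
    # Keep '$' readable in the OData parameter names; percent-encode the rest.
    return urlencode(params, quote_via=quote, safe="$")


search_expr = r"lowerMachineTag:/.*it\'s.*/"
filter_expr = (
    "(systemID ne null) and "
    "(ownerSalesforceRecordID eq 'a0h5B000000gJKfQAM')"
)
query_string = build_query_string(search_expr, filter_expr)
print(query_string)
```

Building the string from a dictionary keeps the percent-encoding of slashes, spaces, and quotes out of hand-written URLs, where it is easy to get wrong.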

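As for your question: if "lowerMachineTag:/.it./" is specified, it will not match any of the four documents above, because the leading '.' in the regular expression must match a character before "it", and in all four documents above the value of "lowerMachineTag" starts with "it".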

If you remove the leading '.' and use just "lowerMachineTag:/it./", it still will not match, because the regular expression must match the whole token (adding '*' would work: "lowerMachineTag:/it.*/").
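The whole-token rule can be illustrated with a small Python sketch. It assumes that Python's re.fullmatch is a reasonable stand-in for Lucene's regular expression matching on these simple patterns, and it emulates the keyword_v2 tokenizer plus lowercase filter by lowercasing the entire field value into a single token. The function names are hypothetical.

```python
import re


def analyze_keyword_lowercase(value: str) -> str:
    """Emulate keyword tokenizer + lowercase filter: one lowercased token."""
    return value.lower()


def regex_matches_token(pattern: str, token: str) -> bool:
    """A regex query hits only if the pattern covers the entire token."""
    return re.fullmatch(pattern, token) is not None


token = analyze_keyword_lowercase("It's me")

for pattern in [".it.", "it.", "it.*", ".*it's.*"]:
    print(pattern, regex_matches_token(pattern, token))
```

With the single token "it's me", only the patterns that can stretch over the full value ("it.*" and ".*it's.*") succeed.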

You can also change the analyzer definition so that "/it./" works, by using the nGram_v2 token filter, as shown below:

"analyzers": [
{
    "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
    "name": "SWMLuceneAlongWithCustomHyphenAnalyser",
    "tokenizer": "keyword_v2",
    "tokenFilters": [
        "lowercase", "myNGramTokenFilter"
    ],
    "charFilters": []
}
],
"tokenFilters": [
   {
      "name": "myNGramTokenFilter",
      "@odata.type": "#Microsoft.Azure.Search.NGramTokenFilterV2",
      "minGram": 1,
      "maxGram": 100
   }
]

With this definition, your original query (plus "queryType=full") still returns the same results, and "lowerMachineTag:/it./" returns results as well.
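To see why an n-gram filter changes the outcome, here is a rough Python emulation. It assumes the filter emits every substring of the token whose length lies between minGram and maxGram, and it again uses re.fullmatch as a stand-in for Lucene's regex matching. The function names are hypothetical and this is not the service's actual implementation.

```python
import re


def ngram_tokens(value: str, min_gram: int = 1, max_gram: int = 100) -> set:
    """Emulate keyword tokenizer + lowercase + n-gram filter."""
    token = value.lower()  # the whole value is one lowercased token
    grams = set()
    for start in range(len(token)):
        for length in range(min_gram, max_gram + 1):
            end = start + length
            if end > len(token):
                break  # longer grams would run past the end of the token
            grams.add(token[start:end])
    return grams


def regex_hits(pattern: str, grams: set) -> bool:
    """A regex query hits if it fully matches at least one indexed token."""
    return any(re.fullmatch(pattern, gram) for gram in grams)


grams = ngram_tokens("It's me")
print(regex_hits("it.", grams))       # three-character gram "it'" is indexed
print(regex_hits(".*it's.*", grams))  # the full value is still indexed
print(regex_hits(".it.", grams))      # nothing precedes "it" in this value
```

The trade-off is index size: with minGram 1 and maxGram 100, every substring of every field value becomes a token, so a narrower range may be preferable for long values.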

I hope this helps!
