Question

I have set up an Elasticsearch v5 index for mapping config hashes to URLs.

{
  "settings": {
    "analysis": {
      "analyzer": {
        "url-analyzer": {
          "type": "custom",
          "tokenizer": "url-tokenizer"
        }
      },
      "tokenizer": {
        "url-tokenizer": {
          "type": "path_hierarchy",
          "delimiter": "/"
        }
      }
    }
  },
  "mappings": {
    "route": {
      "properties": {
        "uri": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "url-analyzer"
        },
        "config": {
          "type": "object"
        }
      }
    }
  }
}

I want the longest matching path prefix to get the highest score, so that given the documents

{ "uri": "/trousers/", "config": { "foo": 1 }}
{ "uri": "/trousers/grey", "config": { "foo": 2 }}
{ "uri": "/trousers/grey/lengthy", "config": { "foo": 3 }}

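when I search for /trousers the top result should be trousers, and when I search for /trousers/grey/short the top result should be /trousers/grey.

Instead, I find that the top result for /trousers is /trousers/grey/lengthy.

How can I index and query my documents to achieve this?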

Answer 1

I have one solution: what if we treat the URI in the index as a keyword, but still use the PathHierarchyTokenizer on the search input?

Now we store the following documents:

/trousers
/trousers/grey
/trousers/grey/lengthy

When we submit a query for /trousers/grey/short, the search_analyzer can build the input [/trousers, /trousers/grey, /trousers/grey/short].

The first two of our documents will match, and we can easily select the longest match using a custom sort.
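The matching logic described here can be checked without a running cluster. The sketch below is not Elasticsearch code: `path_hierarchy_tokens` and `longest_match` are hypothetical helpers that only imitate how a path_hierarchy tokenizer with delimiter "/" expands the search input into prefixes, and how URIs indexed as single keyword tokens would then have to equal one of those prefixes. It assumes paths that start with "/" and have no trailing "/".

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Imitate the path_hierarchy tokenizer: emit every prefix of the path.

    Assumption: the path starts with the delimiter and has no trailing one.
    """
    tokens = []
    current = ""
    for part in path.split(delimiter):
        if not part:
            # Skip the empty segment produced by the leading delimiter.
            continue
        current += delimiter + part
        tokens.append(current)
    return tokens


def longest_match(indexed_uris, search_path):
    """Return the longest indexed URI that equals one of the search prefixes."""
    tokens = set(path_hierarchy_tokens(search_path))
    # Keyword-indexed URIs are single tokens, so they match only on equality.
    matches = [uri for uri in indexed_uris if uri in tokens]
    return max(matches, key=len) if matches else None


indexed = ["/trousers", "/trousers/grey", "/trousers/grey/lengthy"]
print(path_hierarchy_tokens("/trousers/grey/short"))
print(longest_match(indexed, "/trousers/grey/short"))  # /trousers/grey
print(longest_match(indexed, "/trousers"))             # /trousers
```

Because the stored URI is never split into prefixes, /trousers/grey/lengthy can no longer match a search for /trousers, which is the behaviour the question complains about.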

Now our mapping document looks like this:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "uri-analyzer": {
          "type": "custom",
          "tokenizer": "keyword"
        },
        "uri-query": {
          "type": "custom",
          "tokenizer": "uri-tokenizer"
        }
      },
      "tokenizer": {
        "uri-tokenizer": {
          "type": "path_hierarchy",
          "delimiter": "/"
        }
      }
    }
  },
  "mappings": {
    "route": {
      "properties": {
        "uri": {
          "type": "text",
          "fielddata": true,
          "analyzer": "uri-analyzer",
          "search_analyzer": "uri-query"
        },
        "config": {
          "type": "object"
        }
      }
    }
  }
}
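The custom sort mentioned earlier could be expressed roughly as follows. This is an unverified sketch, not the answerer's actual query: `build_longest_prefix_query` is a hypothetical helper, and the Painless expression and the "inline" script key are assumptions based on Elasticsearch 5.x script sorting that should be checked against your version. The idea is a match query on uri, so that the uri-query search analyzer expands the input into prefixes, combined with a script sort that puts the longest stored URI first. Reading doc['uri'] in a script is presumably why the mapping enables fielddata on the text field.

```python
import json


def build_longest_prefix_query(path):
    """Build a search body: match any path prefix, longest stored URI first."""
    return {
        # The match query runs the search_analyzer on `path`.
        "query": {"match": {"uri": path}},
        "sort": {
            "_script": {
                "type": "number",
                "script": {
                    "lang": "painless",
                    # Assumption: sorts by the length of the stored keyword token.
                    "inline": "doc['uri'].value.length()",
                },
                "order": "desc",
            }
        },
        # Only the best (longest) match is needed.
        "size": 1,
    }


print(json.dumps(build_longest_prefix_query("/trousers/grey/short"), indent=2))
```

The printed JSON can be sent as the body of a search request against the index.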
