Matching the closest ancestor with the Path Hierarchy Tokenizer

Date: 2016-11-30 12:12:24

Tags: elasticsearch elasticsearch-mapping

I've set up an Elasticsearch v5 index that maps configuration hashes to URLs.

{
  "settings": {
    "analysis": {
      "analyzer": {
        "url-analyzer": {
          "type": "custom",
          "tokenizer": "url-tokenizer"
        }
      },
      "tokenizer": {
        "url-tokenizer": {
          "type": "path_hierarchy",
          "delimiter": "/"
        }
      }
    }
  },
  "mappings": {
    "route": {
      "properties": {
        "uri": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "url-analyzer"
        },
        "config": {
          "type": "object"
        }
      }
    }
  }
}

I want the longest matching path prefix to get the highest score, so that given the documents

{ "uri": "/trousers/", "config": { "foo": 1 }}
{ "uri": "/trousers/grey", "config": { "foo": 2 }}
{ "uri": "/trousers/grey/lengthy", "config": { "foo": 3 }}

when I search for /trousers, the top result should be /trousers/, and when I search for /trousers/grey/short, the top result should be /trousers/grey.

Instead, the top result for /trousers is /trousers/grey/lengthy.
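To see why the original mapping cannot tell these documents apart, here is a simplified Python emulation of a `path_hierarchy` tokenizer with `/` as the delimiter. The helper `path_hierarchy` is hypothetical and is not Lucene's implementation; it ignores options such as `skip` and `reverse`. Because the same tokenizer runs at index time and at search time, the query /trousers becomes the single token /trousers. Every stored URI also produces that token, so all three documents match, and their order is decided by relevance scoring rather than by prefix length.

```python
from typing import List


def path_hierarchy(text: str, delimiter: str = "/") -> List[str]:
    """Simplified stand-in for a path_hierarchy tokenizer (hypothetical helper).

    Emits the prefix that ends just before each delimiter (skipping a leading
    one), followed by the full input.
    """
    prefixes = [text[:i] for i, ch in enumerate(text) if ch == delimiter and i > 0]
    return prefixes + [text]


# Documents from the question; index and query are tokenized the same way.
DOCS = ["/trousers/", "/trousers/grey", "/trousers/grey/lengthy"]

if __name__ == "__main__":
    query_tokens = set(path_hierarchy("/trousers"))
    for uri in DOCS:
        # Tokens this document shares with the query.
        shared = sorted(query_tokens & set(path_hierarchy(uri)))
        print(uri, shared)
```

Each document prints the same shared token, `['/trousers']`, so nothing in the matching itself favours the shortest path.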

How can I index and query my documents to achieve this?

1 Answer:

Answer 0 (score: 0)

I've hit on a solution: what if we treat the URIs in the index as keywords, but still run the PathHierarchyTokenizer on the search input?
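A minimal Python sketch of this idea, again using a simplified, hypothetical `path_hierarchy` emulation rather than Lucene's tokenizer. With the `keyword` tokenizer each stored URI is indexed as exactly one term equal to itself, so a document matches only when its whole URI appears among the prefixes generated from the query. The stored URIs below are assumed to carry no trailing slash, because the intermediate prefixes the tokenizer emits never end in `/`.

```python
from typing import List


def path_hierarchy(text: str, delimiter: str = "/") -> List[str]:
    """Simplified path_hierarchy emulation (hypothetical helper): every prefix
    ending before a non-leading delimiter, then the full input."""
    prefixes = [text[:i] for i, ch in enumerate(text) if ch == delimiter and i > 0]
    return prefixes + [text]


# Hypothetical stored URIs, indexed with the keyword tokenizer (one term each).
STORED = ["/trousers", "/trousers/grey", "/trousers/grey/lengthy"]


def matching_docs(query: str, stored: List[str]) -> List[str]:
    """Return the stored URIs whose single keyword term equals one of the
    query's path_hierarchy tokens."""
    query_terms = set(path_hierarchy(query))
    return [uri for uri in stored if uri in query_terms]


if __name__ == "__main__":
    print(matching_docs("/trousers/grey/short", STORED))
    print(matching_docs("/trousers", STORED))
```

Here /trousers/grey/lengthy no longer matches a query for /trousers, because its single indexed term is the full path.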

Now we store the following documents:

/trousers
/trousers/grey
/trousers/grey/lengthy

When we submit a query for /trousers/grey/short, the search_analyzer expands the input into

[/trousers, /trousers/grey, /trousers/grey/short]

Our first two documents will match, and we can easily pick the longest match with a custom sort.

Our mapping now looks like this:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "uri-analyzer": {
          "type": "custom",
          "tokenizer": "keyword"
        },
        "uri-query": {
          "type": "custom",
          "tokenizer": "uri-tokenizer"
        }
      },
      "tokenizer": {
        "uri-tokenizer": {
          "type": "path_hierarchy",
          "delimiter": "/"
        }
      }
    }
  },

  "mappings": {
    "route": {
      "properties": {
        "uri": {
          "type": "text",
          "fielddata": true,
          "analyzer": "uri-analyzer",
          "search_analyzer": "uri-query"
        },

        "config": {
          "type": "object"
        }
      }
    }
  }
}
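One way to express "match any prefix, then take the longest" is sketched below, under stated assumptions. The request body targets Elasticsearch 5.x: a `match` query on `uri` runs the field's `search_analyzer` (`uri-query`), and a `_script` sort in Painless reads the single indexed term through `doc['uri'].value`, which is presumably why the mapping enables `fielddata`. Newer Elasticsearch versions name the script field `source` rather than `inline`. Since that body can only be checked against a live cluster, the hypothetical `best_route` function emulates the same match-then-sort logic locally; all function names here are illustrative.

```python
from typing import Dict, List, Optional


def path_hierarchy(text: str, delimiter: str = "/") -> List[str]:
    """Simplified path_hierarchy emulation (hypothetical helper)."""
    prefixes = [text[:i] for i, ch in enumerate(text) if ch == delimiter and i > 0]
    return prefixes + [text]


def build_search_body(query_uri: str) -> Dict:
    """Assumed Elasticsearch 5.x request body: the match query is analyzed with
    the field's search_analyzer, and the script sort orders hits by the length
    of the stored keyword term, longest first."""
    return {
        "query": {"match": {"uri": query_uri}},
        "sort": {
            "_script": {
                "type": "number",
                "script": {
                    "lang": "painless",
                    "inline": "doc['uri'].value.length()",
                },
                "order": "desc",
            }
        },
        "size": 1,
    }


def best_route(query_uri: str, stored: List[str]) -> Optional[str]:
    """Local emulation of the same logic: keep stored URIs equal to one of the
    query's prefixes, then pick the longest (None if nothing matches)."""
    query_terms = set(path_hierarchy(query_uri))
    matches = [uri for uri in stored if uri in query_terms]
    return max(matches, key=len, default=None)


if __name__ == "__main__":
    stored = ["/trousers", "/trousers/grey", "/trousers/grey/lengthy"]
    print(best_route("/trousers/grey/short", stored))
    print(best_route("/trousers", stored))
```

Sorting on a script replaces relevance ordering entirely, which is what this use case wants: among the matching documents, only the length of the prefix matters.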