Question

My index contains entries like these:

ID  BuildingName  Postalcode  Type
1   ABCD          1234        1
2   ABCD          7890        1

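I need to remove duplicates on the 'BuildingName' field from the search results (not from the index: as you can see, they are two different entries). In the end I only want to see one result (any one of the buildings with the searched name):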

ID  BuildingName  Postalcode  Type
1   ABCD          1234        1

Why I can't use field collapsing/aggregation as described here (Remove duplicate documents from a search in Elasticsearch): I need n-gram analysis on BuildingName, and field collapsing/aggregation only works on non-analyzed fields.

Is there any way to achieve this? All help appreciated! Thanks!

Answer 1

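Add a sub-field to BuildingName that is either not_analyzed or analyzed with an analyzer such as keyword, which leaves the text unchanged. Then search against the regular, nGram-analyzed BuildingName field and run the aggregation on the unchanged sub-field: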

Mapping:

{
  "mappings": {
    "test": {
      "properties": {
        "BuildingName": {
          "type": "string",
          "analyzer": "my_ngram_analyzer",
          "fields": {
            "notAnalyzed": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}
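The mapping references `my_ngram_analyzer` but never defines it. A minimal sketch of one possible index-creation body, written as a Python dict so it can be serialized with `json.dumps` and sent when creating the index, might look like this. The tokenizer name `my_ngram_tokenizer`, the gram sizes (2 to 3) and the `lowercase` filter are assumptions, not part of the original answer; the lowercase filter is what lets the lowercase term `ab` in the query match the indexed text `ABCD`.

```python
import json


def build_index_body(min_gram: int = 2, max_gram: int = 3) -> dict:
    """Return a hypothetical index body defining my_ngram_analyzer plus the answer's mapping.

    Gram sizes, the tokenizer name and the lowercase filter are assumptions.
    """
    return {
        "settings": {
            "analysis": {
                "tokenizer": {
                    # Hypothetical tokenizer: emits every 2- and 3-character gram.
                    "my_ngram_tokenizer": {
                        "type": "ngram",
                        "min_gram": min_gram,
                        "max_gram": max_gram,
                    }
                },
                "analyzer": {
                    "my_ngram_analyzer": {
                        "type": "custom",
                        "tokenizer": "my_ngram_tokenizer",
                        # Lowercase the grams so the term "ab" matches "ABCD".
                        "filter": ["lowercase"],
                    }
                },
            }
        },
        "mappings": {
            "test": {
                "properties": {
                    "BuildingName": {
                        "type": "string",
                        "analyzer": "my_ngram_analyzer",
                        # Exact-value sub-field used only for the terms aggregation.
                        "fields": {
                            "notAnalyzed": {"type": "string", "index": "not_analyzed"}
                        },
                    }
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
```

The `string` / `not_analyzed` syntax follows the answer and applies to Elasticsearch 2.x and earlier; from 5.0 onward the equivalent is a `text` field with a `keyword` sub-field.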

Query:

{
  "query": {
    "term": {
      "BuildingName": {
        "value": "ab"
      }
    }
  },
  "aggs": {
    "unique": {
      "terms": {
        "field": "BuildingName.notAnalyzed",
        "size": 10
      },
      "aggs": {
        "sample": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}
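The regular `hits` section of the response still contains both ABCD entries; the deduplicated documents are in the aggregation buckets, one `top_hits` sample per distinct `BuildingName.notAnalyzed` value. A minimal Python sketch of reading them, assuming the response has already been parsed into a dict (the helper name `dedupe_by_building` and the sample response are hypothetical, hand-written to mirror the question's data rather than captured from a real cluster):

```python
def dedupe_by_building(response: dict) -> list:
    """Collect one sample document per distinct BuildingName from the "unique" aggregation."""
    results = []
    for bucket in response["aggregations"]["unique"]["buckets"]:
        # Each bucket groups all matching documents sharing the exact BuildingName;
        # the "sample" top_hits sub-aggregation (size 1) returns just one of them.
        hits = bucket["sample"]["hits"]["hits"]
        if hits:
            results.append(hits[0]["_source"])
    return results


# Hand-written response for illustration: both ABCD entries fall into one bucket.
sample_response = {
    "aggregations": {
        "unique": {
            "buckets": [
                {
                    "key": "ABCD",
                    "doc_count": 2,
                    "sample": {
                        "hits": {
                            "total": 2,
                            "hits": [
                                {
                                    "_id": "1",
                                    "_source": {"ID": 1, "BuildingName": "ABCD", "Postalcode": 1234, "Type": 1},
                                }
                            ],
                        }
                    },
                }
            ]
        }
    }
}

print(dedupe_by_building(sample_response))
# [{'ID': 1, 'BuildingName': 'ABCD', 'Postalcode': 1234, 'Type': 1}]
```

Because the terms aggregation has `"size": 10`, at most 10 distinct building names are returned; raise it if more are expected. Adding a top-level `"size": 0` to the search body skips the duplicated regular hits when only the buckets are needed.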
