Question

How can I get all values of all ids with a given prefix from Elasticsearch records, and make them unique?

Records

PUT items/_doc/1
{ "ids" :  [ "apple_A", "orange_B" ] }

PUT items/_doc/2
{ "ids" :  [ "apple_A", "apple_B" ] }

PUT items/_doc/3
{ "ids" :  [ "apple_C", "banana_A" ] }

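What I need is to find all unique ids with a given prefix. For example, if the input is apple, the output should be ["apple_A", "apple_B", "apple_C"].

What I have tried so far is a terms aggregation. With the following query I can filter for documents that contain an id with the given prefix, but the aggregation then returns every id in those documents.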

{
  "aggregations": {
    "filterIds": {
      "filter": {
        "bool": {
          "filter": [
            {
              "prefix": {
                "ids.keyword": {
                  "value": "apple"
                }
              }
            }
          ]
        }
      },
      "aggregations": {
        "uniqueIds": {
          "terms": {
            "field": "ids.keyword"
          }
        }
      }
    }
  }
}

With apple as the prefix, the aggregation returns [ "apple_A", "orange_B", "apple_B", "apple_C", "banana_A" ]. In other words, it returns every id from the documents that match the filter.
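
To illustrate why this happens, here is a minimal Python sketch that mimics the query locally. It does not call Elasticsearch; the `docs` list and the helper names `filter_docs` and `terms_agg` are made up for illustration. The filter selects whole documents, and the terms aggregation then buckets every value of `ids` found in those documents.

```python
from collections import Counter

# Local stand-in for the three indexed documents (illustrative only).
docs = [
    {"ids": ["apple_A", "orange_B"]},
    {"ids": ["apple_A", "apple_B"]},
    {"ids": ["apple_C", "banana_A"]},
]

def filter_docs(docs, prefix):
    """Mimic the prefix filter: keep a document if ANY of its ids matches."""
    return [d for d in docs if any(i.startswith(prefix) for i in d["ids"])]

def terms_agg(docs):
    """Mimic a terms aggregation: count documents per distinct id value."""
    counts = Counter()
    for d in docs:
        counts.update(set(d["ids"]))
    return counts

matched = filter_docs(docs, "apple")   # all three documents match
buckets = terms_agg(matched)           # buckets cover every id in them
print(sorted(buckets))
```

Since every document contains at least one apple id, all three pass the filter, and orange_B and banana_A end up in the buckets as well.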

Is there a way to get only the ids in the array that match the prefix, rather than all ids in the documents' arrays?

Answer 1

You can restrict the returned values with the include parameter:

POST items/_search
{
  "size": 0,
  "aggregations": {
    "filterIds": {
      "filter": {
        "bool": {
          "filter": [
            {
              "prefix": {
                "ids.keyword": {
                  "value": "apple"
                }
              }
            }
          ]
        }
      },
      "aggregations": {
        "uniqueIds": {
          "terms": {
            "field": "ids.keyword",
            "include": "apple.*"
          }
        }
      }
    }
  }
}
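
As a rough local illustration of what include does, the Python sketch below applies a pattern to each bucket key before counting. It does not call Elasticsearch; `docs` and `terms_agg` are hypothetical names, and Python's `re` module is only a stand-in for Lucene's regular expression syntax, which differs in some details. `fullmatch` is used because Elasticsearch matches the pattern against the whole term.

```python
import re
from collections import Counter

# Local stand-in for the three indexed documents (illustrative only).
docs = [
    {"ids": ["apple_A", "orange_B"]},
    {"ids": ["apple_A", "apple_B"]},
    {"ids": ["apple_C", "banana_A"]},
]

def terms_agg(docs, include=None):
    """Mimic a terms aggregation with an optional include pattern."""
    pattern = re.compile(include) if include else None
    counts = Counter()
    for d in docs:
        for value in set(d["ids"]):
            # Only values that match the whole pattern become buckets.
            if pattern is None or pattern.fullmatch(value):
                counts[value] += 1
    return counts

buckets = terms_agg(docs, include="apple.*")
unique_ids = sorted(buckets)
print(unique_ids)
```

The bucket keys are already distinct, so they are the unique list of matching ids.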

Also take a look at this other thread about using regular expressions in include; it is very similar to your use case.

