Question

I have an index mapping with two string fields, field1 and field2, both declared with copy_to pointing at a third field named all_fields. all_fields is indexed as "not_analyzed".

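When I run a bucket aggregation on all_fields, I expect distinct buckets whose keys are the values of field1 and field2 concatenated together. Instead, I get separate buckets whose keys are the individual, unconcatenated values of field1 and field2.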

Example:

Mapping:

  {
    "mappings": {
      "myobject": {
        "properties": {
          "field1": {
            "type": "string",
            "index": "analyzed",
            "copy_to": "all_fields"
          },
          "field2": {
            "type": "string",
            "index": "analyzed",
            "copy_to": "all_fields"
          },
          "all_fields": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }

Data:

  {
    "field1": "dinner carrot potato broccoli",
    "field2": "something here"
  }

and

  {
    "field1": "fish chicken something",
    "field2": "dinner"
  }

Aggregation:

  {
    "aggs": {
      "t": {
        "terms": {
          "field": "all_fields"
        }
      }
    }
  }

Result:

  ...
  "aggregations": {
    "t": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "dinner",
          "doc_count": 1
        },
        {
          "key": "dinner carrot potato broccoli",
          "doc_count": 1
        },
        {
          "key": "fish chicken something",
          "doc_count": 1
        },
        {
          "key": "something here",
          "doc_count": 1
        }
      ]
    }
  }

I was expecting only 2 buckets, one per document, each keyed by the concatenation of that document's field1 and field2 values.

What am I doing wrong?

Answer 1

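What you are looking for is the concatenation of two strings. copy_to does not do that, even though it may look as if it does. With copy_to you are, conceptually, creating a set of values from field1 and field2, not concatenating them.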
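The difference can be illustrated with a small Python simulation. This is not Elasticsearch code, only a sketch that assumes a terms aggregation counts each distinct value once per document; the helper names copy_to_values, concatenated_value, and terms_buckets are hypothetical.

```python
from collections import Counter

# The two sample documents from the question.
docs = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here"},
    {"field1": "fish chicken something", "field2": "dinner"},
]


def copy_to_values(doc):
    # copy_to semantics (simplified): the target field holds each
    # source value separately, like a multi-valued field.
    return [doc["field1"], doc["field2"]]


def concatenated_value(doc):
    # What the question expected: a single joined value per document.
    return [doc["field1"] + " " + doc["field2"]]


def terms_buckets(documents, extract):
    # Count, for every distinct value, how many documents contain it.
    counts = Counter()
    for doc in documents:
        for value in set(extract(doc)):
            counts[value] += 1
    return dict(counts)


copy_to_buckets = terms_buckets(docs, copy_to_values)
concat_buckets = terms_buckets(docs, concatenated_value)
```

Under these assumptions, copy_to_buckets has four keys (matching the result in the question), while concat_buckets has two.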

For your use case, you have two options:

use a _source transformation
perform a script aggregation

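I recommend the _source transformation because I think it is more efficient than scripting: you pay a small cost at indexing time instead of running a heavy script aggregation at query time.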

The _source transformation:

PUT /lastseen
{
  "mappings": {
    "test": {
      "transform": {
        "script": "ctx._source['all_fields'] = ctx._source['field1'] + ' ' + ctx._source['field2']"
      }, 
      "properties": {
        "field1": {
          "type": "string"
        },
        "field2": {
          "type": "string"
        },
        "lastseen": {
          "type": "long"
        },
        "all_fields": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

The query:

GET /lastseen/test/_search
{
  "aggs": {
    "NAME": {
      "terms": {
        "field": "all_fields",
        "size": 10
      }
    }
  }
}
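To check which bucket keys this approach should produce for the sample documents, the effect of the transform script can be mimicked in plain Python. This is only a sketch: apply_transform is a hypothetical helper that mirrors the script's string concatenation, and it does not call Elasticsearch.

```python
def apply_transform(source):
    # Mirror of the mapping's transform script:
    # all_fields = field1 + ' ' + field2, computed before indexing.
    transformed = dict(source)  # copy, so the input document is untouched
    transformed["all_fields"] = source["field1"] + " " + source["field2"]
    return transformed


# The two sample documents from the question.
docs = [
    {"field1": "dinner carrot potato broccoli", "field2": "something here"},
    {"field1": "fish chicken something", "field2": "dinner"},
]

indexed = [apply_transform(doc) for doc in docs]

# With all_fields not_analyzed, each whole string would be one bucket key.
bucket_keys = sorted(doc["all_fields"] for doc in indexed)
```

Each document ends up with exactly one all_fields value, so the terms aggregation should return one bucket per distinct concatenated string.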

For the script aggregation, to make it easier (meaning, to use doc['field'].value rather than the more expensive _source.field), add .raw sub-fields to field1 and field2:

PUT /lastseen
{
  "mappings": {
    "test": {
      "properties": {
        "field1": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "field2": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "lastseen": {
          "type": "long"
        }
      }
    }
  }
}

The script then uses these .raw sub-fields:

{
  "aggs": {
    "NAME": {
      "terms": {
        "script": "doc['field1.raw'].value + ' ' + doc['field2.raw'].value",
        "size": 10,
        "lang": "groovy"
      }
    }
  }
}

Without the .raw sub-fields (which are deliberately made not_analyzed), the script would have to read the values from _source (the _source.field form mentioned above), which is more expensive.

elasticsearch copy_to field not behaving as expected with aggregations
