Question

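Given an index containing documents with a brand property, we need to create a terms aggregation that is case insensitive.

Index definition

Note the use of fielddata.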

PUT demo_products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "product": {
      "properties": {
        "brand": {
          "type": "text",
          "analyzer": "my_custom_analyzer",
          "fielddata": true
        }
      }
    }
  }
}

Data

POST demo_products/product
{
  "brand": "New York Jets"
}

POST demo_products/product
{
  "brand": "new york jets"
}

POST demo_products/product
{
  "brand": "Washington Redskins"
}

Query

GET demo_products/product/_search
{
  "size": 0,
  "aggs": {
    "brand_facet": {
      "terms": {
        "field": "brand"
      }
    }
  }
}

Results

"aggregations": {
    "brand_facet": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "new york jets",
          "doc_count": 2
        },
        {
          "key": "washington redskins",
          "doc_count": 1
        }
      ]
    }
  }

If we use keyword instead of text, we end up with 2 buckets for New York Jets because of the difference in casing.
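To make the casing problem concrete, here is a minimal Python sketch that mimics how a terms aggregation groups exact values. It does not talk to Elasticsearch; the helper `terms_buckets` is a hypothetical name used only for illustration, and the data is the three brand values indexed above.

```python
from collections import Counter

# The three brand values indexed in the question.
brands = ["New York Jets", "new york jets", "Washington Redskins"]

def terms_buckets(values, normalize=None):
    """Hypothetical helper: count exact values, optionally normalizing first."""
    if normalize is not None:
        values = [normalize(v) for v in values]
    return dict(Counter(values))

# A plain keyword field buckets on the exact string, so casing splits the Jets.
raw_buckets = terms_buckets(brands)

# Lowercasing each value before bucketing merges the two spellings.
lowercased_buckets = terms_buckets(brands, normalize=str.lower)

print(raw_buckets)
print(lowercased_buckets)
```

The first dictionary has three keys with a count of 1 each, while the second has "new york jets" with a count of 2, which is the behavior we want from the aggregation.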

We are concerned about the performance implications of using fielddata. However, if fielddata is disabled, we get the dreaded "Fielddata is disabled on text fields by default."

Any other tips for resolving this, or should we not be so concerned about fielddata?

Answer 1

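Starting with ES 5.2 (released today), you can use normalizers with keyword fields in order to (for instance) lowercase the value.

Normalizers play a role similar to that of analyzers for text fields. What you can do with them is more restricted, since the whole value is kept as a single term, but that should be enough for the issue you are facing.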
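As a rough illustration of that difference, the following Python sketch contrasts the two pipelines. It is a simplified model, not Elasticsearch code: `normalize_keyword` and `analyze_text` are hypothetical names, and splitting on whitespace is only a stand-in for a real tokenizer.

```python
def normalize_keyword(value, filters):
    """Model of a normalizer: no tokenizer, so the value stays one term."""
    for apply_filter in filters:
        value = apply_filter(value)
    return [value]

def analyze_text(value, tokenizer, filters):
    """Model of an analyzer: tokenize first, then filter every token."""
    tokens = tokenizer(value)
    for apply_filter in filters:
        tokens = [apply_filter(token) for token in tokens]
    return tokens

# Keyword field with a lowercase normalizer: one lowercased term per value.
keyword_terms = normalize_keyword("New York Jets", [str.lower])

# Text field with a whitespace-like tokenizer (assumed here): several terms.
text_terms = analyze_text("New York Jets", str.split, [str.lower])

print(keyword_terms)
print(text_terms)
```

Because the normalized keyword field yields a single term per document, a terms aggregation can bucket on it directly without needing fielddata.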

You would create the index like this:

PUT demo_products
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "product": {
      "properties": {
        "brand": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}

And your query would return this:

  "aggregations" : {
    "brand_facet" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "new york jets",
          "doc_count" : 2
        },
        {
          "key" : "washington redskins",
          "doc_count" : 1
        }
      ]
    }
  }

Best of both worlds!
