Aggregations in Elasticsearch split strings instead of using the whole value

Date: 2016-09-12 10:41:16

Tags: elasticsearch

I have the following simple mapping:

curl -XPUT localhost:9200/transaciones/ -d '{
    "mappings": {
        "ventas": {
            "properties": {
                "tipo": { "type": "string" },
                "cantidad": { "type": "double" }
            }
        }
    }
}'

I add the following data:

curl -XPUT localhost:9200/transaciones/ventas/1 -d '{
    "tipo": "Ingreso bancario",
    "cantidad": 80
}'

curl -XPUT localhost:9200/transaciones/ventas/2 -d '{
    "tipo": "Ingreso bancario",
    "cantidad": 10
}'

curl -XPUT localhost:9200/transaciones/ventas/3 -d '{
    "tipo": "PayPal",
    "cantidad": 30
}'

curl -XPUT localhost:9200/transaciones/ventas/4 -d '{
    "tipo": "Tarjeta de credito",
    "cantidad": 130
}'

curl -XPUT localhost:9200/transaciones/ventas/5 -d '{
    "tipo": "Tarjeta de credito",
    "cantidad": 130
}'

When I try to run a terms aggregation:

curl -XGET localhost:9200/transaciones/ventas/_search?pretty=true -d '{
    "size": 0,
    "aggs": {
        "tipos_de_venta": {
            "terms": {
                "field": "tipo"
            }
        }
    }
}'

The response is:

{
  "took" : 15,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 5,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "tipos_de_venta" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "bancario",
        "doc_count" : 2
      }, {
        "key" : "credito",
        "doc_count" : 2
      }, {
        "key" : "de",
        "doc_count" : 2
      }, {
        "key" : "ingreso",
        "doc_count" : 2
      }, {
        "key" : "tarjeta",
        "doc_count" : 2
      }, {
        "key" : "paypal",
        "doc_count" : 1
      } ]
    }
  }
}

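As you can see, it splits the string Tarjeta de credito into the separate terms tarjeta, de and credito. How can I get the whole string without using not_analyzed in the mapping of tipo? The buckets I want are Ingreso bancario, PayPal and Tarjeta de credito, so the response would look like this: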

 "aggregations" : {
    "tipos_de_venta" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [ {
        "key" : "Ingreso bancario",
        "doc_count" : 2
      }, {
        "key" : "PayPal",
        "doc_count" : 1
      }, {
        "key" : "Tarjeta de credito",
        "doc_count" : 2
      } ]
    }
  }

PS: I'm using ES 2.3.2.

1 answer:

Answer 0 (score: 1)

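This happens because your tipo field is an analyzed string, so each value is broken into lowercase terms at index time and the terms aggregation buckets on those terms. The correct way to get what you want is to add a not_analyzed sub-field: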

curl -XPUT localhost:9200/transaciones/_mapping/ventas -d '{
    "properties": {
        "tipo": { 
           "type": "string",
           "fields": {
               "raw": {
                   "type": "string",
                   "index": "not_analyzed"
               }
           }
        }
    }
}'
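
To make the cause concrete, here is a minimal Python sketch (not Elasticsearch code) of what the terms aggregation sees in each case. It assumes the default analyzer behaves roughly like "lowercase, then split on non-word characters", which is a simplification of the real analyzer; `analyze` and `terms_agg` are hypothetical helper names used only for illustration.

```python
import re
from collections import Counter

# The five "tipo" values indexed in the question.
TIPOS = [
    "Ingreso bancario",
    "Ingreso bancario",
    "PayPal",
    "Tarjeta de credito",
    "Tarjeta de credito",
]


def analyze(text):
    """Simplified stand-in for the default analyzer: lowercase, split on non-word characters."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def terms_agg(values, analyzed):
    """Count documents per indexed term, like the doc_count of a terms aggregation."""
    counts = Counter()
    for value in values:
        terms = analyze(value) if analyzed else [value]
        # set(): a document counts once per term even if the term repeats in it
        counts.update(set(terms))
    return dict(counts)


print(terms_agg(TIPOS, analyzed=True))   # one bucket per word, as in the question
print(terms_agg(TIPOS, analyzed=False))  # one bucket per whole value, like tipo.raw
```

The second call corresponds to the not_analyzed sub-field: the whole string is kept as a single term, so it becomes a single bucket key.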

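After updating the mapping, you need to reindex your documents so that tipo.raw is populated for them. You will then be able to run this query and get the result you want: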

curl -XGET localhost:9200/transaciones/ventas/_search?pretty=true -d '{
    "size": 0,
    "aggs": {
        "tipos_de_venta": {
            "terms": {
                "field": "tipo.raw"
            }
        }
    }
}'
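
The reindexing step is not spelled out here. One possible way to do it for the five sample documents is to send them again through the Bulk API. The Python sketch below only builds the newline-delimited request body and does not contact a cluster; `bulk_body` is a hypothetical helper name, and the index, type, IDs and values are the ones from the question.

```python
import json

# Sample documents from the question, keyed by document ID.
DOCS = {
    "1": {"tipo": "Ingreso bancario", "cantidad": 80},
    "2": {"tipo": "Ingreso bancario", "cantidad": 10},
    "3": {"tipo": "PayPal", "cantidad": 30},
    "4": {"tipo": "Tarjeta de credito", "cantidad": 130},
    "5": {"tipo": "Tarjeta de credito", "cantidad": 130},
}


def bulk_body(index, doc_type, docs):
    """Build a Bulk API body: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body("transaciones", "ventas", DOCS)
print(body)
```

The printed text could be saved to a file and posted to localhost:9200/_bulk with curl's --data-binary option. For a handful of documents, simply repeating the original PUT requests has the same effect.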

Update

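If you really don't want to create a not_analyzed field, the alternative is to use a terms aggregation with a script that reads the value from _source. Be aware, however, that this can kill the performance of your cluster: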

curl -XGET localhost:9200/transaciones/ventas/_search?pretty=true -d '{
    "size": 0,
    "aggs": {
        "tipos_de_venta": {
            "terms": {
                "script": "_source.tipo"
            }
        }
    }
}'
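
As a rough illustration of why the script variant returns whole strings but costs more, the Python sketch below mimics it under one assumption: that reading _source means loading and parsing the stored JSON of every matching document at query time. It is not Elasticsearch code, and `script_terms_agg` is a hypothetical name.

```python
import json
from collections import Counter

# Stored _source of each document as raw JSON text (values from the question).
STORED_SOURCES = [
    '{"tipo": "Ingreso bancario", "cantidad": 80}',
    '{"tipo": "Ingreso bancario", "cantidad": 10}',
    '{"tipo": "PayPal", "cantidad": 30}',
    '{"tipo": "Tarjeta de credito", "cantidad": 130}',
    '{"tipo": "Tarjeta de credito", "cantidad": 130}',
]


def script_terms_agg(stored_sources, field):
    """Bucket by the original field value, parsing every document's _source."""
    counts = Counter()
    parsed_docs = 0
    for raw in stored_sources:
        source = json.loads(raw)  # per-document parsing is the expensive part
        parsed_docs += 1
        counts[source[field]] += 1
    return dict(counts), parsed_docs


buckets, parsed = script_terms_agg(STORED_SOURCES, "tipo")
print(buckets)
print("documents parsed:", parsed)
```

The bucket keys are the original strings because nothing is analyzed, but the work grows with the number of matching documents, whereas the tipo.raw approach reads terms that were prepared at index time.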