请观察此情景:
定义映射
PUT /my_index
{
"mappings": {
"my_type": {
"properties": {
"city": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
添加数据
PUT /my_index/my_type/1
{
"city": "New York"
}
PUT /my_index/my_type/2
{
"city": "York"
}
PUT /my_index/my_type/3
{
"city": "york"
}
查询构面
GET /my_index/_search
{
"size": 0,
"aggs": {
"Cities": {
"terms": {
"field": "city.raw"
}
}
}
}
结果
{
...
"aggregations": {
"Cities": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "New York",
"doc_count": 1
},
{
"key": "York",
"doc_count": 1
},
{
"key": "york",
"doc_count": 1
}
]
}
}
}
困境
我想要2件事:
梦想结果
{
...
"aggregations": {
"Cities": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "New York",
"doc_count": 1
},
{
"key": "York",
"doc_count": 2
}
]
}
}
}
答案 0 :(得分:1)
这将使您的客户端代码稍微复杂一些,但您可以随时执行此类操作。
设置索引时,附加的子字段只是较低的(不在空格上分割):
PUT /my_index
{
"settings": {
"analysis": {
"analyzer": {
"lowercase_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"my_type": {
"properties": {
"city": {
"type": "string",
"fields": {
"lowercase": {
"type": "string",
"analyzer": "lowercase_analyzer"
},
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
PUT /my_index/my_type/_bulk
{"index":{"_id":1}}
{"city":"New York"}
{"index":{"_id":2}}
{"city":"York"}
{"index":{"_id":3}}
{"city":"york"}
然后使用这样的两级聚合,其中第二个按字母顺序递增(以便大写术语首先出现)并且仅返回每个小写术语的顶部原始术语:
GET /my_index/_search
{
"size": 0,
"aggs": {
"city_lowercase": {
"terms": {
"field": "city.lowercase"
},
"aggs": {
"city_terms": {
"terms": {
"field": "city.raw",
"order" : { "_term" : "asc" },
"size": 1
}
}
}
}
}
}
返回:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"city_lowercase": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "york",
"doc_count": 2,
"city_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 1,
"buckets": [
{
"key": "York",
"doc_count": 1
}
]
}
},
{
"key": "new york",
"doc_count": 1,
"city_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "New York",
"doc_count": 1
}
]
}
}
]
}
}
}
这是我使用的代码(还有一些doc示例):
http://sense.qbox.io/gist/f3781d58fbaadcc1585c30ebb087108d2752dfff