I'm storing survey data like this:
[
  {
    "userid": 1,
    "answers": [
      {
        "key": "gender",
        "value": "male"
      },
      {
        "key": "color",
        "value": "red"
      },
      {
        "key": "vehicle",
        "value": "car"
      }
    ]
  },
  {
    "userid": 2,
    "answers": [
      {
        "key": "gender",
        "value": "female"
      },
      {
        "key": "color",
        "value": "blue"
      },
      {
        "key": "vehicle",
        "value": "bike"
      }
    ]
  },
  ......
]
The mapping looks like this:
"users" : {
  "properties" : {
    "userid" : {
      "type" : "long"
    },
    "answers" : {
      "type" : "nested",
      "properties" : {
        "key" : {
          "type" : "string",
          "index" : "not_analyzed"
        },
        "value" : {
          "type" : "string",
          "index" : "not_analyzed"
        }
      }
    }
  }
}
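So there are many users, each giving answers to many different questions. I need to stay flexible about which questions are asked, which is why I chose this key/value style.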
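Now I'm looking for a query that gives me a distribution table of gender versus color. That is: a two-dimensional table with gender on one axis and color on the other, covering every term that occurs in these fields. I want to see in detail how many women like red, how many men like blue, and so on.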
I've tried many combinations of nested, filter and terms aggregations, but without success so far.
Any hints on how to build the aggregation query would be appreciated...
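Before writing the Elasticsearch query, it can help to pin down exactly what the gender × color table should contain. The following is a local Python sketch, not an Elasticsearch API, that computes the table for the two sample documents above. The helpers `answers_as_dict` and `cross_tab` are hypothetical names, and skipping users who did not answer both questions is an assumption about how such users should be counted.

```python
from collections import Counter

# The two sample documents from the question (the "......" continuation is omitted).
DOCS = [
    {"userid": 1, "answers": [
        {"key": "gender", "value": "male"},
        {"key": "color", "value": "red"},
        {"key": "vehicle", "value": "car"},
    ]},
    {"userid": 2, "answers": [
        {"key": "gender", "value": "female"},
        {"key": "color", "value": "blue"},
        {"key": "vehicle", "value": "bike"},
    ]},
]


def answers_as_dict(doc: dict) -> dict[str, str]:
    """Turn the key/value answers list into {question key: answer value}."""
    return {a["key"]: a["value"] for a in doc["answers"]}


def cross_tab(docs: list[dict], row_key: str, col_key: str) -> Counter:
    """Count (row answer, column answer) pairs; users missing either answer are skipped."""
    counts: Counter = Counter()
    for doc in docs:
        answers = answers_as_dict(doc)
        if row_key in answers and col_key in answers:
            counts[(answers[row_key], answers[col_key])] += 1
    return counts


if __name__ == "__main__":
    table = cross_tab(DOCS, "gender", "color")
    genders = sorted({g for g, _ in table})
    colors = sorted({c for _, c in table})
    # Header row, then one row per gender; Counter returns 0 for unseen pairs.
    print("gender".ljust(8) + "".join(c.ljust(6) for c in colors))
    for g in genders:
        print(g.ljust(8) + "".join(str(table[(g, c)]).ljust(6) for c in colors))
```

Because the questions live in `key` values rather than in field names, the same `cross_tab(DOCS, "gender", "vehicle")` call works for any other pair of questions without a mapping change.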
Answer (score: 2)
You could take a look at the scripted metric aggregation.
In general, it looks like this:
POST documents/_search
{
  "aggs": {
    "distribution": {
      "scripted_metric": {
        "init_script": "initialize the per-shard state",
        "map_script": "build a partial distribution for a single document",
        "combine_script": "(optional) return each shard's partial distribution",
        "reduce_script": "merge all partial distributions into the final grid"
      }
    }
  }
}
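To make the division of work between the four scripts explicit, here is a small local Python model of how the phases run. It is an illustration only, not Elasticsearch code, and `run_scripted_metric` and its callback parameters are hypothetical names. The init step runs once per shard to set up the state (`_agg` in the scripts), the map step runs once per document, the combine step runs once per shard to produce that shard's partial result, and the reduce step runs once over the list of partial results (`_aggs`).

```python
from typing import Any, Callable


def run_scripted_metric(
    shards: list[list[Any]],
    init_fn: Callable[[dict], None],
    map_fn: Callable[[dict, Any], None],
    combine_fn: Callable[[dict], Any],
    reduce_fn: Callable[[list[Any]], Any],
) -> Any:
    """Local model of the scripted_metric phases (a sketch, not Elasticsearch code)."""
    partials = []
    for shard_docs in shards:
        agg: dict = {}                   # per-shard state, like _agg in the scripts
        init_fn(agg)                     # init_script: once per shard
        for doc in shard_docs:
            map_fn(agg, doc)             # map_script: once per matching document
        partials.append(combine_fn(agg))  # combine_script: once per shard
    return reduce_fn(partials)           # reduce_script: once, over all partials (_aggs)


if __name__ == "__main__":
    # Toy example: count documents spread over two simulated shards (prints 3).
    total = run_scripted_metric(
        shards=[["d1", "d2"], ["d3"]],
        init_fn=lambda agg: agg.update(count=0),
        map_fn=lambda agg, doc: agg.update(count=agg["count"] + 1),
        combine_fn=lambda agg: agg["count"],
        reduce_fn=sum,
    )
    print(total)
```

The point of the split is that map and combine run where the data lives, so only one small partial result per shard has to travel to the node that runs reduce.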
For your specific case, I'd suggest the following query:
POST test/example/_search
{
  "size": 0,
  "aggs": {
    "distribution": {
      "scripted_metric": {
        "init_script" : "_agg['variants'] = [:]",
        "map_script" : "gender='';color=''; for (answer in _source.answers){ if(answer.key=='gender'){gender=answer.value};if(answer.key=='color'){color=answer.value} }; if(gender!='' && color!=''){key=(gender+'-'+color); _agg['variants'][key]=_agg['variants'].get(key,0)+1}",
        "combine_script" : "return _agg.variants",
        "reduce_script" : "result=[:]; for (a in _aggs) {a.each{k,v -> result[k]=result.get(k,0)+v} }; return result;"
      }
    }
  }
}
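The Groovy scripts above are packed onto single lines. As a reading aid, here is a line-by-line Python port of the same four scripts, run against two simulated shards. It is a local sketch for checking the logic, not something Elasticsearch executes; the function names mirror the script names but are hypothetical, and the shard contents are invented sample data.

```python
# Invented sample data: documents in the question's layout, split over two "shards".
SHARDS = [
    [
        {"userid": 1, "answers": [{"key": "gender", "value": "male"},
                                  {"key": "color", "value": "red"}]},
        {"userid": 2, "answers": [{"key": "gender", "value": "female"},
                                  {"key": "color", "value": "blue"}]},
    ],
    [
        {"userid": 3, "answers": [{"key": "gender", "value": "male"},
                                  {"key": "color", "value": "blue"}]},
        {"userid": 4, "answers": [{"key": "gender", "value": "female"}]},  # no color answer
    ],
]


def init_script(agg: dict) -> None:
    agg["variants"] = {}                      # _agg['variants'] = [:]


def map_script(agg: dict, source: dict) -> None:
    gender = color = ""
    for answer in source["answers"]:          # for (answer in _source.answers)
        if answer["key"] == "gender":
            gender = answer["value"]
        if answer["key"] == "color":
            color = answer["value"]
    if gender and color:                      # only users who answered both questions
        key = f"{gender}-{color}"
        agg["variants"][key] = agg["variants"].get(key, 0) + 1


def combine_script(agg: dict) -> dict:
    return agg["variants"]                    # return _agg.variants


def reduce_script(aggs: list[dict]) -> dict:
    result: dict = {}
    for partial in aggs:                      # one partial map per shard
        for k, v in partial.items():
            result[k] = result.get(k, 0) + v
    return result


if __name__ == "__main__":
    partials = []
    for shard in SHARDS:
        agg: dict = {}
        init_script(agg)
        for doc in shard:
            map_script(agg, doc)
        partials.append(combine_script(agg))
    # Prints {'male-red': 1, 'female-blue': 1, 'male-blue': 1}; user 4 is skipped.
    print(reduce_script(partials))
```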
It will return something like this:
{
  "took": 32,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 7,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "distribution": {
      "value": {
        "female-yellow": 4,
        "male-red": 1,
        "female-blue": 1,
        "male-blue": 1
      }
    }
  }
}
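The `value` object is a flat map keyed by `gender-color`. To get the two-dimensional table from the question, it can be pivoted on the client side. Below is a hedged Python sketch; the `to_grid` helper is hypothetical, and it fills combinations that never occurred with 0. It relies on the `-` separator chosen in the map script, so it assumes gender values never contain `-`; a separator that cannot appear in the data would be safer.

```python
def to_grid(flat: dict[str, int]) -> tuple[list[str], list[str], dict[tuple[str, str], int]]:
    """Split "gender-color" keys into (genders, colors, {(gender, color): count}).

    Assumes the gender value contains no '-', so the first '-' is the separator.
    """
    grid: dict[tuple[str, str], int] = {}
    for key, count in flat.items():
        gender, color = key.split("-", 1)
        grid[(gender, color)] = count
    genders = sorted({g for g, _ in grid})
    colors = sorted({c for _, c in grid})
    return genders, colors, grid


if __name__ == "__main__":
    # Three of the entries from the "value" object in the response above.
    value = {"male-red": 1, "female-blue": 1, "male-blue": 1}
    genders, colors, grid = to_grid(value)
    print("gender".ljust(8) + "".join(c.ljust(6) for c in colors))
    for g in genders:
        # Missing combinations (e.g. female/red here) are shown as 0.
        print(g.ljust(8) + "".join(str(grid.get((g, c), 0)).ljust(6) for c in colors))
```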
Make sure you have scripting enabled in your server configuration (elasticsearch.yml):
user:/etc/elasticsearch# head elasticsearch.yml
script.inline: true
script.indexed: true
...