Elasticsearch: generating a distribution table from survey data

Time: 2016-04-05 15:26:58

Tags: elasticsearch

I am storing survey data like this:

[  
  {  
    "userid":1,
    "answers":[  
      {  
        "key":"gender",
        "value":"male"
      },
      {  
        "key":"color",
        "value":"red"
      },
      {  
        "key":"vehicle",
        "value":"car"
      }
    ]
  },
  {  
    "userid":2,
    "answers":[  
      {  
        "key":"gender",
        "value":"female"
      },
      {  
        "key":"color",
        "value":"blue"
      },
      {  
        "key":"vehicle",
        "value":"bike"
      }
    ]
  },
  ......
]

The mapping looks like this:

"users" : {
    "properties" : {
        "userid" : {
            "type" : "long"
        },
        "answers" : {
            "type" : "nested",
            "properties" : {
                "key" : {
                    "type" : "string",
                    "index" : "not_analyzed"
                },
                "value" : {
                    "type" : "string",
                    "index" : "not_analyzed"
                }
            }
        }
    }
}

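In other words, many users give many answers to different questions. I have to stay flexible about which questions are asked, so I chose the key/value style.

Now I am looking for a query that gives me a distribution table of gender and color. That is, a two-dimensional table with gender on one axis and color on the other, showing all possible terms in these fields. I want to see in detail how many females like red, how many males like blue, and so on.

I have tried many combinations of nested, filter, and terms aggregations, but without success so far.

Any hints on how to build the aggregation query would be appreciated...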

1 answer:

Answer 0: (score: 2)

You can take a look at the scripted metric aggregation.

In general it looks like this:

POST documents/_search
{
  "aggs": {
    "distribution": {
      "scripted_metric": {
        "init_script": "initializations",
        "map_script": "build partial distribution for single document",
        "combine_script": "",
        "reduce_script": "summarize all partial distributions to final grid"
      }
    }
  }
}
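To make the role of the four scripts more concrete, here is a rough pure-Python simulation of the phases. It does not talk to Elasticsearch; the function names (map_doc, run_shard, reduce_shards), the doc helper, and the two "shards" are hypothetical and only mimic how per-shard partial results are built and then merged.

```python
from collections import Counter


def map_doc(state, doc):
    """map_script: count one gender-color pair for a single document."""
    answers = {a["key"]: a["value"] for a in doc["answers"]}
    gender, color = answers.get("gender"), answers.get("color")
    if gender and color:
        state[f"{gender}-{color}"] += 1


def run_shard(docs):
    """init_script, map_script and combine_script for one (simulated) shard."""
    state = Counter()        # init_script: empty per-shard state
    for doc in docs:
        map_doc(state, doc)  # map_script: runs once per document
    return dict(state)       # combine_script: hand back the partial result


def reduce_shards(partials):
    """reduce_script: sum the partial distributions of all shards."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts key by key
    return dict(total)


def doc(userid, gender, color):
    """Hypothetical helper that builds a document shaped like the question's data."""
    return {"userid": userid, "answers": [
        {"key": "gender", "value": gender},
        {"key": "color", "value": color},
    ]}


# Two made-up shards, purely for illustration.
shard_a = [doc(1, "male", "red"), doc(2, "female", "blue")]
shard_b = [doc(3, "female", "blue")]

distribution = reduce_shards([run_shard(shard_a), run_shard(shard_b)])
print(distribution)  # {'male-red': 1, 'female-blue': 2}
```

The point of the split is that each shard only ever sees its own documents, so the final counts have to be assembled in the reduce step.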

For your specific case, I can suggest the following query:

POST test/example/_search
{
  "size": 0, 
  "aggs": {
    "distribution": {
      "scripted_metric": {
        "init_script" : "_agg['variants'] = [:]",
        "map_script" : "gender='';color=''; for (answer in _source.answers){ if(answer.key=='gender'){gender=answer.value};if(answer.key=='color'){color=answer.value}  }; if(gender!='' && color!=''){key=(gender+'-'+color); _agg['variants'][key]=_agg['variants'].get(key,0)+1}",
        "combine_script" : "return _agg.variants",
        "reduce_script" : "result=[:]; for (a in _aggs) {a.each{k,v -> result[k]=result.get(k,0)+v} }; return result;"
      }
    }
  }
}

It will return something like this:

{
  "took": 32,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 7,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "distribution": {
      "value": {
        "female-yelow": 4,
        "male-red": 1,
        "female-blue": 1,
        "male-blue": 1
      }
    }
  }
}
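The aggregation returns a flat map of "gender-color" keys rather than an actual two-dimensional table. As a minimal sketch, the map can be pivoted on the client side as shown below. The pivot function is a hypothetical helper, and it assumes that gender values never contain the "-" separator; the input is copied verbatim from the response above.

```python
def pivot(flat, sep="-"):
    """Turn {'gender-color': count} into {gender: {color: count}}, filling gaps with 0."""
    # Split on the first separator only, so a color containing '-' would survive.
    pairs = [(key.split(sep, 1), count) for key, count in flat.items()]
    genders = sorted({g for (g, _), _ in pairs})
    colors = sorted({c for (_, c), _ in pairs})
    table = {g: {c: 0 for c in colors} for g in genders}
    for (g, c), count in pairs:
        table[g][c] = count
    return table


# The "value" object from the aggregation response, keys unchanged.
value = {"female-yelow": 4, "male-red": 1, "female-blue": 1, "male-blue": 1}

table = pivot(value)
for gender, row in table.items():
    print(gender, row)
```

Combinations that never occur in the data (for example female and red here) show up as 0 instead of being missing.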

Make sure scripting is enabled in your server configuration:

user:/etc/elasticsearch# head elasticsearch.yml 
script.inline: true
script.indexed: true
...