Question

Is there a simple way to calculate a conversion rate using aggregations in Elasticsearch?

I have some event data like this:

{"uuid": "a92405ef-9632-44ce-9cb3-0ae83e434fe9", 
 "created_at": "2015-10-26T21:58:23.132923+00:00",
 "has_data": true, ...}

{"uuid": "4a342de5-4047-4897-8f30-f60c64def839", 
 "created_at": "2015-10-26T21:57:43.985108+00:00",
 "has_data": true, ...}

{"uuid": "47d6add8-003d-4c67-8e9f-1712999b4f15", 
 "created_at": "2015-10-26T21:51:11.062669+00:00",
 "has_data": false, ...}

{"uuid": "a92405ef-9632-44ce-9cb3-0ae83e434fe9", 
 "created_at": "2015-10-26T21:44:17.121071+00:00",
 "has_data": false, ...}

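I need a unique count of the uuids whose has_data flag was set to true but had previously (in another document, at an earlier time) been set to false, or vice versa. For the example above, my expected result is 1: only "a92405ef-9632-44ce-9cb3-0ae83e434fe9" appears in two documents, with both true and false for "has_data".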
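
To pin down the definition, here is a minimal in-memory Python sketch (plain Python, not Elasticsearch) that counts uuids whose has_data value flips between two of their events in created_at order. It assumes the events fit in memory; `count_flipped` is a hypothetical helper, and the "..." fields from the sample are omitted.

```python
from datetime import datetime

# The four sample events from the question (extra fields omitted).
events = [
    {"uuid": "a92405ef-9632-44ce-9cb3-0ae83e434fe9",
     "created_at": "2015-10-26T21:58:23.132923+00:00", "has_data": True},
    {"uuid": "4a342de5-4047-4897-8f30-f60c64def839",
     "created_at": "2015-10-26T21:57:43.985108+00:00", "has_data": True},
    {"uuid": "47d6add8-003d-4c67-8e9f-1712999b4f15",
     "created_at": "2015-10-26T21:51:11.062669+00:00", "has_data": False},
    {"uuid": "a92405ef-9632-44ce-9cb3-0ae83e434fe9",
     "created_at": "2015-10-26T21:44:17.121071+00:00", "has_data": False},
]


def count_flipped(events):
    """Count distinct uuids whose has_data value changed between consecutive events."""
    last_value = {}   # uuid -> has_data of the most recent event seen so far
    flipped = set()   # uuids that went false -> true or true -> false
    # Walk the events in chronological order of created_at.
    for e in sorted(events, key=lambda e: datetime.fromisoformat(e["created_at"])):
        prev = last_value.get(e["uuid"])
        if prev is not None and prev != e["has_data"]:
            flipped.add(e["uuid"])
        last_value[e["uuid"]] = e["has_data"]
    return len(flipped)


print(count_flipped(events))  # → 1
```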

So far, I have run a terms aggregation on uuid (with size) and a cardinality sub-aggregation on "has_data", and gone on from there:

"aggs": {
  "2": {
    "terms": {
      "field": "uuid",
      "size": 0
    },
    "aggs": {
      "1": {
        "cardinality": {
          "field": "has_data"
        }
      }
    }
  }
}

But this is... not great. With millions of events and thousands of uuids, it's no good.
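
For reference, the client-side step that would follow that aggregation can be sketched in Python. This is a minimal sketch assuming the response shape of a terms aggregation named "2" with a cardinality sub-aggregation named "1"; the sample response is hand-built from the four example events, not captured from a cluster, and `count_converted` is a hypothetical helper.

```python
def count_converted(response):
    """Count uuid buckets whose has_data cardinality is 2, i.e. both true and false were seen."""
    buckets = response["aggregations"]["2"]["buckets"]
    # The response carries one bucket per distinct uuid, so both the payload
    # and this loop grow with the number of uuids.
    return sum(1 for b in buckets if b["1"]["value"] == 2)


# Hand-built response for the four sample events (not real Elasticsearch output).
sample_response = {
    "aggregations": {
        "2": {
            "buckets": [
                {"key": "a92405ef-9632-44ce-9cb3-0ae83e434fe9", "doc_count": 2, "1": {"value": 2}},
                {"key": "4a342de5-4047-4897-8f30-f60c64def839", "doc_count": 1, "1": {"value": 1}},
                {"key": "47d6add8-003d-4c67-8e9f-1712999b4f15", "doc_count": 1, "1": {"value": 1}},
            ]
        }
    }
}

print(count_converted(sample_response))  # → 1
```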

I think I should go for a scripted metric aggregation, but I can't wrap my head around it. Is it possible? Can anyone point me in the right direction?
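
A scripted metric aggregation runs in four phases (init, map, combine, reduce). The sketch below only simulates that data flow in Python to show what each phase would have to do for this problem; real scripts would be written in the cluster's scripting language, and the function names and the two-shard split are hypothetical.

```python
from collections import defaultdict


def init_state():
    # init phase: fresh per-shard state, uuid -> set of has_data values seen
    return defaultdict(set)


def map_doc(state, doc):
    # map phase: runs once per matching document on a shard
    state[doc["uuid"]].add(doc["has_data"])


def combine_state(state):
    # combine phase: the returned value is what the shard sends back
    return {uuid: set(values) for uuid, values in state.items()}


def reduce_states(shard_results):
    # reduce phase: merge shard results (one uuid's events may live on several
    # shards), then count uuids that have shown both true and false
    merged = defaultdict(set)
    for result in shard_results:
        for uuid, values in result.items():
            merged[uuid] |= values
    return sum(1 for values in merged.values() if {True, False} <= values)


def run_scripted_metric(shards):
    results = []
    for docs in shards:
        state = init_state()
        for doc in docs:
            map_doc(state, doc)
        results.append(combine_state(state))
    return reduce_states(results)


# Sample events split across two hypothetical shards; the converting uuid's
# true and false events land on different shards.
shards = [
    [
        {"uuid": "a92405ef-9632-44ce-9cb3-0ae83e434fe9", "has_data": True},
        {"uuid": "4a342de5-4047-4897-8f30-f60c64def839", "has_data": True},
    ],
    [
        {"uuid": "47d6add8-003d-4c67-8e9f-1712999b4f15", "has_data": False},
        {"uuid": "a92405ef-9632-44ce-9cb3-0ae83e434fe9", "has_data": False},
    ],
]

print(run_scripted_metric(shards))  # → 1
```

Note that the per-shard state still holds an entry for every uuid, so this moves the work into the cluster but does not remove the per-uuid memory cost of the bucket approach.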

Answer 1

If I understand you correctly, can't you just "invert" the aggregation you posted?

When I created an index (with "uuid" set to "index": "not_analyzed") and added the data you posted, I was able to run this aggregation:

POST /test_index/_search?search_type=count
{
   "aggs": {
      "has_data_terms": {
         "terms": {
            "field": "has_data"
         },
         "aggs": {
            "has_data_card": {
               "cardinality": {
                  "field": "uuid"
               }
            }
         }
      }
   }
}

which returns:

{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 4,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "has_data_terms": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "F",
               "doc_count": 2,
               "has_data_card": {
                  "value": 2
               }
            },
            {
               "key": "T",
               "doc_count": 2,
               "has_data_card": {
                  "value": 2
               }
            }
         ]
      }
   }
}
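
The inverted aggregation counts, for each has_data value, how many distinct uuids have at least one event with that value. An exact in-memory Python equivalent (a sketch for checking the logic only; Elasticsearch's cardinality aggregation is approximate in general) reproduces the two values of 2 above. `distinct_uuids_per_flag` is a hypothetical helper.

```python
from collections import defaultdict


def distinct_uuids_per_flag(events):
    """Exact distinct-uuid count per has_data value (terms on has_data + cardinality on uuid)."""
    uuids_by_flag = defaultdict(set)
    for e in events:
        uuids_by_flag[e["has_data"]].add(e["uuid"])
    return {flag: len(uuids) for flag, uuids in uuids_by_flag.items()}


# The four sample events from the question (only the fields used here).
events = [
    {"uuid": "a92405ef-9632-44ce-9cb3-0ae83e434fe9", "has_data": True},
    {"uuid": "4a342de5-4047-4897-8f30-f60c64def839", "has_data": True},
    {"uuid": "47d6add8-003d-4c67-8e9f-1712999b4f15", "has_data": False},
    {"uuid": "a92405ef-9632-44ce-9cb3-0ae83e434fe9", "has_data": False},
]

print(distinct_uuids_per_flag(events))  # → {True: 2, False: 2}
```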

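So just ignore "key": "F"; the "key": "T" bucket should give you the count you want. Then get a total count of uuids, and you should be able to calculate the ratio you're after. Adapting this technique to your specific use case may take some work, but it should be straightforward.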
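
The ratio step described above can be sketched in Python against the response shown earlier. This assumes the total uuid count comes from a separate cardinality aggregation on "uuid" (3 is simply the number of distinct uuids in the sample); the helper names are hypothetical, and the "key_as_string" check is an assumption about how newer versions report boolean keys.

```python
def is_true_bucket(bucket):
    # The response above uses "T"/"F" keys for booleans; newer versions are
    # assumed to report key_as_string "true"/"false" instead.
    return bucket["key"] == "T" or bucket.get("key_as_string") == "true"


def conversion_ratio(response, total_uuids):
    """Distinct uuids in the true bucket divided by the total distinct uuid count."""
    buckets = response["aggregations"]["has_data_terms"]["buckets"]
    true_uuids = next(b["has_data_card"]["value"] for b in buckets if is_true_bucket(b))
    return true_uuids / total_uuids


# Trimmed copy of the response above.
response = {
    "aggregations": {
        "has_data_terms": {
            "buckets": [
                {"key": "F", "doc_count": 2, "has_data_card": {"value": 2}},
                {"key": "T", "doc_count": 2, "has_data_card": {"value": 2}},
            ]
        }
    }
}

print(round(conversion_ratio(response, 3), 4))  # → 0.6667
```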

Here is the code I used to test it:

http://sense.qbox.io/gist/993546914daf15e88ac3e1095a9dfed775b0741c

Answer 2

Your problem has elements of what we call the "bucket explosion problem"; see http://www.slideshare.net/NoSQLmatters/entity-centric-indexing-no-sql-dublin#5

Take a look at the "entity-centric" solution offered here: https://discuss.elastic.co/t/how-can-i-use-aggregations-to-query-distinct-values-across-all-time-grouped-by-first-seen/25482
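
As I read it, the entity-centric idea is to maintain a second index with one summary document per uuid, updated as events arrive, so the question becomes a simple count over entity documents rather than a per-uuid bucket aggregation. A minimal Python sketch of the folding step follows; the field names (first_seen, last_seen, first_has_data, last_has_data, converted) and `build_entity_docs` are hypothetical.

```python
from datetime import datetime


def build_entity_docs(events):
    """Fold raw events into one summary document per uuid."""
    entities = {}
    for e in sorted(events, key=lambda e: datetime.fromisoformat(e["created_at"])):
        doc = entities.setdefault(e["uuid"], {
            "uuid": e["uuid"],
            "first_seen": e["created_at"],
            "first_has_data": e["has_data"],
            "converted": False,
        })
        doc["last_seen"] = e["created_at"]
        doc["last_has_data"] = e["has_data"]
        # Any event whose flag differs from the first one marks a conversion
        # (false -> true or true -> false).
        if e["has_data"] != doc["first_has_data"]:
            doc["converted"] = True
    return list(entities.values())


events = [
    {"uuid": "a92405ef-9632-44ce-9cb3-0ae83e434fe9",
     "created_at": "2015-10-26T21:58:23.132923+00:00", "has_data": True},
    {"uuid": "4a342de5-4047-4897-8f30-f60c64def839",
     "created_at": "2015-10-26T21:57:43.985108+00:00", "has_data": True},
    {"uuid": "47d6add8-003d-4c67-8e9f-1712999b4f15",
     "created_at": "2015-10-26T21:51:11.062669+00:00", "has_data": False},
    {"uuid": "a92405ef-9632-44ce-9cb3-0ae83e434fe9",
     "created_at": "2015-10-26T21:44:17.121071+00:00", "has_data": False},
]

docs = build_entity_docs(events)
print(sum(d["converted"] for d in docs), "of", len(docs), "uuids converted")  # → 1 of 3 uuids converted
```

Each summary document would then be indexed (for example using the uuid as the document id), and the count of documents with converted set to true, divided by the total document count, gives the conversion rate.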

Calculating conversion rate for event data with Elasticsearch aggregations
