Losing data with GROUP BY and InfluxDB subqueries

Time: 2018-02-06 22:28:05

Tags: influxdb

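Here is some seed data containing 30 events. For each of the two possible values of user_id there are 5 values of event, and each of those is repeated at 3 possible times (2 x 5 x 3 = 30).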

> SELECT time,user_id,event,score_value FROM user_scores
name: user_scores
time                user_id event                   score_value
----                ------- -----                   -----------
1517616000000000000 456     card_comment_created    10
1517616000000000000 123     card_comment_created    5
1517616000000000000 123     card_created            5
1517616000000000000 456     card_created            10
1517616000000000000 456     card_liked              10
1517616000000000000 123     card_liked              5
1517616000000000000 123     card_marked_as_complete 5
1517616000000000000 456     card_marked_as_complete 10
1517616000000000000 123     card_viewed             5
1517616000000000000 456     card_viewed             10
1517702400000000000 456     card_comment_created    10
1517702400000000000 123     card_comment_created    5
1517702400000000000 123     card_created            5
1517702400000000000 456     card_created            10
1517702400000000000 456     card_liked              10
1517702400000000000 123     card_liked              5
1517702400000000000 456     card_marked_as_complete 10
1517702400000000000 123     card_marked_as_complete 5
1517702400000000000 123     card_viewed             5
1517702400000000000 456     card_viewed             10
1517788800000000000 456     card_comment_created    10
1517788800000000000 123     card_comment_created    5
1517788800000000000 123     card_created            5
1517788800000000000 456     card_created            10
1517788800000000000 456     card_liked              10
1517788800000000000 123     card_liked              5
1517788800000000000 456     card_marked_as_complete 10
1517788800000000000 123     card_marked_as_complete 5
1517788800000000000 123     card_viewed             5
1517788800000000000 456     card_viewed             10
>

I use the following query to downsample the data into daily aggregates:

   SELECT \
     user_id,total_user_score,smartbites_commented_count,\
     smartbites_completed_count,smartbites_consumed_count,\
     smartbites_liked_count \
   INTO user_scores_daily \
   FROM ( \
     SELECT SUM(score_value) AS total_user_score \
     FROM user_scores \
     GROUP BY time(1d),user_id \
   ),( \
     SELECT COUNT(score_value) AS smartbites_commented_count \
     FROM user_scores \
     WHERE event='card_comment_created' \
     GROUP BY time(1d),user_id \
   ),( \
     SELECT COUNT(score_value) AS smartbites_completed_count \
     FROM user_scores \
     WHERE event='card_marked_as_complete' \
     GROUP BY time(1d),user_id \
   ),( \
     SELECT COUNT(score_value) AS smartbites_consumed_count \
     INTO smartbites_consumed_counts_daily \
     FROM user_scores \
     WHERE event='card_viewed' \
     GROUP BY time(1d),user_id \
   ),( \
     SELECT COUNT(score_value) AS smartbites_liked_count \
     FROM user_scores \
     WHERE event='card_liked' \
     GROUP BY time(1d),user_id \
   )

Notice how each subquery groups by time(1d),user_id. I need one row in the result for each user/day combination.
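
To make the intended result concrete, here is a minimal Python sketch (not InfluxQL) that rebuilds the seed data and computes the aggregates the query is meant to produce. Grouping by the exact timestamp stands in for time(1d), which only works because every seed point sits on a UTC day boundary. The names `points`, `COUNT_COLUMNS`, and `daily_aggregates` are made up for this illustration.

```python
from collections import defaultdict
from itertools import product

# Seed data as described above: 2 users x 5 events x 3 daily timestamps.
DAYS = [1517616000000000000, 1517702400000000000, 1517788800000000000]
SCORES = {"123": 5, "456": 10}  # score_value per user_id in the seed data
EVENTS = [
    "card_comment_created",
    "card_created",
    "card_liked",
    "card_marked_as_complete",
    "card_viewed",
]

# Event -> output column, mirroring the four COUNT subqueries.
COUNT_COLUMNS = {
    "card_comment_created": "smartbites_commented_count",
    "card_marked_as_complete": "smartbites_completed_count",
    "card_viewed": "smartbites_consumed_count",
    "card_liked": "smartbites_liked_count",
}

points = [
    {"time": t, "user_id": u, "event": e, "score_value": SCORES[u]}
    for t, u, e in product(DAYS, SCORES, EVENTS)
]


def daily_aggregates(points):
    """Group by (time, user_id); sum scores and count the selected events."""
    rows = defaultdict(lambda: defaultdict(int))
    for p in points:
        row = rows[(p["time"], p["user_id"])]
        row["total_user_score"] += p["score_value"]
        column = COUNT_COLUMNS.get(p["event"])
        if column is not None:
            row[column] += 1
    return {key: dict(row) for key, row in rows.items()}


expected = daily_aggregates(points)
for (time, user_id), row in sorted(expected.items()):
    print(time, user_id, row)
```

This yields six rows, one per user per day: a total_user_score of 25 for user 123 and 50 for user 456, with each count equal to 1.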

Here is my result:

> SELECT * FROM user_scores_daily
name: user_scores_daily
time                smartbites_commented_count smartbites_completed_count smartbites_consumed_count smartbites_liked_count total_user_score user_id
----                -------------------------- -------------------------- ------------------------- ---------------------- ---------------- -------
1517616000000000000 1                          1                          1                         1                      50               456
1517702400000000000 1                          1                          1                         1                      50               456
1517788800000000000 1                          1                          1                         1                      50               456

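The data for one of the users looks perfect. But what about the second user? There should be six rows in total, but there are only three. The three rows where user_id = 123 are missing.

Edit in response to a comment: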

> SHOW TAG KEYS FROM "user_scores"
name: user_scores
tagKey
------
actor_id
analytics_version
event
owner_id
role
user_id
> SHOW FIELD KEYS FROM "user_scores"
name: user_scores
fieldKey    fieldType
--------    ---------
score_value integer
>

1 Answer:

Answer 0 (score: 0)

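What I ended up doing was adding GROUP BY user_id,time(1d) to my top-level query (after the subqueries) and changing the fields selected by the outermost SELECT into aggregates.

These aggregates are redundant, but I need them if I am going to use GROUP BY in the top-level query.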
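
A possible reason the top-level GROUP BY helps, which the thread itself does not confirm: the InfluxDB 1.x documentation notes that INTO queries write tags as fields unless those tags appear in GROUP BY, and that points sharing a measurement, tag set, and timestamp are treated as the same point. If that is what happened here, the rows for the two users collided once user_id stopped being a tag. The Python sketch below only models that identity rule. `write_point` and `write_daily` are hypothetical helpers rather than InfluxDB APIs, and the write order is assumed.

```python
DAYS = [1517616000000000000, 1517702400000000000, 1517788800000000000]
USERS = {"123": 25, "456": 50}  # daily total_user_score per user in the seed data


def write_point(store, measurement, tags, fields, time):
    """Hypothetical stand-in for a write: a point is identified by
    (measurement, tag set, timestamp), and a second write to the same
    identity merges its fields over the existing ones."""
    key = (measurement, frozenset(tags.items()), time)
    store.setdefault(key, {}).update(fields)


def write_daily(keep_user_id_as_tag):
    """Write one daily row per user, with user_id as a tag or as a field."""
    store = {}
    for time in DAYS:
        for user_id, total in USERS.items():  # assumed order: 123, then 456
            if keep_user_id_as_tag:
                tags = {"user_id": user_id}
                fields = {"total_user_score": total}
            else:
                tags = {}
                fields = {"user_id": user_id, "total_user_score": total}
            write_point(store, "user_scores_daily", tags, fields, time)
    return store


without_group_by = write_daily(keep_user_id_as_tag=False)
with_group_by = write_daily(keep_user_id_as_tag=True)

print(len(without_group_by))  # 3: the later write replaced the earlier one
print(len(with_group_by))     # 6: one point per user per day
```

In this model, keeping user_id in the tag set is what makes the two users' rows distinct, which matches the three-row versus six-row difference described in the question.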

Here is the code:

   SELECT \
     MEAN(user_id) as user_id,\
     MEAN(total_user_score) as total_user_score,\
     MEAN(smartbites_commented_count) as smartbites_commented_count,\
     MEAN(smartbites_consumed_count) as smartbites_consumed_count,\
     MEAN(smartbites_liked_count) as smartbites_liked_count \
   INTO user_scores_daily \
   FROM ( \
     SELECT SUM(score_value) AS total_user_score \
     FROM user_scores \
     GROUP BY time(1d),user_id \
   ),( \
     SELECT COUNT(score_value) AS smartbites_commented_count \
     FROM user_scores \
     WHERE event='card_comment_created' \
     GROUP BY time(1d),user_id \
   ),( \
     SELECT COUNT(score_value) AS smartbites_completed_count \
     FROM user_scores \
     WHERE event='card_marked_as_complete' \
     GROUP BY time(1d),user_id \
   ),( \
     SELECT COUNT(score_value) AS smartbites_consumed_count \
     INTO smartbites_consumed_counts_daily \
     FROM user_scores \
     WHERE event='card_viewed' \
     GROUP BY time(1d),user_id \
   ),( \
     SELECT COUNT(score_value) AS smartbites_liked_count \
     FROM user_scores \
     WHERE event='card_liked' \
     GROUP BY time(1d),user_id \
   ) \
   GROUP BY time(1d),user_id

It's a MEAN query, if I do say so myself (I'll show myself out).