Here is some seed data containing 30 events. For each of the two possible values of user_id, event takes 5 values, and each of those is repeated at 3 possible times.
> SELECT time,user_id,event,score_value FROM user_scores
name: user_scores
time user_id event score_value
---- ------- ----- -----------
1517616000000000000 456 card_comment_created 10
1517616000000000000 123 card_comment_created 5
1517616000000000000 123 card_created 5
1517616000000000000 456 card_created 10
1517616000000000000 456 card_liked 10
1517616000000000000 123 card_liked 5
1517616000000000000 123 card_marked_as_complete 5
1517616000000000000 456 card_marked_as_complete 10
1517616000000000000 123 card_viewed 5
1517616000000000000 456 card_viewed 10
1517702400000000000 456 card_comment_created 10
1517702400000000000 123 card_comment_created 5
1517702400000000000 123 card_created 5
1517702400000000000 456 card_created 10
1517702400000000000 456 card_liked 10
1517702400000000000 123 card_liked 5
1517702400000000000 456 card_marked_as_complete 10
1517702400000000000 123 card_marked_as_complete 5
1517702400000000000 123 card_viewed 5
1517702400000000000 456 card_viewed 10
1517788800000000000 456 card_comment_created 10
1517788800000000000 123 card_comment_created 5
1517788800000000000 123 card_created 5
1517788800000000000 456 card_created 10
1517788800000000000 456 card_liked 10
1517788800000000000 123 card_liked 5
1517788800000000000 456 card_marked_as_complete 10
1517788800000000000 123 card_marked_as_complete 5
1517788800000000000 123 card_viewed 5
1517788800000000000 456 card_viewed 10
>
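As a rough sketch, the listing above can be rebuilt from its three dimensions. The timestamps, event names, and per-user score values are copied from the listing; the function name `build_seed_rows` is only an illustrative choice, not part of the original setup.

```python
from itertools import product

# Values copied from the listing above (timestamps are in nanoseconds).
TIMES = [1517616000000000000, 1517702400000000000, 1517788800000000000]
USER_SCORES = {"123": 5, "456": 10}  # user_id -> score_value of every event
EVENTS = [
    "card_comment_created",
    "card_created",
    "card_liked",
    "card_marked_as_complete",
    "card_viewed",
]


def build_seed_rows():
    """Return one (time, user_id, event, score_value) tuple per combination."""
    return [
        (t, user_id, event, USER_SCORES[user_id])
        for t, event, user_id in product(TIMES, EVENTS, USER_SCORES)
    ]


rows = build_seed_rows()
print(len(rows))  # 3 times * 5 events * 2 users
```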
I am downsampling the data into daily aggregates with the following query:
SELECT \
user_id,total_user_score,smartbites_commented_count,\
smartbites_completed_count,smartbites_consumed_count,\
smartbites_liked_count \
INTO user_scores_daily \
FROM ( \
SELECT SUM(score_value) AS total_user_score \
FROM user_scores \
GROUP BY time(1d),user_id \
),( \
SELECT COUNT(score_value) AS smartbites_commented_count \
FROM user_scores \
WHERE event='card_comment_created' \
GROUP BY time(1d),user_id \
),( \
SELECT COUNT(score_value) AS smartbites_completed_count \
FROM user_scores \
WHERE event='card_marked_as_complete' \
GROUP BY time(1d),user_id \
),( \
SELECT COUNT(score_value) AS smartbites_consumed_count \
INTO smartbites_consumed_counts_daily \
FROM user_scores \
WHERE event='card_viewed' \
GROUP BY time(1d),user_id \
),( \
SELECT COUNT(score_value) AS smartbites_liked_count \
FROM user_scores \
WHERE event='card_liked' \
GROUP BY time(1d),user_id \
)
Note how each subquery is grouped by time(1d) and user_id. I need one row in the result for every user/day combination.
Here are my results:
> SELECT * FROM user_scores_daily
name: user_scores_daily
time smartbites_commented_count smartbites_completed_count smartbites_consumed_count smartbites_liked_count total_user_score user_id
---- -------------------------- -------------------------- ------------------------- ---------------------- ---------------- -------
1517616000000000000 1 1 1 1 50 456
1517702400000000000 1 1 1 1 50 456
1517788800000000000 1 1 1 1 50 456
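The data for one of the users looks perfect. But what about the second user? There should be six rows in total, but there are only three. The three rows where user_id = 123 are missing.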
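To make the expected output concrete, here is a minimal Python sketch of the aggregation the query is meant to perform, run over the seed data from the question. It is not InfluxDB code and says nothing about why the rows go missing; it only shows which six rows I expected. The helper name `daily_aggregates` and the day-truncation arithmetic are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product

# Seed data dimensions copied from the question (timestamps in nanoseconds).
TIMES = [1517616000000000000, 1517702400000000000, 1517788800000000000]
USER_SCORES = {"123": 5, "456": 10}
EVENTS = [
    "card_comment_created",
    "card_created",
    "card_liked",
    "card_marked_as_complete",
    "card_viewed",
]

# Event -> output column, taken from the WHERE clauses of the subqueries.
COUNT_COLUMNS = {
    "card_comment_created": "smartbites_commented_count",
    "card_marked_as_complete": "smartbites_completed_count",
    "card_viewed": "smartbites_consumed_count",
    "card_liked": "smartbites_liked_count",
}

NS_PER_DAY = 86_400 * 10**9


def daily_aggregates(rows):
    """Aggregate rows per (day start, user_id), mimicking GROUP BY time(1d),user_id."""
    out = defaultdict(lambda: defaultdict(int))
    for t, user_id, event, score in rows:
        day = t - t % NS_PER_DAY  # truncate to the start of the UTC day
        bucket = out[(day, user_id)]
        bucket["total_user_score"] += score
        if event in COUNT_COLUMNS:
            bucket[COUNT_COLUMNS[event]] += 1
    return out


rows = [(t, u, e, USER_SCORES[u]) for t, e, u in product(TIMES, EVENTS, USER_SCORES)]
expected = daily_aggregates(rows)
for (day, user_id), cols in sorted(expected.items()):
    print(day, user_id, dict(cols))
```

Running this yields six groups, three for each user_id, which is the shape I expected from user_scores_daily.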
Edit in response to comments:
> SHOW TAG KEYS FROM "user_scores"
name: user_scores
tagKey
------
actor_id
analytics_version
event
owner_id
role
user_id
> SHOW FIELD KEYS FROM "user_scores"
name: user_scores
fieldKey fieldType
-------- ---------
score_value integer
>
Answer 0 (score: 0)
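What I ended up doing was adding GROUP BY user_id,time(1d) to my top-level query (after the subqueries) and changing the fields selected by the outermost SELECT into aggregates.
These aggregates are redundant, but I need them if I am going to use GROUP BY in the top-level query.
The code looks like this: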
SELECT \
MEAN(user_id) as user_id,\
MEAN(total_user_score) as total_user_score,\
MEAN(smartbites_commented_count) as smartbites_commented_count,\
MEAN(smartbites_completed_count) as smartbites_completed_count,\
MEAN(smartbites_consumed_count) as smartbites_consumed_count,\
MEAN(smartbites_liked_count) as smartbites_liked_count \
INTO user_scores_daily \
FROM ( \
SELECT SUM(score_value) AS total_user_score \
FROM user_scores \
GROUP BY time(1d),user_id \
),( \
SELECT COUNT(score_value) AS smartbites_commented_count \
FROM user_scores \
WHERE event='card_comment_created' \
GROUP BY time(1d),user_id \
),( \
SELECT COUNT(score_value) AS smartbites_completed_count \
FROM user_scores \
WHERE event='card_marked_as_complete' \
GROUP BY time(1d),user_id \
),( \
SELECT COUNT(score_value) AS smartbites_consumed_count \
INTO smartbites_consumed_counts_daily \
FROM user_scores \
WHERE event='card_viewed' \
GROUP BY time(1d),user_id \
),( \
SELECT COUNT(score_value) AS smartbites_liked_count \
FROM user_scores \
WHERE event='card_liked' \
GROUP BY time(1d),user_id \
) \
GROUP BY time(1d),user_id
That is one MEAN query, if I do say so myself (I'll show myself out).
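For anyone wondering why the outer MEAN calls are redundant: each subquery contributes a single value per column for a given day and user, and the mean of a single value is that value. The sketch below illustrates only that arithmetic in Python, assuming the numbers for user 123 on one day from the seed data; the name `collapse` is made up for the example, and this does not model how InfluxDB merges subquery results.

```python
from statistics import mean

# Assumed values for one (day, user_id) group: one value per column,
# as produced by the corresponding subquery for user 123.
group_values = {
    "total_user_score": [25],
    "smartbites_commented_count": [1],
    "smartbites_completed_count": [1],
    "smartbites_consumed_count": [1],
    "smartbites_liked_count": [1],
}


def collapse(group):
    """Apply MEAN to every column of a group; a no-op when each column has one value."""
    return {column: mean(values) for column, values in group.items()}


collapsed = collapse(group_values)
print(collapsed)
```

Any other aggregate that returns its input unchanged for a single value (MAX, MIN, SUM) would behave the same way in this arithmetic.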