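I'm looking to do some cohort analysis on a user base. We have 2 tables, "users" and "sessions", and both users and sessions have a "created_at" field. I'm trying to formulate a query that yields a 7-by-7 table of numbers (with some blanks) showing me: the count of users who were created on a particular day and who also have a session created y = (0..6) days ago, indicating that they returned on that day.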
created_at d2 d3 d4
today * * *
today-1 49 * *
today-2 45 30 *
today-3 47 48 18
...
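In this case, 47 of the users who were created on today-3 returned today.
Can I do this in a single MySQL query? I can run the queries individually like this, but it would be really nice to do it all in one query.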
SELECT `users`.* FROM `users` INNER JOIN `sessions` ON `sessions`.`user_id` = `users`.`id` WHERE `users`.`os` = 'ios' AND (`sessions`.`updated_at` BETWEEN '2013-01-16 08:00:00' AND '2013-01-17 08:00:00')
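To make the setup concrete, here is a minimal sketch of the two tables. Only os, user_id, created_at and updated_at appear in the query above; the column types, the sessions.id key and the sample rows are assumptions for illustration, not the real schema.

```sql
-- Hypothetical minimal schema: column types and keys are assumed.
CREATE TABLE users (
  id         INT PRIMARY KEY,
  os         VARCHAR(16),
  created_at DATETIME NOT NULL
);

CREATE TABLE sessions (
  id         INT PRIMARY KEY,
  user_id    INT NOT NULL,
  created_at DATETIME NOT NULL,
  updated_at DATETIME NOT NULL
);

-- Sample data: two users created two days ago, each with a session that day;
-- only user 1 comes back the following day.
INSERT INTO users (id, os, created_at) VALUES
  (1, 'ios', CURDATE() - INTERVAL 2 DAY),
  (2, 'ios', CURDATE() - INTERVAL 2 DAY);

INSERT INTO sessions (id, user_id, created_at, updated_at) VALUES
  (1, 1, CURDATE() - INTERVAL 2 DAY, CURDATE() - INTERVAL 2 DAY),
  (2, 2, CURDATE() - INTERVAL 2 DAY, CURDATE() - INTERVAL 2 DAY),
  (3, 1, CURDATE() - INTERVAL 1 DAY, CURDATE() - INTERVAL 1 DAY);
```

With these rows, the cohort created on today-2 has two users, both active on the day they were created and one active a day later.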
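Answer 0 (score: 17)
This seems like a complex problem. Whether or not you also find it a difficult one, it is never a bad idea to approach it by starting with a smaller problem.
For instance, you could begin with a query that returns all users (just users) who registered within the last week, i.e. starting from six days ago, as per your requirement: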
SELECT *
FROM users
WHERE created_at >= CURDATE() - INTERVAL 6 DAY
The next step could be to group the results by date and count the rows in each group:
SELECT
created_at,
COUNT(*) AS user_count
FROM users
WHERE created_at >= CURDATE() - INTERVAL 6 DAY
GROUP BY created_at
If created_at is a datetime or timestamp, use DATE(created_at) as the grouping criterion:
SELECT
DATE(created_at) AS created_at,
COUNT(*) AS user_count
FROM users
WHERE created_at >= CURDATE() - INTERVAL 6 DAY
GROUP BY DATE(created_at)
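However, you don't seem to want absolute dates in the output, only relative ones, like today, today - 1 day, etc. In that case, you could use the DATEDIFF() function, which returns the number of days between two dates, to produce (numeric) offsets from today, and group by those values: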
SELECT
DATEDIFF(CURDATE(), created_at) AS created_at,
COUNT(*) AS user_count
FROM users
WHERE created_at >= CURDATE() - INTERVAL 6 DAY
-- Group by the same expression that is selected (needed under ONLY_FULL_GROUP_BY)
GROUP BY DATEDIFF(CURDATE(), created_at)
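Your created_at column would then contain "dates" like 0, 1 and so on, up to 6. Converting them to today, today-1 etc. is trivial, and you will see that in the final query. At this point, however, we need to take a step back (or perhaps it is rather half a step to the side), because we don't really need to count the users but rather their returns. So the working dataset actually needed from users at the moment is this:
SELECT
id,
DATEDIFF(CURDATE(), created_at) AS day_offset
FROM users
WHERE created_at >= CURDATE() - INTERVAL 6 DAY
We need the user IDs to join this rowset to the (would-be derived) rowset from sessions, and we need day_offset as the grouping criterion.
Moving on, a similar transformation needs to be performed on the sessions table, which I won't go into in detail. Suffice it to say that the resulting query is nearly identical to the previous one, with just two exceptions:
id is replaced with user_id;
DISTINCT is applied to the entire subset.
The reason for DISTINCT is to return no more than one row per user and day: my understanding is that however many sessions a user may have on a particular day, you want to count them as one return. So, here is what is derived from sessions:
SELECT DISTINCT
user_id,
DATEDIFF(CURDATE(), created_at) AS day_offset
FROM sessions
WHERE created_at >= CURDATE() - INTERVAL 6 DAY
Now it only remains to join the two derived tables, apply grouping and use conditional aggregation to get the desired results.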
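A minimal sketch of that join-and-aggregate step, built from the users and sessions subqueries described above. The aliases u and s, the output columns d0 to d6 and the CONCAT label are assumptions chosen for illustration, and the query has not been run against real data.

```sql
SELECT
  -- Turn the numeric offset into a label: 0 -> 'today', 3 -> 'today-3'.
  CONCAT('today', IF(u.day_offset = 0, '', CONCAT('-', u.day_offset))) AS created_at,
  -- In MySQL a comparison evaluates to 1 or 0, so SUM() counts the matches.
  SUM(s.day_offset = 0) AS d0,
  SUM(s.day_offset = 1) AS d1,
  SUM(s.day_offset = 2) AS d2,
  SUM(s.day_offset = 3) AS d3,
  SUM(s.day_offset = 4) AS d4,
  SUM(s.day_offset = 5) AS d5,
  SUM(s.day_offset = 6) AS d6
FROM (
  -- One row per user created in the last week.
  SELECT id, DATEDIFF(CURDATE(), created_at) AS day_offset
  FROM users
  WHERE created_at >= CURDATE() - INTERVAL 6 DAY
) AS u
LEFT JOIN (
  -- At most one row per user and day, however many sessions there were.
  SELECT DISTINCT user_id, DATEDIFF(CURDATE(), created_at) AS day_offset
  FROM sessions
  WHERE created_at >= CURDATE() - INTERVAL 6 DAY
) AS s ON s.user_id = u.id
GROUP BY u.day_offset
ORDER BY u.day_offset;
```

Here d0 counts users of a cohort who had a session today, d1 those who had one yesterday, and so on. A cohort whose users have no sessions at all in the window yields NULL in every column, because the LEFT JOIN leaves s.day_offset NULL.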
I must admit that I haven't tested or debugged this, but if needed, I'll be glad to work with a data sample that you provide. :)
Answer 1 (score: 3)
Month-wise cohort example:
First, let's build a table of each individual user's activity flow (month-wise):
SELECT
mu.created_timestamp AS cohort
, mu.id AS user_id
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 1 AND l.user_id = mu.id) AS m1
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 2 AND l.user_id = mu.id) AS m2
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 3 AND l.user_id = mu.id) AS m3
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 4 AND l.user_id = mu.id) AS m4
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 5 AND l.user_id = mu.id) AS m5
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 6 AND l.user_id = mu.id) AS m6
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 7 AND l.user_id = mu.id) AS m7
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 8 AND l.user_id = mu.id) AS m8
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 9 AND l.user_id = mu.id) AS m9
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 10 AND l.user_id = mu.id) AS m10
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 11 AND l.user_id = mu.id) AS m11
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 12 AND l.user_id = mu.id) AS m12
FROM user mu
WHERE mu.created_timestamp BETWEEN '2018-01-01 00:00:00' AND '2019-12-31 23:59:59'
Once you have this table, sum up the individual users' activity per cohort:
SELECT MONTH(c.cohort) AS cohort
,COUNT(c.user_id) AS signups
,SUM(c.m1) AS m1
,SUM(c.m2) AS m2
,SUM(c.m3) AS m3
,SUM(c.m4) AS m4
,SUM(c.m5) AS m5
,SUM(c.m6) AS m6
,SUM(c.m7) AS m7
,SUM(c.m8) AS m8
,SUM(c.m9) AS m9
,SUM(c.m10) AS m10
,SUM(c.m11) AS m11
,SUM(c.m12) AS m12
FROM (SELECT
mu.created_timestamp AS cohort
, mu.id AS user_id
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 1 AND l.user_id = mu.id) AS m1
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 2 AND l.user_id = mu.id) AS m2
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 3 AND l.user_id = mu.id) AS m3
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 4 AND l.user_id = mu.id) AS m4
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 5 AND l.user_id = mu.id) AS m5
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 6 AND l.user_id = mu.id) AS m6
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 7 AND l.user_id = mu.id) AS m7
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 8 AND l.user_id = mu.id) AS m8
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 9 AND l.user_id = mu.id) AS m9
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 10 AND l.user_id = mu.id) AS m10
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 11 AND l.user_id = mu.id) AS m11
,(SELECT IF(COUNT(l.order_date) = 0 , 0, 1) FROM `order` l WHERE MONTH(l.order_date) = 12 AND l.user_id = mu.id) AS m12
FROM user mu
WHERE mu.created_timestamp BETWEEN '2018-01-01 00:00:00' AND '2019-12-31 23:59:59') AS c GROUP BY MONTH(cohort)
Instead of months you can use days; otherwise, cohort analysis is most commonly done by month.
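As a possible alternative to the twelve correlated subqueries, the same per-cohort counts might be obtained with a single join and conditional aggregation. This is only a sketch, assuming the same user and order tables and columns as above; it shows m1 to m3, and the remaining months follow the same pattern. The table name order is quoted with backticks because ORDER is a reserved word in MySQL.

```sql
SELECT
  MONTH(mu.created_timestamp) AS cohort,
  COUNT(DISTINCT mu.id) AS signups,
  -- Count each user at most once per month, however many orders they placed.
  COUNT(DISTINCT CASE WHEN MONTH(l.order_date) = 1 THEN mu.id END) AS m1,
  COUNT(DISTINCT CASE WHEN MONTH(l.order_date) = 2 THEN mu.id END) AS m2,
  COUNT(DISTINCT CASE WHEN MONTH(l.order_date) = 3 THEN mu.id END) AS m3
  -- ...repeat the same pattern for m4 to m12
FROM `user` mu
-- LEFT JOIN keeps users without any orders so they still count as signups.
LEFT JOIN `order` l ON l.user_id = mu.id
WHERE mu.created_timestamp BETWEEN '2018-01-01 00:00:00' AND '2019-12-31 23:59:59'
GROUP BY MONTH(mu.created_timestamp);
```

Like the original, this groups by calendar month only, so signups and orders from different years fall into the same bucket.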
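Answer 2 (score: 1)
This answer inverts the output table that @Newy wanted, so that the cohorts are rows rather than columns, and it uses absolute dates rather than relative ones.
I was looking for a query that would give me something like this: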
Date d0 d1 d2 d3 d4 d5 d6
2016-11-03 3 1 0 0 0 0 0
2016-11-04 4 2 0 1 0 0 *
2016-11-05 7 0 1 1 0 * *
2016-11-06 7 3 1 1 * * *
2016-11-07 13 5 1 * * * *
2016-11-08 4 0 * * * * *
2016-11-09 1 * * * * * *
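I was looking for the number of users who signed up on a particular date, followed by the number of those users who returned 1 day later, 2 days later, and so on. So on 2016-11-07, 13 users signed up and started a session, then 5 of those users came back 1 day later, one user came back 2 days later, etc.
I took the first subquery of @Andriy M's big query and modified it to give me the date on which the user signed up, rather than the number of days relative to the current date: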
SELECT
id,
DATE(created_at) AS DayOffset
FROM users
WHERE created_at >= CURDATE() - INTERVAL 6 DAY
Then the LEFT JOIN subquery, as I modified it, looks like this:
SELECT DISTINCT
sessions.user_id,
DATEDIFF(sessions.created_at, users.created_at) AS DayOffset
FROM sessions
LEFT JOIN users ON (users.id = sessions.user_id)
WHERE sessions.created_at >= CURDATE() - INTERVAL 6 DAY
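I wanted the day offset to be relative to the date the user signed up, not to the current date as in @Andriy M's answer. So I did a left join on the users table to get the time the user signed up and computed the date difference against that.
So the final query looks something like this: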
SELECT u.DayOffset as Date,
SUM(s.DayOffset = 0) AS d0,
SUM(s.DayOffset = 1) AS d1,
SUM(s.DayOffset = 2) AS d2,
SUM(s.DayOffset = 3) AS d3,
SUM(s.DayOffset = 4) AS d4,
SUM(s.DayOffset = 5) AS d5,
SUM(s.DayOffset = 6) AS d6
FROM (
SELECT
id,
DATE(created_at) AS DayOffset
FROM users
WHERE created_at >= CURDATE() - INTERVAL 6 DAY
) as u
LEFT JOIN (
SELECT DISTINCT
sessions.user_id,
DATEDIFF(sessions.created_at, users.created_at) AS DayOffset
FROM sessions
LEFT JOIN users ON (users.id = sessions.user_id)
WHERE sessions.created_at >= CURDATE() - INTERVAL 6 DAY
) as s
ON s.user_id = u.id
GROUP BY u.DayOffset
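The query above returns a number (or NULL) in every cell, whereas the desired table shows * for offsets that have not happened yet. One way to get closer to that, sketched here under the same assumptions about the tables, is to return NULL whenever the signup date plus the offset is still in the future. The alias signup_date is my own naming, and only d0 to d2 are shown.

```sql
SELECT
  u.signup_date AS Date,
  COALESCE(SUM(s.DayOffset = 0), 0) AS d0,
  -- Offset n is only meaningful once signup_date + n days is not in the future.
  IF(u.signup_date + INTERVAL 1 DAY <= CURDATE(),
     COALESCE(SUM(s.DayOffset = 1), 0), NULL) AS d1,
  IF(u.signup_date + INTERVAL 2 DAY <= CURDATE(),
     COALESCE(SUM(s.DayOffset = 2), 0), NULL) AS d2
  -- ...same pattern for d3 to d6
FROM (
  SELECT id, DATE(created_at) AS signup_date
  FROM users
  WHERE created_at >= CURDATE() - INTERVAL 6 DAY
) AS u
LEFT JOIN (
  SELECT DISTINCT
    sessions.user_id,
    DATEDIFF(sessions.created_at, users.created_at) AS DayOffset
  FROM sessions
  JOIN users ON users.id = sessions.user_id
  WHERE sessions.created_at >= CURDATE() - INTERVAL 6 DAY
) AS s ON s.user_id = u.id
GROUP BY u.signup_date
ORDER BY u.signup_date;
```

The NULL cells can then be rendered as * by whatever displays the result.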