week cookie
1 a
1 b
1 c
1 d
2 a
2 b
3 a
3 c
3 d
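This table records visits to a website by week. Each cookie represents one person, and each row means that person visited the site during that week. For example, the last row means 'd' came to the site in week 3.
I want to know how many of the (same) people keep coming back in the following weeks, given a starting week to look at.
For example, if I look at week 1, I would get a result like this: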
1 | 4
2 | 2
3 | 1
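This is because 4 users came in week 1, only 2 of them (a, b) came back in week 2, and only 1 (a) came in all 3 weeks.
How can I write a select query to find this out? The table will be big: there might be 100 weeks, so I want to find the right way to do it.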
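To pin down the expected output independently of any SQL dialect, here is a minimal Python sketch of the computation being asked for. The function name `retention` and the in-memory `rows` list are assumptions made for illustration only: the sketch keeps the set of cookies seen in the start week and intersects it with the visitors of each following week.

```python
from collections import defaultdict

def retention(rows, start_week):
    """Return [(week, survivors)] for cookies seen every week since start_week."""
    by_week = defaultdict(set)
    for week, cookie in rows:
        by_week[week].add(cookie)

    result = []
    survivors = set(by_week.get(start_week, set()))
    week = start_week
    # Stop as soon as nobody from the start week is left.
    while survivors:
        result.append((week, len(survivors)))
        week += 1
        survivors &= by_week.get(week, set())
    return result

rows = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
        (2, "a"), (2, "b"),
        (3, "a"), (3, "c"), (3, "d")]
print(retention(rows, 1))  # [(1, 4), (2, 2), (3, 1)]
```

Any SQL answer can be checked against this on small data; it is not meant as a replacement for a query on a large table.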
Answer 0 (score: 3)
This query uses variables to track adjacent weeks and works out whether they are consecutive:
set @start_week = 2, @week := 0, @conseq := 0, @cookie:='';
select conseq_weeks, count(*)
from (
select
cookie,
if (cookie != @cookie or week != @week + 1, @conseq := 0, @conseq := @conseq + 1) + 1 as conseq_weeks,
(cookie != @cookie and week <= @start_week) or (cookie = @cookie and week = @week + 1) as conseq,
@cookie := cookie as lastcookie,
@week := week as lastweek
from (select week, cookie from webhist where week >= @start_week order by 2, 1) x
) y
where conseq
group by 1;
This is for week 2. For a different week, change the start_week variable at the top.
Here's the test:
create table webhist(week int, cookie char);
insert into webhist values (1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'c'), (3, 'd');
Output of the above query with where week >= 1:
+--------------+----------+
| conseq_weeks | count(*) |
+--------------+----------+
| 1 | 4 |
| 2 | 2 |
| 3 | 1 |
+--------------+----------+
Output of the above query with where week >= 2:
+--------------+----------+
| conseq_weeks | count(*) |
+--------------+----------+
| 1 | 2 |
| 2 | 1 |
+--------------+----------+
P.S. Good question, but a bit of a ball-breaker.
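The variable-tracking idea can be easier to follow outside SQL. The following is a minimal Python sketch in the spirit of the query above, not a line-for-line port: the function name `consecutive_counts` is hypothetical, it assumes at most one row per (week, cookie), and it deliberately ignores a cookie's later visits once its run from the start week is broken.

```python
from collections import Counter

def consecutive_counts(rows, start_week):
    """Map streak length -> number of cookies whose run from start_week reaches it."""
    counts = Counter()
    last_cookie, last_week, streak = None, None, 0
    kept = [(w, c) for w, c in rows if w >= start_week]
    # Same ordering as the derived table: by cookie, then by week.
    for week, cookie in sorted(kept, key=lambda r: (r[1], r[0])):
        if cookie != last_cookie:
            # New cookie: a streak only counts if it begins in the start week.
            streak = 1 if week == start_week else 0
        elif streak and week == last_week + 1:
            streak += 1
        else:
            streak = 0  # gap: the run from start_week is broken
        if streak:
            counts[streak] += 1
        last_cookie, last_week = cookie, week
    return dict(sorted(counts.items()))

rows = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
        (2, "a"), (2, "b"),
        (3, "a"), (3, "c"), (3, "d")]
print(consecutive_counts(rows, 1))  # {1: 4, 2: 2, 3: 1}
print(consecutive_counts(rows, 2))  # {1: 2, 2: 1}
```

As in the query, the keys are streak lengths (1 = the start week itself), not week numbers.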
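Answer 1 (score: 2)
For some reason most of these answers are very overcomplicated; this doesn't need cursors or loops or anything of the sort...
"I want to know how many (same) people keep coming back the following weeks, given a starting week to look at."
If you want to know, for each week, the number of users who visited that week and then visited again the week after: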
SELECT visits.week, COUNT(1) AS [NumRepeatUsers]
FROM visits
WHERE EXISTS (
SELECT TOP 1 1
FROM visits AS nextWeek
WHERE nextWeek.week = visits.week+1
AND nextWeek.cookie = visits.cookie
)
AND EXISTS (
SELECT TOP 1 1
FROM visits AS searchWeek
WHERE searchWeek.week = @week
AND searchWeek.cookie = visits.cookie
)
GROUP BY visits.week
ORDER BY visits.week
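However, this won't show results that dwindle over time. If you had 10 users in week 1, and then 5 different users visited in each of the next 5 weeks, you would see 1 = 10, 2 = 5, 3 = 5, 4 = 5, 5 = 5, 6 = 5 and so on. What you want to see instead is 5 = x, where x is the number of users who visited every week for 5 weeks in a row. For that, see the following: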
SELECT visits.week, COUNT(1) AS [NumRepeatUsers]
FROM visits
WHERE EXISTS (
SELECT TOP 1 1
FROM visits AS nextWeek
WHERE nextWeek.week = visits.week+1
AND nextWeek.cookie = visits.cookie
)
AND EXISTS (
SELECT TOP 1 1
FROM visits AS searchWeek
WHERE searchWeek.week = @week
AND searchWeek.cookie = visits.cookie
)
AND visits.week - @week = (
SELECT COUNT(1) AS [Count]
FROM visits AS searchWeek
WHERE searchWeek.week BETWEEN @week+1 AND visits.week
AND searchWeek.cookie = visits.cookie
)
GROUP BY visits.week
ORDER BY visits.week
This will give you 1 = 10, 2 = 5, 3 = 4, 4 = 3, 5 = 2, 6 = 1, etc.
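The key step is the counting condition: a user has an unbroken run up to a given week exactly when the number of their rows since the start week equals the number of weeks elapsed. Here is a minimal sketch of that condition alone, run through Python's `sqlite3` module rather than SQL Server. The function name `repeat_users`, the `:start` parameter and the unique constraint on (week, cookie) are assumptions of this sketch. Unlike the queries above, it also counts the start week itself and does not require a visit in the week after.

```python
import sqlite3

def repeat_users(rows, start_week):
    con = sqlite3.connect(":memory:")
    # The count trick below relies on one row per (week, cookie).
    con.execute("CREATE TABLE visits (week INTEGER, cookie TEXT, UNIQUE (week, cookie))")
    con.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    sql = """
        SELECT v.week, COUNT(*) AS NumRepeatUsers
        FROM visits AS v
        WHERE v.week >= :start
          -- unbroken run: one row for every week from :start up to v.week
          AND v.week - :start + 1 = (
              SELECT COUNT(*)
              FROM visits AS s
              WHERE s.cookie = v.cookie
                AND s.week BETWEEN :start AND v.week
          )
        GROUP BY v.week
        ORDER BY v.week
    """
    result = con.execute(sql, {"start": start_week}).fetchall()
    con.close()
    return result

rows = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
        (2, "a"), (2, "b"),
        (3, "a"), (3, "c"), (3, "d")]
print(repeat_users(rows, 1))  # [(1, 4), (2, 2), (3, 1)]
```

On a large table the correlated subquery would probably want an index on (cookie, week).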
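Answer 2 (score: 2)
This was an interesting one.
I tried working out the last week each person visited. This is calculated as the first week, at or after the start week, that has no visit in the following week.
Once you know each user's final visit week, you just count, for each week, the number of distinct users whose final visit is in or after that week.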
SELECT wks.week, COUNT(cookie) as Visitors
FROM (SELECT a.cookie, MIN(a.week) AS FinalVisit
FROM WeekVisits a
INNER JOIN WeekVisits FirstWeek
ON a.cookie = FirstWeek.cookie
WHERE a.week >= 1
AND FirstWeek.week = 1
AND NOT EXISTS (SELECT 1
FROM WeekVisits b
WHERE b.week = a.week + 1
AND b.cookie = a.cookie)
GROUP BY a.cookie) fv
INNER JOIN
(SELECT DISTINCT week
FROM WeekVisits
WHERE week >= 1) wks
ON fv.FinalVisit >= wks.week
GROUP BY wks.week
ORDER BY wks.week
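Edit
- Thanks to ypercube for noticing. I had also left the GROUP BY out of the "fv" query. Oops.
- I removed the comments that indicated the parameters
- I removed an unnecessary DISTINCT
Edit again
- Added the extra bit for FirstWeek, since it didn't cope with starting from week 2
When I run it (admittedly on MS Access)
starting from week 1, I get: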
+------+----------+
| week | Visitors |
+------+----------+
| 1 | 4 |
| 2 | 2 |
| 3 | 1 |
+------+----------+
Starting from week 2, I get:
+------+----------+
| week | Visitors |
+------+----------+
| 2 | 2 |
| 3 | 1 |
+------+----------+
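...as expected. (To start from week 2, change the 1 to a 2 in the three places where it is compared with the week column.) The approach should be sound, but the syntax may need adjusting for MySQL.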
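Because the start week appears in three places, it may be convenient to bind it once as a parameter. Below is a minimal sketch that runs the same query through Python's `sqlite3` module. The `:start` placeholder, the function name `visitors_from` and the use of SQLite are assumptions of this sketch, so it only checks the logic and says nothing about MySQL syntax.

```python
import sqlite3

FINAL_VISIT_SQL = """
SELECT wks.week, COUNT(cookie) AS Visitors
FROM (SELECT a.cookie, MIN(a.week) AS FinalVisit
      FROM WeekVisits a
      INNER JOIN WeekVisits FirstWeek
              ON a.cookie = FirstWeek.cookie
      WHERE a.week >= :start
        AND FirstWeek.week = :start
        AND NOT EXISTS (SELECT 1
                        FROM WeekVisits b
                        WHERE b.week = a.week + 1
                          AND b.cookie = a.cookie)
      GROUP BY a.cookie) fv
INNER JOIN (SELECT DISTINCT week
            FROM WeekVisits
            WHERE week >= :start) wks
        ON fv.FinalVisit >= wks.week
GROUP BY wks.week
ORDER BY wks.week
"""

def visitors_from(rows, start_week):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE WeekVisits (week INTEGER, cookie TEXT)")
    con.executemany("INSERT INTO WeekVisits VALUES (?, ?)", rows)
    # One named parameter feeds all three comparisons with the week column.
    result = con.execute(FINAL_VISIT_SQL, {"start": start_week}).fetchall()
    con.close()
    return result

rows = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
        (2, "a"), (2, "b"),
        (3, "a"), (3, "c"), (3, "d")]
print(visitors_from(rows, 1))  # [(1, 4), (2, 2), (3, 1)]
print(visitors_from(rows, 2))  # [(2, 2), (3, 1)]
```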
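Answer 3 (score: 0)
OK, let's suppose your table is called visits and you are interested in week n. You want to know, for every week number w >= n, which users appear in every single week w.
So how many such weeks are there?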
select count(distinct week) -- count weeks, not visit rows
from visits
where week >= n;
How many weeks did each user visit?
select user, count(user) -- "user" is the per-person column (cookie in the question's table)
from visits
where week >= n
group by user;
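Suppose you have weeks 1, 3, 4, 5, 6, 7, 9, 10 and 13, and you are interested in week 5. The first query above gives you 6, because there are 6 weeks of interest: 5, 6, 7, 9, 10 and 13. The second query gives you, for each user, the number of weeks they visited. Now you want to know how many of those users have a count of 6.
I think this works: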
select user, count(user)
from visits
where week >= n
group by user
having count(user) = (
select count(distinct week)
from visits
where week >= n);
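But I don't have access to MySQL right now. If it doesn't work, then perhaps the approach will still help and get you heading in the right direction. Edit: I'll be able to test it tomorrow.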
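Since the queries above were untested, here is a minimal sketch that tries the same HAVING idea against the question's sample data using Python's `sqlite3` module. It assumes the question's `cookie` column instead of `user`, a hypothetical function name `every_week_visitors`, and `COUNT(DISTINCT week)` on both sides so that duplicate rows would not skew the comparison. It returns the users who visited in every week that has any visits from week n onwards.

```python
import sqlite3

def every_week_visitors(rows, n):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE visits (week INTEGER, cookie TEXT)")
    con.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    sql = """
        SELECT cookie, COUNT(DISTINCT week) AS weeks_visited
        FROM visits
        WHERE week >= :n
        GROUP BY cookie
        -- keep only cookies present in every week of interest
        HAVING COUNT(DISTINCT week) = (
            SELECT COUNT(DISTINCT week) FROM visits WHERE week >= :n
        )
        ORDER BY cookie
    """
    result = con.execute(sql, {"n": n}).fetchall()
    con.close()
    return result

rows = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
        (2, "a"), (2, "b"),
        (3, "a"), (3, "c"), (3, "d")]
print(every_week_visitors(rows, 1))  # [('a', 3)]
```

Note that this answers a narrower question than the one asked: it finds users present in all weeks from n to the end, not a per-week count of those still returning.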
Answer 4 (score: 0)
Use a self-join:
SELECT ... FROM visits AS v1 LEFT JOIN visits AS v2 ON v2.week = v1.week+1 AND v2.cookie = v1.cookie
WHERE v2.week IS NOT NULL
GROUP BY v1.cookie
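This will give you the records for the second and subsequent visits.
But I think a better option is a plain GROUP BY cookie to get the number of visits for each cookie; any number above 1 is a returning user.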
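Both ideas can be tried on the question's sample data. The sketch below uses Python's `sqlite3` module, and the function names `returning_by_week` and `visit_counts` are hypothetical. The join matches on cookie as well as on consecutive weeks, which is an assumption about what the self-join is meant to express. It counts people who were also there the week before, which is not the same as an unbroken run from a chosen start week.

```python
import sqlite3

def _load(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE visits (week INTEGER, cookie TEXT)")
    con.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    return con

def returning_by_week(rows):
    """Per week, how many visitors were also present the week before."""
    con = _load(rows)
    sql = """
        SELECT v2.week, COUNT(*) AS returning_users
        FROM visits AS v1
        JOIN visits AS v2
          ON v2.cookie = v1.cookie
         AND v2.week = v1.week + 1
        GROUP BY v2.week
        ORDER BY v2.week
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result

def visit_counts(rows):
    """Cookies with more than one visit in total, with their visit counts."""
    con = _load(rows)
    sql = """
        SELECT cookie, COUNT(*) AS n
        FROM visits
        GROUP BY cookie
        HAVING COUNT(*) > 1
        ORDER BY cookie
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result

rows = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
        (2, "a"), (2, "b"),
        (3, "a"), (3, "c"), (3, "d")]
print(returning_by_week(rows))  # [(2, 2), (3, 1)]
print(visit_counts(rows))       # [('a', 3), ('b', 2), ('c', 2), ('d', 2)]
```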
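Answer 5 (score: 0)
Here is my solution. It's not very simple, but I have tested it and it does solve your problem:
First, we declare a stored procedure that gives us the visitors for a particular week as a delimited string. You could use group_concat if you like, but I did it this way because group_concat has a text length limit.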
DELIMITER $$
DROP PROCEDURE IF EXISTS `db`.`get_visitors_for_week`$$
CREATE DEFINER=`root`@`localhost` PROCEDURE `get_visitors_for_week`(id_week INTEGER, OUT result TEXT)
BEGIN
DECLARE should_continue INT DEFAULT 0;
DECLARE c_cookie CHAR(1);
DECLARE r CURSOR FOR SELECT v.cookie
FROM visits v WHERE v.week = id_week;
DECLARE CONTINUE HANDLER FOR NOT FOUND
SET should_continue = 1;
OPEN r;
REPEAT
SET c_cookie = NULL;
FETCH r INTO c_cookie;
IF c_cookie IS NOT NULL THEN
IF result IS NULL OR result = '' THEN
SET result = c_cookie;
ELSE SET result = CONCAT(result,',',c_cookie);
END IF;
END IF;
UNTIL should_continue = 1
END REPEAT;
CLOSE r;
END$$
DELIMITER ;
Then we declare a function to wrap that stored procedure, so we can conveniently call it inside a query:
DELIMITER $$
DROP FUNCTION IF EXISTS `db`.`concat_values`$$
CREATE DEFINER=`root`@`localhost` FUNCTION `concat_values`(id_week INTEGER) RETURNS TEXT CHARSET latin1
BEGIN
DECLARE result TEXT;
CALL get_visitors_for_week(id_week, result);
RETURN result;
END$$
DELIMITER ;
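Then we have to count the visitors who came both this week and the previous week (for every week, of course), which we "see" by searching for our cookie in the concatenated list string. This is the final query: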
SELECT
v.week,
SUM(IF(ISNULL(concat_values(v.week - 1)) OR INSTR(concat_values(v.week - 1),v.cookie) > 0, 1, 0)) AS Visitors
FROM (SELECT
v.week,
v.cookie,
vt.visitors
FROM visits v
INNER JOIN (SELECT DISTINCT
v.week,
concat_values(v.week) AS visitors
FROM visits v) AS vt
ON v.week = vt.week) AS v
WHERE v.week >= 1
GROUP BY v.week
In the condition v.week >= 1, replace the 1 with the week number you want to start from.
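For comparison, the group_concat route mentioned above can be sketched without a cursor. The example uses SQLite's `GROUP_CONCAT` through Python's `sqlite3` module, which is an assumption of this sketch: MySQL's `GROUP_CONCAT` is similar, but its output length is capped by the `group_concat_max_len` setting, and that is the limit the procedure works around. The function name `visitors_by_week` is hypothetical.

```python
import sqlite3

def visitors_by_week(rows):
    """Return {week: set of cookies}, built from one delimited string per week."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE visits (week INTEGER, cookie TEXT)")
    con.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    sql = """
        SELECT week, GROUP_CONCAT(cookie, ',') AS visitors
        FROM visits
        GROUP BY week
        ORDER BY week
    """
    # The order of items inside each string is not guaranteed, so split into sets.
    lists = {week: set(joined.split(",")) for week, joined in con.execute(sql)}
    con.close()
    return lists

rows = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
        (2, "a"), (2, "b"),
        (3, "a"), (3, "c"), (3, "d")]
print(visitors_by_week(rows)[2] == {"a", "b"})  # True
```

Splitting into sets also avoids the substring pitfall of INSTR once cookies are longer than one character.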