Advanced SQL select query

Date: 2011-07-27 00:49:24

Tags: mysql sql select join

week      cookie
1         a
1         b
1         c
1         d
2         a 
2         b
3         a
3         c
3         d

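This table records visits to a website by week. Each cookie represents one person, and each entry means that person visited the site in that week. For example, the last entry means 'd' came to the site in week 3.

Given a start week, I want to find out how many of the same people keep coming back in each following week.

For example, if I look at week 1, I would get a result like this: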

1 | 4
2 | 2
3 | 1

That is because 4 users came in week 1, only 2 of them (a, b) came back in week 2, and only 1 (a) came in all 3 weeks.
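To pin down the expected result before looking at SQL, here is a minimal Python sketch of the definition used above: for each week from the start week onward, count the cookies that visited in every week so far. The function name `retention` and the in-memory list are illustrative assumptions, not part of the question.

```python
# Sample rows from the question, as (week, cookie) pairs.
VISITS = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
          (2, "a"), (2, "b"),
          (3, "a"), (3, "c"), (3, "d")]


def retention(visits, start_week):
    """Return [(week, n)] where n cookies visited every week from start_week to week."""
    by_week = {}
    for week, cookie in visits:
        by_week.setdefault(week, set()).add(cookie)
    if not by_week:
        return []
    result = []
    survivors = None  # cookies seen in every week processed so far
    for week in range(start_week, max(by_week) + 1):
        seen = by_week.get(week, set())
        survivors = seen if survivors is None else survivors & seen
        result.append((week, len(survivors)))
    return result


print(retention(VISITS, 1))  # [(1, 4), (2, 2), (3, 1)]
```

Any SQL answer can be checked against this reference on small data.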

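How can I write a select query to find this out? The table will be big: there might be 100 weeks, so I want to find the right way to do it.

6 answers:

Answer 0: (score: 3)

This query uses variables to track adjacent weeks and counts them when they are consecutive: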

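-- @start_week is the week to start from; the other variables hold the previous row's state
set @start_week = 2, @week := 0, @conseq := 0, @cookie:='';
select conseq_weeks, count(*)
from (
select 
  cookie,
  -- length of the current run of consecutive weeks for this cookie
  if (cookie != @cookie or week != @week + 1, @conseq := 0, @conseq := @conseq + 1) + 1 as conseq_weeks,
  -- true when the row starts a run at the start week or extends the previous row's run
  (cookie != @cookie and week <= @start_week) or (cookie = @cookie and week = @week + 1) as conseq,
  @cookie := cookie as lastcookie,
  @week := week as lastweek
-- rows must arrive ordered by cookie, then week
from (select week, cookie from webhist where week >= @start_week order by 2, 1) x
) y
where conseq
group by 1;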

This is for week 2. For another week, change the start_week variable at the top.

Here is the test:

create table webhist(week int, cookie char);
insert into webhist values (1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'c'), (3, 'd');

Output of the above query with where week >= 1 (@start_week = 1):

+--------------+----------+
| conseq_weeks | count(*) |
+--------------+----------+
|            1 |        4 |
|            2 |        2 |
|            3 |        1 |
+--------------+----------+

Output of the above query with where week >= 2 (@start_week = 2):

+--------------+----------+
| conseq_weeks | count(*) |
+--------------+----------+
|            1 |        2 |
|            2 |        1 |
+--------------+----------+
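The scan that the user variables perform can be sketched in Python: sort by cookie and week, carry the previous row's state, and count each row that extends an unbroken run from the start week. This is an illustration of the idea, not a translation of the query. The name `streak_histogram` is an assumption, and the sketch deliberately stops counting a cookie at its first gap, whereas the query as written appears to resume counting when two later weeks are adjacent (for example weeks 1, 3, 4). MySQL also documents that the evaluation order of expressions involving user variables is undefined, so a version with explicit state is easier to reason about.

```python
from collections import Counter

# Sample rows from the question, as (week, cookie) pairs.
VISITS = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
          (2, "a"), (2, "b"),
          (3, "a"), (3, "c"), (3, "d")]


def streak_histogram(visits, start_week):
    """Return [(consecutive_weeks, count)], like the conseq_weeks output."""
    # Same ordering as the inner query: by cookie, then by week.
    rows = sorted((c, w) for w, c in set(visits) if w >= start_week)
    counts = Counter()
    prev_cookie, prev_week, streak, alive = None, None, 0, False
    for cookie, week in rows:
        if cookie != prev_cookie:
            # A run only counts if it begins exactly at the start week.
            alive = (week == start_week)
            streak = 1 if alive else 0
        elif alive and week == prev_week + 1:
            streak += 1
        else:
            alive = False  # first gap ends the run for this cookie
        if alive:
            counts[streak] += 1
        prev_cookie, prev_week = cookie, week
    return sorted(counts.items())


print(streak_histogram(VISITS, 1))  # [(1, 4), (2, 2), (3, 1)]
print(streak_histogram(VISITS, 2))  # [(1, 2), (2, 1)]
```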

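P.S. Good question, but a bit of a ball-breaker.

Answer 1: (score: 2)

For some reason most of these answers are very complicated. This doesn't need cursors or loops or anything of the sort...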

  

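"I want to find out how many (same) people keep coming back in the following week, when given a start week to look at."

If you want to know, for each week in the data, the number of users who visited in that week, also visited in the week after, and visited in the start week: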

-- T-SQL syntax (TOP, [bracketed] aliases, @week parameter); adjust for MySQL
SELECT visits.week, COUNT(1) AS [NumRepeatUsers]
FROM visits 
WHERE EXISTS (
    SELECT TOP 1 1 
    FROM visits AS nextWeek 
    WHERE nextWeek.week = visits.week+1 
      AND nextWeek.cookie = visits.cookie
  )
  AND EXISTS (
    SELECT TOP 1 1 
    FROM visits AS searchWeek
    WHERE searchWeek.week = @week 
      AND searchWeek.cookie = visits.cookie
  )
GROUP BY visits.week
ORDER BY visits.week

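However, this will not show results that diminish over time. If you have 10 users in week 1 and then 5 different users visit in each of the next 5 weeks, you would keep seeing 1 = 10, 2 = 5, 3 = 5, 4 = 5, 5 = 5, 6 = 5, and so on. What you want to see instead is 5 = x, where x is the number of users who visited every week for 5 weeks in a row. For that, see the following: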

-- T-SQL syntax (TOP, [bracketed] aliases, @week parameter); adjust for MySQL
SELECT visits.week, COUNT(1) AS [NumRepeatUsers]
FROM visits 
WHERE EXISTS (
    SELECT TOP 1 1 
    FROM visits AS nextWeek 
    WHERE nextWeek.week = visits.week+1 
      AND nextWeek.cookie = visits.cookie
  )
  AND EXISTS (
    SELECT TOP 1 1 
    FROM visits AS searchWeek
    WHERE searchWeek.week = @week 
      AND searchWeek.cookie = visits.cookie
  )
  -- the user has one visit for every week between the start week and this week
  AND visits.week - @week = (
    SELECT COUNT(1) AS [Count]
    FROM visits AS searchWeek
    WHERE searchWeek.week BETWEEN @week+1 AND visits.week
      AND searchWeek.cookie = visits.cookie
  )
GROUP BY visits.week
ORDER BY visits.week

This will give you 1 = 10, 2 = 5, 3 = 4, 4 = 3, 5 = 2, 6 = 1, and so on.
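The "number of visits in the range equals the length of the range" check can be tried on the question's sample data. The sketch below is a simplified variant, not the answer's exact query: it uses portable SQL run through Python's `sqlite3`, drops the look-ahead to the following week, and assumes one row per (week, cookie). The function name `consecutive_counts` and the `:start` parameter are illustrative.

```python
import sqlite3

# Sample rows from the question, as (week, cookie) pairs.
ROWS = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
        (2, "a"), (2, "b"),
        (3, "a"), (3, "c"), (3, "d")]

QUERY = """
SELECT v.week, COUNT(*) AS num_repeat_users
FROM visits AS v
WHERE v.week >= :start
  -- one visit for every week from :start through v.week
  AND v.week - :start + 1 = (
      SELECT COUNT(*)
      FROM visits AS s
      WHERE s.cookie = v.cookie
        AND s.week BETWEEN :start AND v.week)
GROUP BY v.week
ORDER BY v.week
"""


def consecutive_counts(start_week):
    conn = sqlite3.connect(":memory:")
    # The primary key enforces the one-row-per-(week, cookie) assumption.
    conn.execute("CREATE TABLE visits (week INTEGER, cookie TEXT, "
                 "PRIMARY KEY (week, cookie))")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", ROWS)
    rows = conn.execute(QUERY, {"start": start_week}).fetchall()
    conn.close()
    return rows


print(consecutive_counts(1))  # [(1, 4), (2, 2), (3, 1)]
```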

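Answer 2: (score: 2)

This is a fun one.

I tried to work out the last week of each person's unbroken run of visits. That is the first week, at or after the start week, that is not followed by a visit in the next week.

Once you know each user's final visit week, you can count, for each week, the number of distinct users whose final visit falls on or after that week.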

SELECT wks.week, COUNT(cookie) as Visitors
FROM (SELECT a.cookie, MIN(a.week) AS FinalVisit
      FROM WeekVisits a 
           INNER JOIN WeekVisits FirstWeek
           ON a.cookie = FirstWeek.cookie
      WHERE a.week >= 1
        AND FirstWeek.week = 1
        AND NOT EXISTS (SELECT 1 
                        FROM WeekVisits b
                        WHERE b.week = a.week + 1
                          AND b.cookie = a.cookie)
      GROUP BY a.cookie) fv
     INNER JOIN
     (SELECT DISTINCT week 
      FROM WeekVisits
      WHERE week >= 1) wks
     ON fv.FinalVisit >= wks.week 
GROUP BY wks.week
ORDER BY wks.week

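Edits
- Thanks to ypercube for noticing. I had also left the GROUP BY out of the "fv" query. Oops.
- I removed the comments that indicated the parameters.
- I removed an unnecessary DISTINCT.
- Edited again: added the extra FirstWeek join, because the query did not cope with starting at week 2.

When I run it (admittedly on MS Access)

Starting from week 1, I get: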

+------+----------+
| week | Visitors |
|  1   |   4      |
|  2   |   2      |
|  3   |   1      |
+------+----------+

Starting from week 2, I get:

+------+----------+
| week | Visitors |
|  2   |   2      |
|  3   |   1      |
+------+----------+

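...as expected. (To start from week 2, change the 1 to a 2 in the three places where it is compared with the week column.) The approach seems sound, but the syntax may need tweaking for MySQL.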
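Rather than editing the literal in three places, the start week can be passed as a single bound parameter. The sketch below runs the same query through Python's `sqlite3` against the question's sample rows. The `:start` placeholder and the function name `visitors_from` are assumptions for illustration, and SQLite stands in for MySQL here.

```python
import sqlite3

# Sample rows from the question, as (week, cookie) pairs.
ROWS = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
        (2, "a"), (2, "b"),
        (3, "a"), (3, "c"), (3, "d")]

QUERY = """
SELECT wks.week, COUNT(fv.cookie) AS Visitors
FROM (SELECT a.cookie, MIN(a.week) AS FinalVisit
      FROM WeekVisits a
           INNER JOIN WeekVisits FirstWeek
           ON a.cookie = FirstWeek.cookie
      WHERE a.week >= :start
        AND FirstWeek.week = :start
        AND NOT EXISTS (SELECT 1
                        FROM WeekVisits b
                        WHERE b.week = a.week + 1
                          AND b.cookie = a.cookie)
      GROUP BY a.cookie) fv
     INNER JOIN
     (SELECT DISTINCT week
      FROM WeekVisits
      WHERE week >= :start) wks
     ON fv.FinalVisit >= wks.week
GROUP BY wks.week
ORDER BY wks.week
"""


def visitors_from(start_week):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE WeekVisits (week INTEGER, cookie TEXT)")
    conn.executemany("INSERT INTO WeekVisits VALUES (?, ?)", ROWS)
    # The same named parameter fills all three comparisons.
    rows = conn.execute(QUERY, {"start": start_week}).fetchall()
    conn.close()
    return rows


print(visitors_from(1))  # [(1, 4), (2, 2), (3, 1)]
print(visitors_from(2))  # [(2, 2), (3, 1)]
```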

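Answer 3: (score: 0)

OK, let's assume your table is called visits and you are interested in week n. You want to know which users showed up in every single week w, for all week numbers w >= n.

So how many such weeks are there?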

-- n stands for the start week; DISTINCT counts weeks rather than rows
select count(distinct week)
from visits
where week >= n;

How many weeks did each user visit?

-- the user is identified by the cookie column; WHERE must come before GROUP BY
select cookie, count(distinct week)
from visits
where week >= n
group by cookie;

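Say you have weeks 1, 3, 4, 5, 6, 7, 9, 10, and 13, and you are interested in week 5. The first query above gives you 6, because there are 6 weeks of interest: 5, 6, 7, 9, 10, and 13. The second query gives you, for each user, the number of weeks they visited. Now you want to know how many of those users have a count of 6.

I think this works: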

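-- clause order is WHERE, GROUP BY, HAVING; n stands for the start week
select cookie, count(distinct week)
from visits
where week >= n
group by cookie
having count(distinct week) = (
    select count(distinct week)
    from visits
    where week >= n);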

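But I don't have access to MySQL right now. If it doesn't work, perhaps the approach will still help and get you moving in the right direction. EDIT: I can test tomorrow.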
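For anyone who wants to try the counting approach without a MySQL server, here is a sketch using Python's `sqlite3` and the question's sample rows. It assumes the user column is `cookie`, counts distinct weeks, and places WHERE before GROUP BY and HAVING. Note that it returns only the cookies present in every recorded week from n onward, a single list rather than the week-by-week counts the question asks for. The function name `always_present` is illustrative.

```python
import sqlite3

# Sample rows from the question, as (week, cookie) pairs.
ROWS = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
        (2, "a"), (2, "b"),
        (3, "a"), (3, "c"), (3, "d")]

QUERY = """
SELECT cookie, COUNT(DISTINCT week) AS weeks_visited
FROM visits
WHERE week >= :n
GROUP BY cookie
HAVING COUNT(DISTINCT week) = (
    SELECT COUNT(DISTINCT week)
    FROM visits
    WHERE week >= :n)
ORDER BY cookie
"""


def always_present(n):
    """Cookies that visited in every recorded week >= n."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (week INTEGER, cookie TEXT)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", ROWS)
    rows = conn.execute(QUERY, {"n": n}).fetchall()
    conn.close()
    return rows


print(always_present(1))  # [('a', 3)]
```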

Answer 4: (score: 0)

Use a self-join:

SELECT ... FROM visits AS v1 LEFT JOIN visits AS v2
  ON v2.week = v1.week+1 AND v2.cookie = v1.cookie
WHERE v2.week IS NOT NULL
GROUP BY v1.cookie

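This gives you the records for second and later visits.

But I think it is better to simply GROUP BY cookie to get the number of visits per cookie; any count higher than 1 is a returning user.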
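Both ideas can be compared on the question's sample rows. The sketch below (Python's `sqlite3`; the function names are illustrative, and the self-join includes a cookie match and groups by week, which are assumptions about the intent) shows that the self-join counts users who also visited the week before, while the plain GROUP BY treats c and d as returning users even though they skipped week 2.

```python
import sqlite3

# Sample rows from the question, as (week, cookie) pairs.
ROWS = [(1, "a"), (1, "b"), (1, "c"), (1, "d"),
        (2, "a"), (2, "b"),
        (3, "a"), (3, "c"), (3, "d")]


def _run(sql):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (week INTEGER, cookie TEXT)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", ROWS)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


def back_to_back_per_week():
    # Visits whose cookie was also seen in the immediately preceding week.
    return _run("""
        SELECT v2.week, COUNT(*)
        FROM visits AS v1
        JOIN visits AS v2
          ON v2.week = v1.week + 1 AND v2.cookie = v1.cookie
        GROUP BY v2.week
        ORDER BY v2.week""")


def visits_per_cookie():
    # Cookies with more than one visit, regardless of gaps.
    return _run("""
        SELECT cookie, COUNT(*)
        FROM visits
        GROUP BY cookie
        HAVING COUNT(*) > 1
        ORDER BY cookie""")


print(back_to_back_per_week())  # [(2, 2), (3, 1)]
print(visits_per_cookie())      # [('a', 3), ('b', 2), ('c', 2), ('d', 2)]
```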

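Answer 5: (score: 0)

Here is my solution. It is not very simple, but I have tested it and it does solve your problem:

First, we declare a stored procedure that returns the visitors for a given week as a comma-separated string. You could use group_concat if you prefer, but I did it this way because group_concat has a length limit on its result.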

DELIMITER $$

DROP PROCEDURE IF EXISTS `db`.`get_visitors_for_week`$$

CREATE DEFINER=`root`@`localhost` PROCEDURE `get_visitors_for_week`(id_week INTEGER, OUT result TEXT)
BEGIN
    DECLARE should_continue INT DEFAULT 0;
    DECLARE c_cookie CHAR(1);
    DECLARE r CURSOR FOR SELECT v.cookie
                FROM visits v WHERE v.week = id_week;
    DECLARE CONTINUE HANDLER FOR NOT FOUND
        SET should_continue = 1;
    OPEN r;
    REPEAT
        SET c_cookie = NULL;
        FETCH r INTO c_cookie;
        IF c_cookie IS NOT NULL THEN
            IF result IS NULL OR result = '' THEN
                SET result = c_cookie;
            ELSE SET result = CONCAT(result,',',c_cookie);
            END IF;
        END IF;
        UNTIL should_continue = 1
    END REPEAT;
    CLOSE r;
    END$$

DELIMITER ;

Then we declare a function that wraps the stored procedure, so that we can conveniently call it from inside a query:

DELIMITER $$

DROP FUNCTION IF EXISTS `db`.`concat_values`$$

CREATE DEFINER=`root`@`localhost` FUNCTION `concat_values`(id_week INTEGER) RETURNS TEXT CHARSET latin1
BEGIN
    DECLARE result TEXT;
    CALL get_visitors_for_week(id_week, result);
    RETURN result;
    END$$

DELIMITER ;

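Then we have to count the visitors who came both this week and the previous week (for each week, of course), which we "see" by searching for our cookie in the concatenated list. Here is the final query: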

SELECT
  v.week,
  SUM(IF(ISNULL(concat_values(v.week - 1)) OR INSTR(concat_values(v.week - 1),v.cookie) > 0, 1, 0)) AS Visitors
FROM (SELECT
        v.week,
        v.cookie,
        vt.visitors
      FROM visits v
        INNER JOIN (SELECT DISTINCT
                      v.week,
                      concat_values(v.week) AS visitors
                    FROM visits v) AS vt
          ON v.week = vt.week) AS v
WHERE v.week >= 1
GROUP BY v.week

In the condition v.week >= 1, replace the 1 with the week number you want to start from.
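One caveat about searching a concatenated list with INSTR: it is a substring test, which is safe for the single-character cookies used here but can produce false matches once cookies are longer. The Python sketch below illustrates the difference with hypothetical two-character cookies; the helper names are made up for the example. In MySQL, FIND_IN_SET performs an element-wise lookup on a comma-separated string.

```python
def instr_match(csv_list, cookie):
    """Substring test, analogous to INSTR(csv_list, cookie) > 0."""
    return cookie in csv_list


def element_match(csv_list, cookie):
    """Element-wise test: the cookie must equal one list entry exactly."""
    return cookie in csv_list.split(",")


week_visitors = "ab,cd"  # hypothetical multi-character cookies

print(instr_match(week_visitors, "a"))     # True, a false positive
print(element_match(week_visitors, "a"))   # False
print(element_match(week_visitors, "ab"))  # True
```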