Can't get MySQL to find users who were active in one year but inactive in subsequent years

Asked: 2012-05-10 02:28:21

Tags: mysql

I have a MySQL table:

CREATE TABLE IF NOT EXISTS users_data (
  userid int(11) NOT NULL,
  computer varchar(30) DEFAULT NULL,
  logondate date NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

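It is a fairly large table: about 400 unique users, 20 computers, and roughly 20,000 logon entries spanning 5 years.

I want to create a summary table that lists, for each computer and each year, the number of unique users, how many of those users were new (i.e. had no earlier logon on any computer), and how many were terminated (i.e. had no later logon on any computer):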

CREATE TABLE IF NOT EXISTS summary_computer_use (
  computer varchar(30) DEFAULT NULL,
  year_used smallint NOT NULL,  -- holds YEAR(logondate); a DATE column cannot store a bare year
  number_of_users int(11) NOT NULL DEFAULT 0,
  number_of_new_users int(11) NOT NULL DEFAULT 0,
  number_of_terminated_users int(11) NOT NULL DEFAULT 0
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

INSERT into summary_computer_use (computer, year_used)
    select distinct computer, year(logondate) from users_data;

I can get the number of unique users per computer per year:

UPDATE summary_computer_use as a 
inner join (
    select computer, year(logondate) as year_used,
        count(distinct userid) as number_of_users
    from users_data
    group by computer, year(logondate)
) as b on a.computer = b.computer and 
a.year_used = b.year_used
set a.number_of_users = b.number_of_users;

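But I'm stuck on how to write a SELECT statement that finds the number of users who used a computer for the first time in a given year (no logon dates earlier than that year), or who never logged on again afterwards.

Any suggestions?

2 Answers:

Answer 0: (score: 0)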

I think this is what you're after; it counts new users by the year of their first logon:

select y, count(userid) as newusers from
(
    select userid, min(year(logondate)) as y from users_data group by userid
) tmp
group by y;
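
The same pattern can be turned around for the "never logged on again" half of the question by taking each user's last year instead of the first. The following is a sketch rather than part of the original answer; the alias terminated_users is made up for illustration. It also assumes that the most recent year in the data is ignored when reading the result, since users who are still active have their latest logon there.

```sql
-- Sketch: count users by the year of their LAST logon on any computer.
-- Caveat (assumption): rows for the current, unfinished year are not
-- real terminations and should be disregarded.
select y, count(userid) as terminated_users from
(
    select userid, max(year(logondate)) as y from users_data group by userid
) tmp
group by y;
```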

Answer 1: (score: 0)

I think this produces the summary you want:

   SELECT computers.computer,
          timespan.yyyy                 AS "year_used",
          COALESCE(allusers.num, 0)     AS "number_of_users",
          COALESCE(newusers.num, 0)     AS "number_of_new_users",
          COALESCE(terminations.num, 0) AS "number_of_terminated_users"
     FROM (SELECT DISTINCT computer
             FROM users_data) computers
     JOIN (SELECT (2000+i) AS yyyy
             FROM integers
            WHERE i BETWEEN 0 AND 10) timespan
LEFT JOIN (  SELECT YEAR(logondate) AS logonyear,
                   computer,
                   COUNT(DISTINCT userid) AS "num"
              FROM users_data
          GROUP BY 1, 2) allusers
       ON timespan.yyyy = allusers.logonyear AND computers.computer = allusers.computer
LEFT JOIN ( SELECT last_logon AS logonyear,
                   computer,
                   COUNT(DISTINCT userid) AS "num"
              FROM (  SELECT computer,
                             userid,
                             YEAR(MAX(logondate)) AS "last_logon"
                        FROM users_data
                    GROUP BY 1, 2) last_user_logons
           GROUP BY 1, 2) terminations
       ON timespan.yyyy = terminations.logonyear AND computers.computer = terminations.computer
LEFT JOIN ( SELECT first_logon AS logonyear,
                   computer,
                   COUNT(DISTINCT userid) AS "num"
              FROM (  SELECT computer,
                             userid,
                             YEAR(MIN(logondate)) AS "first_logon"
                        FROM users_data
                    GROUP BY 1, 2) first_user_logons
           GROUP BY 1, 2) newusers
       ON timespan.yyyy = newusers.logonyear AND computers.computer = newusers.computer;
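
The query reads from an integers table that the answer does not define. A minimal sketch of one way to provide it follows; the table name and the column i come from the query, while the column type, the primary key, and the range of values loaded are assumptions chosen to cover `i BETWEEN 0 AND 10`.

```sql
-- Hypothetical helper table: one row per small non-negative integer.
CREATE TABLE IF NOT EXISTS integers (
  i int NOT NULL PRIMARY KEY
);

-- INSERT IGNORE skips rows that already exist, so the script can be rerun.
INSERT IGNORE INTO integers (i)
VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
```

With this in place, the timespan subquery yields the years 2000 through 2010.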

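The individual subqueries represent:

  • the distinct set of computers
  • the timespan we are interested in
    • Note: this uses an integers table
    • Note: we exclude the past year (2011, at the time of writing) because we cannot "close the books" on last year's terminations until this year is complete.
  • the number of distinct users per computer per year (allusers)
  • the number of newusers per computer per year (built on each user's first_logon on that computer)
  • the number of terminations per computer per year (built on each user's last_logon on that computer)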
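
Note that newusers and terminations above are computed per computer, whereas the question defines a new user as one with no earlier logon on any computer. If that stricter reading is wanted, one possible variation is sketched below. It is an assumption about the intent, not part of the original answer, and the aliases u, f and first_year are illustrative.

```sql
-- Sketch: per computer and year, count users whose first logon on ANY
-- computer falls in that same year.
SELECT u.computer,
       YEAR(u.logondate)        AS year_used,
       COUNT(DISTINCT u.userid) AS number_of_new_users
  FROM users_data u
  JOIN (SELECT userid,
               YEAR(MIN(logondate)) AS first_year
          FROM users_data
         GROUP BY userid) f
    ON f.userid = u.userid
   AND f.first_year = YEAR(u.logondate)
 GROUP BY u.computer, YEAR(u.logondate);
```

Replacing MIN with MAX in the inner query would give the corresponding count of users whose last logon on any computer falls in that year.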