Get records from one table that do not exist in another table

Time: 2012-06-20 18:31:26

Tags: php mysql group-by unix-timestamp

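I know the title sounds like a question that has been asked dozens of times, but I think this one is a bit different. If a similar question already exists, please point me to it.

Basically, I have two tables: users and resumes. Here are snippets of their schemas: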

users:
    id  signup_time
resumes:
    id  user_id  modified_time

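Now I need the total number of users who have no resumes within a user-specified time range (all dates are UNIX timestamps), grouped by day, week, or month, that is, by the date on which they had no uploaded resume. The grouping is what troubles me most, because without it the query could look like this: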

SELECT u.id FROM `jb_users` u WHERE
    u.id NOT IN (
        SELECT r.user_id FROM `jb_resumes` r
        WHERE (r.modified_time BETWEEN 1330581600 AND 1335848399)
    ) AND u.signup_time >= 1330581600

Let's look at an example; hopefully that makes this easier to understand.

Suppose we have this data:

users
    id  signup_time
    ---------------
    1   1340214369 (20.06.2012)
    2   1330754400 (03.03.2012)
    3   1329285600 (15.02.2012)
    4   1324447200 (21.12.2011)
resumes
    id  user_id  modified_time
    --------------------------
    1   1        1340214369 (20.06.2012)
    2   2        1330840800 (04.03.2012)
    3   2        1340214369 (20.06.2012)
    4   3        1334506920 (15.04.2012)
    5   3        1334638800 (17.04.2012)
    6   2        1334638800 (17.04.2012)
    7   3        1336798800 (12.05.2012)

For the time range 01.03.2012 00:00:00 - 30.04.2012 23:59:59, grouped by month, it should return:

count   user_ids    time
2       3,4         1330840800 (03.2012 - can be any date in the month, in fact)
1       4           1334506920 (04.2012 - can be any date in the month, in fact)

For the same time range, but grouped by day, it should return:

count   user_ids    time
2       3,4         1330840800 (04.03.2012)
2       2,4         1334506920 (15.04.2012)
1       4           1334638800 (17.04.2012)

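I hope the question is clear enough. If not, please let me know.

The data will be processed with PHP, so if this cannot be done in a single query (even with subqueries), processing the data in PHP is fine too.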
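
For reference, here is a minimal sketch of that application-side fallback, written in Python rather than PHP purely for illustration. The names `bucket` and `users_without_resume` are hypothetical. It assumes that a user counts for a bucket as long as they signed up before the end of the range (which is what the expected output above implies), that only buckets containing at least one resume modification are reported, and that buckets are computed in UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Sample data from the question: user id -> signup_time, and (user_id, modified_time) pairs.
USERS = {1: 1340214369, 2: 1330754400, 3: 1329285600, 4: 1324447200}
RESUMES = [
    (1, 1340214369), (2, 1330840800), (2, 1340214369), (3, 1334506920),
    (3, 1334638800), (2, 1334638800), (3, 1336798800),
]
START, END = 1330581600, 1335848399  # range boundaries used in the question's query


def bucket(ts, unit):
    """Map a UNIX timestamp to a day, ISO week, or month key (UTC is an assumption)."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    if unit == "day":
        return dt.strftime("%Y-%m-%d")
    if unit == "week":
        year, week, _ = dt.isocalendar()
        return f"{year}-W{week:02d}"
    if unit == "month":
        return dt.strftime("%Y-%m")
    raise ValueError(f"unknown unit: {unit}")


def users_without_resume(users, resumes, start, end, unit):
    """For each bucket with resume activity in [start, end], list the users lacking one."""
    active = defaultdict(set)  # bucket key -> ids of users who modified a resume in it
    for user_id, modified in resumes:
        if start <= modified <= end:
            active[bucket(modified, unit)].add(user_id)
    # Only users who had already signed up by the end of the range are considered.
    eligible = {uid for uid, signup in users.items() if signup < end}
    return {key: sorted(eligible - ids) for key, ids in sorted(active.items())}


for unit in ("month", "day"):
    for key, ids in users_without_resume(USERS, RESUMES, START, END, unit).items():
        print(unit, key, len(ids), ids)
```

On the sample data this reproduces the two expected tables above (users 3,4 and 4 per month; 3,4, then 2,4, then 4 per day). Passing "week" as the unit groups by ISO week instead.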

Thanks.

3 answers:

Answer 0 (score: 1)

Here is the solution I came up with for grouping by month. I tested the results against your data in a local MySQL installation:

SELECT 
    COUNT(*) AS cnt,
    GROUP_CONCAT(b.id ORDER BY b.id) AS user_ids,
    a.monthgroup

FROM 
(
    SELECT MONTH(FROM_UNIXTIME(modified_time)) AS monthgroup
    FROM jb_resumes
    WHERE modified_time BETWEEN 
        UNIX_TIMESTAMP('2012-03-01 00:00:00') 
        AND UNIX_TIMESTAMP('2012-04-30 23:59:59')
    GROUP BY monthgroup
) a
CROSS JOIN 
    jb_users b
LEFT JOIN
    jb_resumes c ON 
        b.id = c.user_id 
        AND a.monthgroup = MONTH(FROM_UNIXTIME(modified_time))
WHERE
    b.signup_time < UNIX_TIMESTAMP('2012-04-30 23:59:59')
    AND c.user_id IS NULL
GROUP BY
    a.monthgroup
ORDER BY
    a.monthgroup

[image: Result Set]

It's a bit bulky, so I'll see whether I can come up with a more elegant solution.

Solution for grouping by day:

SELECT 
    COUNT(*) AS cnt,
    GROUP_CONCAT(b.id ORDER BY b.id) AS user_ids,
    a.daygroup

FROM 
(
    SELECT MAKEDATE(YEAR(FROM_UNIXTIME(modified_time)), DAYOFYEAR(FROM_UNIXTIME(modified_time))) AS daygroup
    FROM jb_resumes
    WHERE modified_time BETWEEN 
        UNIX_TIMESTAMP('2012-03-01 00:00:00') 
        AND UNIX_TIMESTAMP('2012-04-30 23:59:59')
    GROUP BY daygroup
) a
CROSS JOIN 
    jb_users b
LEFT JOIN
    jb_resumes c ON
        b.id = c.user_id
        AND a.daygroup = MAKEDATE(YEAR(FROM_UNIXTIME(modified_time)), DAYOFYEAR(FROM_UNIXTIME(modified_time)))
WHERE
    b.signup_time < UNIX_TIMESTAMP('2012-04-30 23:59:59')
    AND c.user_id IS NULL
GROUP BY
    a.daygroup
ORDER BY
    a.daygroup

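Edit: explanation of the month-grouping query

Since you asked for an explanation of the solution, here is how I reasoned about it:

First we extract the month groupings from all modified_time values within the time span: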

SELECT MONTH(FROM_UNIXTIME(modified_time)) AS monthgroup
FROM jb_resumes
WHERE modified_time BETWEEN 
    UNIX_TIMESTAMP('2012-03-01 00:00:00') 
    AND UNIX_TIMESTAMP('2012-04-30 23:59:59')
GROUP BY monthgroup

Resulting in:

[image: Step 1]

Then, to compare every combination of monthgroup and user and determine which users have no modified_time within a given monthgroup, we need a Cartesian product of the monthgroups and all users. Because the query above already uses GROUP BY, we can't join directly in that query; instead we wrap it in a subselect that goes into the FROM clause:

SELECT 
    a.monthgroup,
    b.*
FROM 
(
    SELECT MONTH(FROM_UNIXTIME(modified_time)) AS monthgroup
    FROM jb_resumes
    WHERE modified_time BETWEEN 
        UNIX_TIMESTAMP('2012-03-01 00:00:00') 
        AND UNIX_TIMESTAMP('2012-04-30 23:59:59')
    GROUP BY monthgroup
) a
CROSS JOIN 
    jb_users b
-- ORDER BY a.monthgroup, b.id #for clarity's sake

Resulting in:

[image: Step 2]

Now we have every combination of monthgroups and user ids, but we don't want to include users whose signup_time is after the time span, so we filter them out by introducing the first condition in our WHERE clause:

SELECT 
    a.monthgroup,
    b.*
FROM 
(
    SELECT MONTH(FROM_UNIXTIME(modified_time)) AS monthgroup
    FROM jb_resumes
    WHERE modified_time BETWEEN 
        UNIX_TIMESTAMP('2012-03-01 00:00:00') 
        AND UNIX_TIMESTAMP('2012-04-30 23:59:59')
    GROUP BY monthgroup
) a
CROSS JOIN 
    jb_users b
WHERE
    b.signup_time < UNIX_TIMESTAMP('2012-04-30 23:59:59')
-- ORDER BY a.monthgroup, b.id #for clarity's sake

Resulting in:

[image: Step 3]

Note that the user with id 1 has been filtered out. Now we can make the comparison with a LEFT JOIN:

SELECT 
    a.monthgroup,
    b.*,
    c.*
FROM 
(
    SELECT MONTH(FROM_UNIXTIME(modified_time)) AS monthgroup
    FROM jb_resumes
    WHERE modified_time BETWEEN 
        UNIX_TIMESTAMP('2012-03-01 00:00:00') 
        AND UNIX_TIMESTAMP('2012-04-30 23:59:59')
    GROUP BY monthgroup
) a
CROSS JOIN 
    jb_users b
LEFT JOIN
    jb_resumes c ON 
        b.id = c.user_id 
        AND a.monthgroup = MONTH(FROM_UNIXTIME(modified_time))
WHERE
    b.signup_time < UNIX_TIMESTAMP('2012-04-30 23:59:59')
-- ORDER BY a.monthgroup, b.id #for clarity's sake

Resulting in:

[image: Step 4]

Our LEFT JOIN condition is that the user has a modified resume in jb_resumes and that the modification occurred in the month of the monthgroup value. If the user has no resume modification in that month, the LEFT JOIN returns NULL for the values from the jb_resumes table. We want the users for whom the condition is not satisfied, so we put our second condition in the WHERE clause:

SELECT 
    a.monthgroup,
    b.*,
    c.*
FROM 
(
    SELECT MONTH(FROM_UNIXTIME(modified_time)) AS monthgroup
    FROM jb_resumes
    WHERE modified_time BETWEEN 
        UNIX_TIMESTAMP('2012-03-01 00:00:00') 
        AND UNIX_TIMESTAMP('2012-04-30 23:59:59')
    GROUP BY monthgroup
) a
CROSS JOIN 
    jb_users b
LEFT JOIN
    jb_resumes c ON 
        b.id = c.user_id 
        AND a.monthgroup = MONTH(FROM_UNIXTIME(modified_time))
WHERE
    b.signup_time < UNIX_TIMESTAMP('2012-04-30 23:59:59')
    AND c.user_id IS NULL
-- ORDER BY a.monthgroup, b.id #for clarity's sake

Resulting in:

[image: Step 5]

Finally, we can group on the monthgroup field and add our COUNT() and GROUP_CONCAT() functions, which yields the full month-grouping query at the top of this answer and gives us the expected result:

[image: Result Set]
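
The steps above can be checked end to end with a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so it is an approximation under stated assumptions rather than the exact query: SQLite's strftime(..., 'unixepoch') replaces MONTH(FROM_UNIXTIME(...)), timestamps are interpreted in UTC, the month key includes the year (an assumed tweak so that a range spanning several years would not merge the same month of different years), and the per-month id lists are collected in Python instead of with GROUP_CONCAT().

```python
import sqlite3
from collections import defaultdict

START, END = 1330581600, 1335848399  # range boundaries from the question

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jb_users (id INTEGER PRIMARY KEY, signup_time INTEGER);
    CREATE TABLE jb_resumes (id INTEGER PRIMARY KEY, user_id INTEGER, modified_time INTEGER);
    INSERT INTO jb_users VALUES
        (1, 1340214369), (2, 1330754400), (3, 1329285600), (4, 1324447200);
    INSERT INTO jb_resumes VALUES
        (1, 1, 1340214369), (2, 2, 1330840800), (3, 2, 1340214369),
        (4, 3, 1334506920), (5, 3, 1334638800), (6, 2, 1334638800),
        (7, 3, 1336798800);
""")

# Same shape as the answer's query: month groups CROSS JOIN users, then a
# LEFT JOIN anti-join that keeps only (month, user) pairs with no matching resume.
QUERY = """
SELECT a.monthgroup, b.id
FROM (
    SELECT DISTINCT strftime('%Y-%m', modified_time, 'unixepoch') AS monthgroup
    FROM jb_resumes
    WHERE modified_time BETWEEN :range_start AND :range_end
) a
CROSS JOIN jb_users b
LEFT JOIN jb_resumes c
    ON b.id = c.user_id
   AND a.monthgroup = strftime('%Y-%m', c.modified_time, 'unixepoch')
WHERE b.signup_time < :range_end
  AND c.user_id IS NULL
ORDER BY a.monthgroup, b.id
"""

# Collect the user ids per month in Python (stands in for GROUP_CONCAT).
missing_by_month = defaultdict(list)
params = {"range_start": START, "range_end": END}
for monthgroup, user_id in conn.execute(QUERY, params):
    missing_by_month[monthgroup].append(user_id)

for monthgroup, user_ids in missing_by_month.items():
    print(monthgroup, len(user_ids), user_ids)
```

With the question's sample data this lists users 3 and 4 for March 2012 and user 4 for April 2012, matching the expected month-grouped result.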

Answer 1 (score: 0)

Try this:

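-- DATE() groups by calendar day; FROM_UNIXTIME alone yields a full DATETIME (one group per second)
SELECT DATE(FROM_UNIXTIME(u.signup_time)) AS signup_date, COUNT(u.id) AS cnt
FROM `jb_users` u
WHERE u.id NOT IN (
        SELECT DISTINCT r.user_id FROM `jb_resumes` r
        WHERE r.modified_time BETWEEN 1330581600 AND 1335848399
    )
    AND u.signup_time >= 1330581600
GROUP BY signup_date
ORDER BY signup_date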

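FROM_UNIXTIME converts a UNIX timestamp to a date-time value.

The query returns the number of users per sign-up date within the given time range. You can change the date format to suit your requirements.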

I added the DISTINCT keyword to the inner SELECT because a user can update a resume multiple times, so without it the subquery can return the same user_id several times.

Answer 2 (score: 0)

Not sure whether this works, but you could try a join with an IF:

SELECT DISTINCT
    IF(r.modified_time NOT BETWEEN 1330581600 AND 1335848399, u.id, NULL) AS UID
FROM `jb_users` u
LEFT JOIN `jb_resumes` r ON u.id = r.user_id
WHERE
    u.signup_time >= 1330581600