Rebuilding a MySQL query to stay below MAX_JOIN_SIZE rows

Time: 2014-07-25 10:25:22

Tags: mysql sql query-optimization left-join

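My SQL query fails (most of the time) because too many rows are joined. The error MySQL reports is: The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET MAX_JOIN_SIZE=# if the SELECT is okay. I know I can avoid the error by setting the variables it mentions, SQL_BIG_SELECTS and MAX_JOIN_SIZE, but I don't think that is the right approach. It only postpones the problem, since the number of joined rows is likely to keep growing.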
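For reference, the workaround named in the error message can be limited to the current session instead of being applied server-wide. A minimal sketch, assuming a stock MySQL server; the value 100000000 is an arbitrary illustration, not a recommendation:

```sql
-- Inspect the limit currently in effect for this session.
SHOW VARIABLES LIKE 'max_join_size';

-- Alternative 1: allow big SELECTs for this session only.
SET SESSION sql_big_selects = 1;

-- Alternative 2: raise the limit for this session only.
-- Setting max_join_size to a non-default value resets sql_big_selects to 0,
-- so the two statements are alternatives, not steps to run together.
SET SESSION max_join_size = 100000000;
```

Either statement only suppresses the check; it does not reduce the number of row combinations the query examines.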

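The facts: I have an event planning tool that assigns certain tasks to users (= workers). The tables are users (userid, username) [ID and name], tasks (taskid, task, start, end) [ID, task name, start as a timestamp, end as a timestamp], and userassignment (id, userid, taskid, deleted) [ID, the user assigned to the task, the task, whether the assignment is still valid].

I need to know which users are assigned, and on which of the event's main days (day 1, day 2, day 3) they are assigned.

The exact table definitions are as follows (my query appears further down):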

CREATE TABLE users (
 userid INT NOT NULL AUTO_INCREMENT,
 username VARCHAR(250),
 PRIMARY KEY (userid)
);

CREATE TABLE tasks (
 taskid INT NOT NULL AUTO_INCREMENT,
 task VARCHAR(250),
 start INT,
 end INT,
 PRIMARY KEY (taskid),
 INDEX USING BTREE (start),
 INDEX USING BTREE (end)
);

CREATE TABLE userassignment (
 id INT NOT NULL AUTO_INCREMENT,
 userid INT,
 taskid INT,
 deleted TINYINT,
 PRIMARY KEY (id),
 INDEX USING BTREE (userid),
 INDEX USING BTREE (taskid),
 UNIQUE KEY `usertasks` (  `userid` ,  `taskid` )
);

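First, I select all users who have a task on one of the three days (the database holds roughly six times as many users as are assigned to tasks), then I LEFT JOIN the assigned users for each of the three days.

So, is there a way to rebuild the query so that it joins fewer rows? I only need to know who is assigned on which day, not the number of assignments.

I have already tried combining several queries with UNION, but without success.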

SQL Fiddle

The query (the EXPLAIN output of the real query, which is not the one in the SQL Fiddle, is not reproduced here):

SELECT
    u.userid,
    u.username,
    COUNT(ua.id) AS count_all,
    dayone.c AS count_one,
    daytwo.c AS count_two,
    daythree.c AS count_three
FROM
    users AS u
INNER JOIN
    userassignment AS ua ON ua.userid = u.userid AND ua.deleted = 0
INNER JOIN
    tasks AS t ON ua.taskid = t.taskid

    LEFT JOIN (
        SELECT
            u.userid,
            COUNT(ua.id) AS c
        FROM
            users AS u
        INNER JOIN
            userassignment AS ua ON
            ua.userid = u.userid AND
            ua.deleted = 0
        INNER JOIN
            tasks AS t ON
            ua.taskid = t.taskid
        WHERE
            t.start > UNIX_TIMESTAMP("2014-08-01 00:00:00") AND
            t.start < UNIX_TIMESTAMP("2014-08-02 00:00:00")
        GROUP BY
            u.userid
    ) AS dayone ON dayone.userid = u.userid

    LEFT JOIN (
        SELECT
            u.userid,
            COUNT(ua.id) AS c
        FROM
            users AS u
        INNER JOIN
            userassignment AS ua ON
            ua.userid = u.userid AND
            ua.deleted = 0
        INNER JOIN
            tasks AS t ON
            ua.taskid = t.taskid
        WHERE
            t.start > UNIX_TIMESTAMP("2014-07-31 00:00:00") AND
            t.start < UNIX_TIMESTAMP("2014-08-01 00:00:00")
        GROUP BY
            u.userid
    ) AS daytwo ON daytwo.userid = u.userid

    LEFT JOIN (
        SELECT
            u.userid,
            COUNT(ua.id) AS c
        FROM
            users AS u
        INNER JOIN
            userassignment AS ua ON
            ua.userid = u.userid AND
            ua.deleted = 0
        INNER JOIN
            tasks AS t ON
            ua.taskid = t.taskid
        WHERE
            t.start > UNIX_TIMESTAMP("2014-08-02 00:00:00") AND
            t.start < UNIX_TIMESTAMP("2014-08-04 00:00:00")
        GROUP BY
            u.userid
    ) AS daythree ON daythree.userid = u.userid

WHERE
    t.start > UNIX_TIMESTAMP("2014-07-31 00:00:00") AND
    t.start < UNIX_TIMESTAMP("2014-08-04 00:00:00")
GROUP BY
    u.userid
ORDER BY
    username ASC
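The MAX_JOIN_SIZE check is based on the optimizer's estimate of how many row combinations a statement needs to examine, and EXPLAIN exposes the per-table estimates. A sketch that looks only at the outer part of the query above, leaving out the three derived tables for brevity:

```sql
-- EXPLAIN reports the join order, the index chosen for each table,
-- and the estimated number of rows examined per table (the "rows" column).
EXPLAIN
SELECT u.userid, u.username, COUNT(ua.id) AS count_all
FROM users AS u
INNER JOIN userassignment AS ua
        ON ua.userid = u.userid AND ua.deleted = 0
INNER JOIN tasks AS t
        ON ua.taskid = t.taskid
WHERE t.start > UNIX_TIMESTAMP('2014-07-31 00:00:00')
  AND t.start < UNIX_TIMESTAMP('2014-08-04 00:00:00')
GROUP BY u.userid;
```

Running the same EXPLAIN on the full query shows how much each derived table adds to the estimate.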

2 answers:

Answer 0 (score: 2)

So, this is really just a long-winded way of saying...

SELECT u.*
     , DATE(FROM_UNIXTIME(t.start)) dt
     , COUNT(t.taskid) total
  FROM users u
  LEFT 
  JOIN userassignment ut
    ON ut.userid = u.userid
   AND ut.deleted = 0
  LEFT
  JOIN tasks t 
    ON t.taskid = ut.taskid
 GROUP
    BY u.userid
     , DATE(FROM_UNIXTIME(t.start))

In the example above, you can change COUNT(t.taskid) to COUNT(CASE WHEN x = 'y' THEN ... END) or SUM(CASE ...
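As a sketch of that substitution, assuming the goal stated in the question (knowing who is assigned on which day, not how many assignments exist), MAX(CASE ...) yields a 0/1 flag per day. The day ranges are copied from the question's query; the column aliases on_one, on_two, and on_three are hypothetical:

```sql
SELECT u.userid
     , u.username
       -- Each flag is 1 if the user has at least one active assignment
       -- whose task starts inside the range, otherwise 0.
     , MAX(CASE WHEN t.start > UNIX_TIMESTAMP('2014-08-01 00:00:00')
                 AND t.start < UNIX_TIMESTAMP('2014-08-02 00:00:00')
                THEN 1 ELSE 0 END) on_one
     , MAX(CASE WHEN t.start > UNIX_TIMESTAMP('2014-07-31 00:00:00')
                 AND t.start < UNIX_TIMESTAMP('2014-08-01 00:00:00')
                THEN 1 ELSE 0 END) on_two
     , MAX(CASE WHEN t.start > UNIX_TIMESTAMP('2014-08-02 00:00:00')
                 AND t.start < UNIX_TIMESTAMP('2014-08-04 00:00:00')
                THEN 1 ELSE 0 END) on_three
  FROM users u
  JOIN userassignment ut
    ON ut.userid = u.userid
   AND ut.deleted = 0
  JOIN tasks t
    ON t.taskid = ut.taskid
 WHERE t.start > UNIX_TIMESTAMP('2014-07-31 00:00:00')
   AND t.start < UNIX_TIMESTAMP('2014-08-04 00:00:00')
 GROUP
    BY u.userid
     , u.username
 ORDER
    BY u.username;
```

Each table is joined only once here. Because the ranges use strict comparisons, as in the question, a task that starts exactly at midnight on 2014-08-01 or 2014-08-02 passes the WHERE clause but sets none of the flags.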

Answer 1 (score: 1)

This should return the same result set:

    SELECT u.userid, u.username,
           COUNT(ua.id) AS count_all,
           SUM(case when t.start > UNIX_TIMESTAMP('2014-08-01 00:00:00') AND
                         t.start < UNIX_TIMESTAMP('2014-08-02 00:00:00')
                    then 1 else 0
                end) as count_one,
           SUM(case when t.start > UNIX_TIMESTAMP('2014-07-31 00:00:00') AND
                         t.start < UNIX_TIMESTAMP('2014-08-01 00:00:00')
                    then 1 else 0
                end) as count_two,
           SUM(case when t.start > UNIX_TIMESTAMP('2014-08-02 00:00:00') AND
                         t.start < UNIX_TIMESTAMP('2014-08-04 00:00:00')
                    then 1 else 0
                end) as count_three
    FROM users u LEFT JOIN
         userassignment ua 
         ON ua.userid = u.userid AND
            ua.deleted = 0 LEFT JOIN
         tasks t
         ON ua.taskid = t.taskid
    WHERE ua.deleted = 0 AND
          t.start > UNIX_TIMESTAMP('2014-07-31 00:00:00') AND
          t.start < UNIX_TIMESTAMP('2014-08-04 00:00:00')
    GROUP BY u.userid
    ORDER BY u.username;

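Your formulation is a bit tricky. For example, the outer joins will filter out any users whose assignments are always deleted. And the date periods overlap (I'm not sure whether that is intentional, but it is how the query is structured).

Perhaps this simpler query will not exceed the internal limit.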
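On the point about the joins: a WHERE condition on t.start discards every row for which the LEFT JOIN found no task, so such a query behaves like an inner join. If users without matching assignments should still be listed, one option is to move the filters into the ON clauses. A sketch under that assumption, using half-open ranges (>= start, < end) so that a task starting exactly at midnight is counted; whether that matches the intended day boundaries is also an assumption:

```sql
SELECT u.userid, u.username,
       -- COUNT ignores NULLs, so users without an in-range task get 0.
       COUNT(t.taskid) AS count_all
FROM users u
LEFT JOIN userassignment ua
       ON ua.userid = u.userid
      AND ua.deleted = 0
LEFT JOIN tasks t
       ON t.taskid = ua.taskid
      AND t.start >= UNIX_TIMESTAMP('2014-07-31 00:00:00')
      AND t.start <  UNIX_TIMESTAMP('2014-08-04 00:00:00')
GROUP BY u.userid, u.username
ORDER BY u.username;
```

This variant returns every user, so it examines more rows than the filtered version and only makes sense if unassigned users are wanted in the result.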