MySQL: Can I do a left join and pull only one row from the joined table?

Asked: 2012-06-21 03:47:53

Tags: mysql query-optimization

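I wrote a custom help desk for work, and it has been running great... until recently. One query has really slowed down. It now takes about 14 seconds! Here are the relevant tables: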

CREATE TABLE `tickets` (
  `id` int(11) unsigned NOT NULL DEFAULT '0',
  `date_submitted` datetime DEFAULT NULL,
  `date_closed` datetime DEFAULT NULL,
  `first_name` varchar(50) DEFAULT NULL,
  `last_name` varchar(50) DEFAULT NULL,
  `email` varchar(50) DEFAULT NULL,
  `description` text,
  `agent_id` smallint(5) unsigned NOT NULL DEFAULT '1',
  `status` smallint(5) unsigned NOT NULL DEFAULT '1',
  `priority` tinyint(4) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `date_closed` (`date_closed`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `solutions` (
  `id` int(10) unsigned NOT NULL,
  `ticket_id` mediumint(8) unsigned DEFAULT NULL,
  `date` datetime DEFAULT NULL,
  `hours_spent` float DEFAULT NULL,
  `agent_id` smallint(5) unsigned DEFAULT NULL,
  `body` text,
  PRIMARY KEY (`id`),
  KEY `ticket_id` (`ticket_id`),
  KEY `date` (`date`),
  KEY `hours_spent` (`hours_spent`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

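When a user submits a ticket, it goes into the "tickets" table. Then, as the agents work on the problem, they record the actions they take. Each entry goes into the "solutions" table. In other words, a ticket has many solutions.

The goal of the query that has slowed down is to pull all of the fields from the "tickets" table, along with the latest entry from the "solutions" table. This is the query I've been using: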

SELECT tickets.*,
    (SELECT CONCAT_WS(" * ", DATE_FORMAT(solutions.date, "%c/%e/%y"), solutions.hours_spent, CONCAT_WS(": ", solutions.agent_id, solutions.body))
    FROM solutions
    WHERE solutions.ticket_id = tickets.id
    ORDER BY solutions.date DESC, solutions.id DESC
    LIMIT 1
) AS latest_solution_entry
FROM tickets
WHERE tickets.date_closed IS NULL
OR tickets.date_closed >= '2012-06-20 00:00:00'
ORDER BY tickets.id DESC

Here is an example of what the "latest_solution_entry" field looks like:

6/20/12 * 1337 * 1: I restarted the computer and that fixed the problem. Yes, I took an hour to do this.

In PHP, I split up the "latest_solution_entry" field and format it properly.
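For readers who want to see what that split involves, here is a minimal sketch. The asker does this in PHP and never shows the code, so this is a Python stand-in and the function name `parse_latest_solution_entry` is hypothetical. It assumes the exact layout built by the query above: `date * hours_spent * agent_id: body`.

```python
def parse_latest_solution_entry(entry):
    """Split 'date * hours_spent * agent_id: body' into its four parts."""
    # Limiting the number of splits keeps any later " * " or ": "
    # that appears inside the free-text body intact.
    date, hours_spent, rest = entry.split(" * ", 2)
    agent_id, body = rest.split(": ", 1)
    return {
        "date": date,
        "hours_spent": float(hours_spent),
        "agent_id": int(agent_id),
        "body": body,
    }


sample = ("6/20/12 * 1337 * 1: I restarted the computer and that fixed "
          "the problem. Yes, I took an hour to do this.")
parsed = parse_latest_solution_entry(sample)
print(parsed["agent_id"], parsed["hours_spent"])
```

One thing to watch: MySQL's CONCAT_WS skips NULL arguments, so a solution with a NULL hours_spent or agent_id would produce a string with fewer separators and break this split. The subquery also returns NULL for a ticket with no solutions, which this sketch does not handle.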

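When I noticed that the page running the query had slowed way down, I ran the query without the subquery and it was very fast. Then I ran EXPLAIN on the original query and got this: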

+----+--------------------+-----------+-------+---------------+-----------+---------+---------------------+-------+-----------------------------+
| id | select_type        | table     | type  | possible_keys | key       | key_len | ref                 | rows  | Extra                       |
+----+--------------------+-----------+-------+---------------+-----------+---------+---------------------+-------+-----------------------------+
|  1 | PRIMARY            | tickets   | index | date_closed   | PRIMARY   | 4       | NULL                | 35804 | Using where                 |
|  2 | DEPENDENT SUBQUERY | solutions | ref   | ticket_id     | ticket_id | 4       | helpdesk.tickets.id |     1 | Using where; Using filesort |
+----+--------------------+-----------+-------+---------------+-----------+---------+---------------------+-------+-----------------------------+

So I'm looking for a way to make my query more efficient while still achieving the same goal. Any ideas?

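4 Answers:

Answer 0: (score: 17)

Let me sum up what I understood: you want to select every ticket along with its last solution.

I like using the following pattern for this kind of problem, since it avoids the correlated subquery and therefore does well where performance matters. The drawback is that it is a bit tricky to understand: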

SELECT
  t.*,
  s1.*
FROM tickets t
INNER JOIN solutions s1 ON t.id = s1.ticket_id
LEFT JOIN solutions s2 ON s1.ticket_id = s2.ticket_id AND s2.id > s1.id
WHERE s2.id IS NULL;

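To keep it easy to follow, I wrote only the core of the pattern.

The key points are:

  • The LEFT JOIN of the solutions table to itself on the condition s1.ticket_id = s2.ticket_id: it emulates GROUP BY ticket_id.

  • The condition s2.id > s1.id: it is the SQL for "I only want the last solution", and it emulates MAX(). I assumed that in your model, "the last" means "with the greatest id", but you could use a condition on the date here instead. Note that s2.id < s1.id would give you the first solution.

  • The WHERE clause s2.id IS NULL: the strangest one, but absolutely necessary... it keeps only the records you want, because an s1 row with no matching s2 row is one that no later solution beats.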
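To make the three points above concrete, here is a small sketch that runs the pattern on toy data. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the trimmed-down schema and sample rows are made up for illustration. The first query shows the self-join before filtering, the second applies the IS NULL filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE solutions (id INTEGER PRIMARY KEY, ticket_id INTEGER, body TEXT);
    INSERT INTO tickets VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO solutions VALUES
        (10, 1, 'first try'),
        (11, 1, 'second try'),
        (12, 2, 'only entry');
""")

# Step 1: pair every solution with any newer solution for the same ticket.
# A solution that has no newer sibling gets NULL on the s2 side.
pairs = conn.execute("""
    SELECT s1.id, s2.id
    FROM solutions s1
    LEFT JOIN solutions s2 ON s1.ticket_id = s2.ticket_id AND s2.id > s1.id
    ORDER BY s1.id
""").fetchall()

# Step 2: keep only the rows where nothing newer exists (s2.id IS NULL).
latest = conn.execute("""
    SELECT t.id, s1.id, s1.body
    FROM tickets t
    INNER JOIN solutions s1 ON t.id = s1.ticket_id
    LEFT JOIN solutions s2 ON s1.ticket_id = s2.ticket_id AND s2.id > s1.id
    WHERE s2.id IS NULL
    ORDER BY t.id
""").fetchall()

print(pairs)
print(latest)
```

In `pairs`, solution 10 is matched with 11, while 11 and 12 come back with NULL on the right. Those NULL rows are exactly the ones the WHERE clause keeps.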

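Give it a try and let me know :)

Edit 1: I just realized that the assumption in my second point oversimplifies the problem. That makes it even more interesting :p I'm trying to see how this pattern can work with your date, id ordering.

Edit 2: OK, with a small twist it works fine. The LEFT JOIN condition becomes: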

LEFT JOIN solutions s2 ON s1.ticket_id = s2.ticket_id
  AND (s2.date > s1.date OR (s2.date = s1.date AND s2.id > s1.id))
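Here is a sketch of how that condition might slot into the full query from the question. It again uses sqlite3 as a stand-in for MySQL, with made-up rows and dates stored as ISO-formatted text so they compare correctly. Two things in it are my assumptions rather than part of the answer: the first join is changed to a LEFT JOIN so tickets without any solution still appear (as they do with the original subquery), and the date_closed filter from the question is added to the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (id INTEGER PRIMARY KEY, date_closed TEXT);
    CREATE TABLE solutions (id INTEGER PRIMARY KEY, ticket_id INTEGER, date TEXT);
    INSERT INTO tickets VALUES
        (1, NULL),
        (2, '2012-06-20 12:00:00'),
        (3, NULL),                    -- open ticket with no solutions yet
        (4, '2012-06-01 08:00:00');   -- closed too long ago, filtered out
    INSERT INTO solutions VALUES
        (10, 1, '2012-06-19 09:00:00'),
        (11, 1, '2012-06-20 09:00:00'),
        (12, 1, '2012-06-20 09:00:00'),  -- same date as 11, higher id wins
        (13, 2, '2012-06-20 10:00:00'),  -- newest by date despite the lower id
        (14, 2, '2012-06-18 10:00:00'),
        (15, 4, '2012-06-01 07:00:00');
""")

rows = conn.execute("""
    SELECT t.id, s1.id
    FROM tickets t
    LEFT JOIN solutions s1 ON t.id = s1.ticket_id
    LEFT JOIN solutions s2 ON s1.ticket_id = s2.ticket_id
      AND (s2.date > s1.date OR (s2.date = s1.date AND s2.id > s1.id))
    WHERE s2.id IS NULL
      AND (t.date_closed IS NULL OR t.date_closed >= '2012-06-20 00:00:00')
    ORDER BY t.id DESC
""").fetchall()

print(rows)
```

Ticket 3 comes back with NULL in the solution column, ticket 2 picks solution 13 because of its date, and ticket 1 picks solution 12 because the id breaks the tie.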

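Answer 1: (score: 1)

If you have an inline view in the SELECT clause, that select has to be executed for every row. In cases like this I have found it better to put the inline view in the FROM clause instead, so the select is executed only once.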

SELECT t.*, 
       Concat_ws(" * ", Date_format(s.date, "%c/%e/%y"), s.hours_spent, 
       Concat_ws(": ", s.agent_id, s.body)) AS latest_solution_entry
FROM   tickets t 
       INNER JOIN (SELECT solutions.ticket_id,
                          Max(solutions.date) maxdate 
                   FROM   solutions 
                   GROUP  BY solutions.ticket_id) last_solutions 
               ON t.id = last_solutions.ticket_id
       INNER JOIN (SELECT solutions.ticket_id,
                          solutions.date,
                          Max(solutions.id) maxid 
                   FROM   solutions 
                   GROUP  BY solutions.ticket_id,
                            solutions.date) last_solution
              ON last_solutions.ticket_id = last_solution.ticket_id 
                 and last_solutions.maxDate = last_solution.Date
       INNER JOIN solutions s 
               ON last_solution.maxid = s.id
WHERE  t.date_closed IS NULL 
        OR t.date_closed >= '2012-06-20 00:00:00' 
ORDER  BY t.id DESC 

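Note: depending on your needs, you may have to change these to LEFT joins (for example, to keep tickets that have no solutions yet).

Answer 2: (score: 1)

Try this: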

SELECT *
FROM (
  -- for each ticket get the most recent solution date
  SELECT ticket_id, MAX(solutions.date) as date
  FROM solutions
  GROUP BY ticket_id
) t
JOIN tickets ON t.ticket_id = tickets.id
WHERE tickets.date_closed IS NULL OR tickets.date_closed >= '2012-06-20 00:00:00'
ORDER BY tickets.id DESC

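Note that if a ticket has 2 solutions with the same date, joining this result back to solutions on that date to fetch the rest of the row will produce duplicate records. You will need another join to remove those duplicates, or you can rely on an absolute sequence such as a serial (an auto-incrementing primary key).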
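The duplicate problem is easy to reproduce. The sketch below uses sqlite3 as a stand-in for MySQL with made-up rows. It first joins the per-ticket MAX(date) back to solutions, then shows one possible way to break the tie by taking MAX(id) within that date. That second step is my own illustration of using the primary key as the tiebreaker, not something spelled out in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE solutions (id INTEGER PRIMARY KEY, ticket_id INTEGER, date TEXT);
    INSERT INTO solutions VALUES
        (10, 1, '2012-06-20 09:00:00'),
        (11, 1, '2012-06-20 09:00:00'),  -- same date as solution 10
        (12, 2, '2012-06-19 09:00:00');
""")

# Joining back on (ticket_id, date) returns both tied rows for ticket 1.
with_duplicates = conn.execute("""
    SELECT s.ticket_id, s.id
    FROM (SELECT ticket_id, MAX(date) AS date
          FROM solutions
          GROUP BY ticket_id) m
    JOIN solutions s ON s.ticket_id = m.ticket_id AND s.date = m.date
    ORDER BY s.ticket_id, s.id
""").fetchall()

# Taking the highest id among the tied rows leaves one row per ticket.
deduplicated = conn.execute("""
    SELECT s.ticket_id, MAX(s.id)
    FROM (SELECT ticket_id, MAX(date) AS date
          FROM solutions
          GROUP BY ticket_id) m
    JOIN solutions s ON s.ticket_id = m.ticket_id AND s.date = m.date
    GROUP BY s.ticket_id
    ORDER BY s.ticket_id
""").fetchall()

print(with_duplicates)
print(deduplicated)
```

The deduplicated ids can then be joined to solutions once more to fetch the body and the other columns.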

Answer 3: (score: 0)

Depending on your purpose, here is an idea:

SELECT DISTINCT s1.ticket_id, t.*,  s1.*
FROM tickets t
LEFT JOIN solutions s1 ON t.id = s1.ticket_id