MySQL query: advice needed on sorting, grouping, and performance

Time: 2011-01-04 07:55:28

Tags: mysql sql

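I have a 'message' table where users can send and receive messages, pretty straightforward. What I want to do is: retrieve the DISTINCT sender_ids WHERE receiver_id is X, ordered so that senders from whom receiver X has unread messages appear first, senders whose messages receiver X has already read appear after them, and everything is sorted by created_at DESC.

Any ideas on how I can do this? Note: performance is also a concern.

Here is the query I am using, but the sorting does not seem to come out right; maybe DISTINCT is messing it up? I am expecting the result 6,5,4,2,3 - but I get 6,5,4,3,2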

SELECT DISTINCT sender_id
FROM message m
WHERE receiver_id = 1
ORDER BY read_at, created_at DESC

Here is the table with sample data:

CREATE TABLE `message` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `sender_id` bigint(20) NOT NULL,
  `receiver_id` bigint(20) NOT NULL,
  `message` text,
  `read_at` datetime DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `sender` (`sender_id`),
  KEY `receiver` (`receiver_id`),
  KEY `dates` (`receiver_id`,`read_at`,`created_at`)
) ENGINE=MyISAM AUTO_INCREMENT=13 DEFAULT CHARSET=latin1;


INSERT INTO `message` (id, sender_id, receiver_id, message, read_at, created_at)
VALUES 
  (1,2,1,NULL,'2011-01-01 01:01:01','2011-01-01 01:01:01'),
  (2,1,2,NULL,'2011-01-01 01:01:01','2011-01-01 01:01:02'),
  (3,2,1,NULL,'2011-01-01 01:01:01','2011-01-01 01:01:03'),
  (4,3,1,NULL,'2011-01-01 01:01:01','2011-01-01 01:01:04'),
  (5,3,1,NULL,'2011-01-01 01:01:01','2011-01-01 01:01:05'),
  (6,1,4,NULL,'2011-01-01 01:01:01','2011-01-01 01:01:06'),
  (7,4,1,NULL,NULL,'2011-01-01 01:01:07'),
  (8,5,1,NULL,NULL,'2011-01-01 01:01:08'),
  (9,5,1,NULL,NULL,'2011-01-01 01:01:09'),
  (10,1,6,NULL,NULL,'2011-01-01 01:01:10'),
  (11,6,1,NULL,NULL,'2011-01-01 01:01:11');

4 answers:

Answer 0: (score: 1)

How about GROUP BY?

SELECT sender_id
FROM message m
WHERE receiver_id = 1
GROUP BY sender_id
ORDER BY MAX(IFNULL(read_at,'9999-01-01')) DESC

Answer 1: (score: 0)

The following returns the desired result for the sample data:

SELECT sender_id
  FROM message AS m
  WHERE receiver_id=?
  GROUP BY sender_id
  ORDER BY COUNT(*)=COUNT(read_at), MAX(created_at) DESC;

If you want to use the oldest message when sorting by created_at, change MAX to MIN.

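COUNT(read_at) ignores NULLs, whereas COUNT(*) does not, so the two will differ whenever a sender has any unread messages. As long as a given receiver does not have too many messages, the query should run reasonably fast (the index on receiver_id will help). Profile the query before deciding that more optimization is needed.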
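
As a rough illustration of the NULL-counting behaviour described above, the sketch below loads the question's sample rows into an in-memory SQLite database and prints both counts per sender for receiver 1. SQLite is only a stand-in for MySQL so that the example is self-contained; the `message` column is omitted, and `make_db` is a hypothetical helper, not part of the original answer.

```python
import sqlite3

READ = "2011-01-01 01:01:01"
# (id, sender_id, receiver_id, read_at) taken from the question's sample data.
# created_at follows the original INSERT: 2011-01-01 01:01:<id>.
SAMPLE = [(1, 2, 1, READ), (2, 1, 2, READ), (3, 2, 1, READ), (4, 3, 1, READ),
          (5, 3, 1, READ), (6, 1, 4, READ), (7, 4, 1, None), (8, 5, 1, None),
          (9, 5, 1, None), (10, 1, 6, None), (11, 6, 1, None)]


def make_db():
    """Hypothetical helper: build the sample table in an in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, sender_id INTEGER,"
                 " receiver_id INTEGER, read_at TEXT, created_at TEXT)")
    conn.executemany(
        "INSERT INTO message VALUES (?, ?, ?, ?, ?)",
        [(i, s, r, ra, f"2011-01-01 01:01:{i:02d}") for i, s, r, ra in SAMPLE])
    return conn


QUERY = """
SELECT sender_id, COUNT(*) AS total, COUNT(read_at) AS read_count
  FROM message
  WHERE receiver_id = ?
  GROUP BY sender_id
  ORDER BY COUNT(*) = COUNT(read_at), MAX(created_at) DESC
"""

rows = make_db().execute(QUERY, (1,)).fetchall()
for sender_id, total, read_count in rows:
    # total != read_count means at least one message is still unread
    status = "unread" if total != read_count else "all read"
    print(sender_id, total, read_count, status)
```

On the sample data, the three senders with unread messages (6, 5, 4) sort ahead of the two whose messages have all been read (3, 2).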

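With a little tweaking, Scrum Meister's aggregate expression can be made to work. Try MIN(IF(read_at IS NULL, 0, 1)) in place of COUNT(*)=COUNT(read_at). I don't think it will improve execution time, but there is at least a small chance that it will (as with most optimizations, it depends on MySQL's internals).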
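
To check that the two flag expressions order senders the same way, here is a small sketch against SQLite (again only a stand-in for MySQL). Two things are assumptions rather than part of the answer: MySQL's `IF(...)` is written as a `CASE` expression, and one hypothetical extra row (id 12) is added so that sender 2 has both read and unread messages. `make_db` and `sender_order` are illustrative helper names.

```python
import sqlite3

READ = "2011-01-01 01:01:01"
# (id, sender_id, receiver_id, read_at) from the question's sample data;
# created_at is 2011-01-01 01:01:<id>, as in the original INSERT.
SAMPLE = [(1, 2, 1, READ), (2, 1, 2, READ), (3, 2, 1, READ), (4, 3, 1, READ),
          (5, 3, 1, READ), (6, 1, 4, READ), (7, 4, 1, None), (8, 5, 1, None),
          (9, 5, 1, None), (10, 1, 6, None), (11, 6, 1, None)]
# Hypothetical extra row (not in the question): an unread message from sender 2.
EXTRA = [(12, 2, 1, None)]


def make_db(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, sender_id INTEGER,"
                 " receiver_id INTEGER, read_at TEXT, created_at TEXT)")
    conn.executemany(
        "INSERT INTO message VALUES (?, ?, ?, ?, ?)",
        [(i, s, r, ra, f"2011-01-01 01:01:{i:02d}") for i, s, r, ra in rows])
    return conn


def sender_order(conn, flag_expr, receiver_id=1):
    """Order senders by the given 'all read' flag, then by newest message."""
    sql = ("SELECT sender_id FROM message WHERE receiver_id = ? "
           "GROUP BY sender_id "
           f"ORDER BY {flag_expr}, MAX(created_at) DESC")
    return [row[0] for row in conn.execute(sql, (receiver_id,))]


COUNT_FLAG = "COUNT(*) = COUNT(read_at)"
# SQLite spelling of MySQL's MIN(IF(read_at IS NULL, 0, 1))
MIN_FLAG = "MIN(CASE WHEN read_at IS NULL THEN 0 ELSE 1 END)"

conn = make_db(SAMPLE + EXTRA)
by_count = sender_order(conn, COUNT_FLAG)
by_min = sender_order(conn, MIN_FLAG)
print(by_count, by_min)
```

Both expressions evaluate to 0 for a sender with at least one unread message and 1 otherwise, so the resulting order is the same; whether either is faster in MySQL would have to be measured.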

EXPLAIN results on the test table:

+----+-------------+-------+------+----------------+----------+---------+-------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys  | key      | key_len | ref   | rows | Extra                                        |
+----+-------------+-------+------+----------------+----------+---------+-------+------+----------------------------------------------+
|  1 | SIMPLE      | m     | ref  | receiver,dates | receiver | 8       | const |    7 | Using where; Using temporary; Using filesort |
+----+-------------+-------+------+----------------+----------+---------+-------+------+----------------------------------------------+

To remove the aggregate functions applied to the rows of message:

SELECT sender_id
  FROM ( (SELECT sender_id, 0 AS all_read, MAX(created_at) AS recent
          FROM message AS m
          WHERE receiver_id=:receiver AND read_at IS NULL
          GROUP BY sender_id)
       UNION
         (SELECT sender_id, 1 AS all_read, MAX(created_at) AS recent
          FROM message AS m
          WHERE receiver_id=:receiver AND read_at IS NOT NULL
          GROUP BY sender_id)
       ) AS t
  GROUP BY sender_id
  ORDER BY MIN(all_read), MAX(recent) DESC;
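
It looks crazy. Instead of an aggregate expression, this query uses a constant value (which the separate subqueries make possible) to indicate whether a sender has any unread messages. Here is the output of EXPLAIN for this query: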

+----+--------------+------------+-------+----------------+-------+---------+------+------+----------------------------------------------+
| id | select_type  | table      | type  | possible_keys  | key   | key_len | ref  | rows | Extra                                        |
+----+--------------+------------+-------+----------------+-------+---------+------+------+----------------------------------------------+
|  1 | PRIMARY      | <derived2> | ALL   | NULL           | NULL  | NULL    | NULL |    5 | Using temporary; Using filesort              |
|  2 | DERIVED      | m          | ref   | receiver,dates | dates | 17      |      |    4 | Using where; Using temporary; Using filesort |
|  3 | UNION        | m          | range | receiver,dates | dates | 17      | NULL |    3 | Using where; Using temporary; Using filesort |
|NULL| UNION RESULT | <union2,3> | ALL   | NULL           | NULL  | NULL    | NULL | NULL |                                              |
+----+--------------+------------+-------+----------------+-------+---------+------+------+----------------------------------------------+
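
The UNION variant can be compared with the single GROUP BY query in a short sketch. It runs against SQLite as a stand-in for MySQL, which forces two adaptations that are assumptions of this example: the parentheses around the two halves of the UNION are dropped (SQLite does not accept them), and the outer sort uses `MAX(recent)` so the key is well defined for a sender who appears in both halves. `make_db` is a hypothetical helper.

```python
import sqlite3

READ = "2011-01-01 01:01:01"
# (id, sender_id, receiver_id, read_at) from the question's sample data;
# created_at is 2011-01-01 01:01:<id>, as in the original INSERT.
SAMPLE = [(1, 2, 1, READ), (2, 1, 2, READ), (3, 2, 1, READ), (4, 3, 1, READ),
          (5, 3, 1, READ), (6, 1, 4, READ), (7, 4, 1, None), (8, 5, 1, None),
          (9, 5, 1, None), (10, 1, 6, None), (11, 6, 1, None)]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, sender_id INTEGER,"
                 " receiver_id INTEGER, read_at TEXT, created_at TEXT)")
    conn.executemany(
        "INSERT INTO message VALUES (?, ?, ?, ?, ?)",
        [(i, s, r, ra, f"2011-01-01 01:01:{i:02d}") for i, s, r, ra in SAMPLE])
    return conn


# Unread and read messages are aggregated separately; all_read is a constant
# per half, so no per-row aggregate over read_at is needed.
UNION_QUERY = """
SELECT sender_id
  FROM (SELECT sender_id, 0 AS all_read, MAX(created_at) AS recent
          FROM message
          WHERE receiver_id = :receiver AND read_at IS NULL
          GROUP BY sender_id
        UNION
        SELECT sender_id, 1 AS all_read, MAX(created_at) AS recent
          FROM message
          WHERE receiver_id = :receiver AND read_at IS NOT NULL
          GROUP BY sender_id
       ) AS t
  GROUP BY sender_id
  ORDER BY MIN(all_read), MAX(recent) DESC
"""

GROUPED_QUERY = """
SELECT sender_id
  FROM message
  WHERE receiver_id = :receiver
  GROUP BY sender_id
  ORDER BY COUNT(*) = COUNT(read_at), MAX(created_at) DESC
"""

conn = make_db()
union_order = [r[0] for r in conn.execute(UNION_QUERY, {"receiver": 1})]
grouped_order = [r[0] for r in conn.execute(GROUPED_QUERY, {"receiver": 1})]
print(union_order, grouped_order)
```

On the sample data both queries return the senders in the same order, so the choice between them comes down to how MySQL executes each one, which the EXPLAIN output is meant to show.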

Answer 2: (score: 0)

First, I would apply a small table optimization, like this:

create table messages
(
    message_id bigint unsigned not null auto_increment primary key,
    sender_id bigint unsigned not null,
    receiver_id bigint unsigned not null,
    read_at datetime default null,
    created_at datetime
) engine=innodb;

create table message_body
(
    message_id bigint unsigned not null,
    message varchar(32000) not null
) engine=innodb;

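I use varchar instead of text because a text field costs 2 bytes of length overhead even when it holds a short message, and messages will sometimes be shorter than 255 characters, in which case you store only 1 byte instead of 2. See here.

This way, since the message body is not in the same table, loading a row is much lighter. That is very useful when you want to fetch A LOT of data!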
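
A minimal sketch of how the split layout might be used, assuming SQLite as a stand-in for MySQL (so the `engine=innodb` clauses and unsigned types are dropped). The single row, its body text, and the helpers `list_headers` and `load_body` are all hypothetical and only serve to show the access pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (
    message_id  INTEGER PRIMARY KEY,
    sender_id   INTEGER NOT NULL,
    receiver_id INTEGER NOT NULL,
    read_at     TEXT,
    created_at  TEXT
);
CREATE TABLE message_body (
    message_id INTEGER NOT NULL REFERENCES messages(message_id),
    message    TEXT NOT NULL
);
-- Hypothetical row, for illustration only.
INSERT INTO messages VALUES (7, 4, 1, NULL, '2011-01-01 01:01:07');
INSERT INTO message_body VALUES (7, 'hypothetical body text');
""")


def list_headers(conn, receiver_id):
    """Listing queries read only the narrow metadata table."""
    return conn.execute(
        "SELECT message_id, sender_id, read_at FROM messages "
        "WHERE receiver_id = ? ORDER BY created_at DESC",
        (receiver_id,)).fetchall()


def load_body(conn, message_id):
    """The body is fetched by key only when a single message is opened."""
    row = conn.execute(
        "SELECT message FROM message_body WHERE message_id = ?",
        (message_id,)).fetchone()
    return row[0] if row else None


print(list_headers(conn, 1))
print(load_body(conn, 7))
```

The sender listing never touches `message_body`, which is the point of the split; whether this pays off in practice depends on message sizes and the storage engine.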

The query you asked for would look like this:

select distinct(sender_id) 
from messages
where receiver_id = x
group by sender_id
order by read_at desc

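Answer 3: (score: 0)

I don't really understand the "everything is sorted by created_at desc" part.

If unread messages are shown first, you cannot sort "everything" by created_at.

But if you want to list all unread messages first (sorted by created_at), followed by all read messages (again sorted by created_at), then the following will do that: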

SELECT *
FROM message m
WHERE receiver_id = 1
ORDER BY 
    CASE 
      WHEN read_at IS NULL THEN 0
      ELSE 1
    END ASC,
    created_at DESC;

This produces a slightly different order from the one you expected, but looking at the sample data I think it should be correct.
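
To see what order this query produces on the sample data, here is a small sketch against an in-memory SQLite database (a stand-in for MySQL; the `message` column is left out and `make_db` is a hypothetical helper). It selects only `id` and `sender_id` instead of `*`, and derives the distinct sender order in Python, which is an addition of this example rather than part of the answer.

```python
import sqlite3

READ = "2011-01-01 01:01:01"
# (id, sender_id, receiver_id, read_at) from the question's sample data;
# created_at is 2011-01-01 01:01:<id>, as in the original INSERT.
SAMPLE = [(1, 2, 1, READ), (2, 1, 2, READ), (3, 2, 1, READ), (4, 3, 1, READ),
          (5, 3, 1, READ), (6, 1, 4, READ), (7, 4, 1, None), (8, 5, 1, None),
          (9, 5, 1, None), (10, 1, 6, None), (11, 6, 1, None)]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, sender_id INTEGER,"
                 " receiver_id INTEGER, read_at TEXT, created_at TEXT)")
    conn.executemany(
        "INSERT INTO message VALUES (?, ?, ?, ?, ?)",
        [(i, s, r, ra, f"2011-01-01 01:01:{i:02d}") for i, s, r, ra in SAMPLE])
    return conn


QUERY = """
SELECT id, sender_id
  FROM message
  WHERE receiver_id = ?
  ORDER BY CASE WHEN read_at IS NULL THEN 0 ELSE 1 END ASC,
           created_at DESC
"""

rows = make_db().execute(QUERY, (1,)).fetchall()
message_ids = [row[0] for row in rows]
# dict.fromkeys keeps the first occurrence of each sender, preserving order
senders = list(dict.fromkeys(row[1] for row in rows))
print(message_ids)
print(senders)
```

The unread messages (ids 11, 9, 8, 7) come first, followed by the read ones (5, 4, 3, 1); taking each sender at first appearance gives 6, 5, 4, 3, 2.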