PostgreSQL inbox query

Date: 2013-02-05 18:02:22

Tags: sql postgresql

I have a Messages table that looks like this:

                    Messages
+-----+------------+-------------+--------------+
|  id |  sender_id | receiver_id |  created_at  |
+-----------------------------------------------+
|  1  |      1     |      2      |   1/1/2013   |
|  2  |      1     |      2      |   1/1/2013   |
|  3  |      2     |      1      |   1/2/2013   |
|  4  |      3     |      2      |   1/2/2013   |
|  5  |      3     |      2      |   1/3/2013   |
|  6  |      5     |      4      |   1/4/2013   |
+-----------------------------------------------+

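If a 'thread' is the set of messages between a given sender_id and receiver_id, I want a query that returns the 10 most recent messages of each of the 10 most recent threads in which either sender_id or receiver_id is a given id.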

Expected output for user_id 5:

+-----+------------+-------------+--------------+
|  id |  sender_id | receiver_id |  created_at  |
+-----------------------------------------------+
|  1  |      5     |      2      |   1/4/2013   |
|  2  |      5     |      2      |   1/4/2013   |
|  3  |      2     |      5      |   1/4/2013   |
|  4  |      3     |      5      |   1/4/2013   |
|  5  |      5     |      2      |   1/3/2013   |
|  6  |      5     |      4      |   1/3/2013   |
+-----------------------------------------------+

That is: a limit of 10 messages between users 5 and 2 (there are 4 above) and a limit of 10 threads (there are 3 above).

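I've been trying to build this query with subqueries, but I haven't managed to apply the second limit, on the number of distinct threads.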

SELECT * FROM (
  SELECT DISTINCT ON (sender_id, receiver_id) messages.*
  FROM messages
  WHERE (receiver_id = 5 OR sender_id = 5)
  ORDER BY sender_id, receiver_id, created_at DESC
) q
ORDER BY created_at DESC
LIMIT 10 OFFSET 0;

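I'm considering creating a new Thread table with a thread_id field that would be the concatenation of sender_id and receiver_id, and then simply joining it to Messages, but I have a sneaking suspicion this can be done with a single table.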

5 answers:

Answer 0 (score: 2)

The neatest single query I can come up with for your problem is the following:

select * from (
  select row_number() 
    over (partition by sender_id, receiver_id order by created_at desc) as rn, m.*
  from Messages m
  where (m.sender_id, m.receiver_id) in (
    select sender_id, receiver_id
    from Messages
    where sender_id = <id> or receiver_id = <id>
    group by sender_id, receiver_id
    order by max(created_at) desc
    limit 10 offset 0
  )
) res where res.rn <= 10

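The column row_number() over (partition by sender_id, receiver_id order by created_at desc) holds the position of each message within its thread (the number you would see as a record number if you ran a separate query for just that thread). Alongside this row number, the query returns the message itself, provided it belongs to one of the 10 most recent threads (those selected by (m.sender_id, m.receiver_id) in ...query...). Finally, because you only want the 10 most recent messages per thread, you restrict the row number to be less than or equal to 10.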
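A minimal sketch of Answer 0's query, run against the question's sample rows so the result can be checked. Assumptions: SQLite through Python's built-in sqlite3 module stands in for PostgreSQL (window functions need SQLite 3.25 or later, row-value IN needs 3.15 or later); the dates are rewritten as ISO strings so they sort chronologically; <id> and the two limits become named parameters; and id DESC is an added tiebreaker so row_number() is deterministic for messages sent on the same day. The inbox function name is hypothetical.

```python
import sqlite3

# The question's sample Messages rows; dates rewritten as ISO-8601 text so
# that string comparison orders them chronologically.
ROWS = [
    (1, 1, 2, "2013-01-01"),
    (2, 1, 2, "2013-01-01"),
    (3, 2, 1, "2013-01-02"),
    (4, 3, 2, "2013-01-02"),
    (5, 3, 2, "2013-01-03"),
    (6, 5, 4, "2013-01-04"),
]

# Answer 0's query; :uid, :max_threads and :max_msgs replace <id> and the 10s.
QUERY = """
SELECT id, sender_id, receiver_id, created_at, rn FROM (
  SELECT row_number() OVER (PARTITION BY sender_id, receiver_id
                            ORDER BY created_at DESC, id DESC) AS rn,
         m.*
  FROM messages m
  WHERE (m.sender_id, m.receiver_id) IN (
    SELECT sender_id, receiver_id
    FROM messages
    WHERE sender_id = :uid OR receiver_id = :uid
    GROUP BY sender_id, receiver_id
    ORDER BY max(created_at) DESC
    LIMIT :max_threads
  )
) res
WHERE res.rn <= :max_msgs
ORDER BY created_at DESC, id DESC
"""


def inbox(conn, uid, max_threads=10, max_msgs=10):
    """Newest max_msgs messages of each of the user's newest max_threads threads."""
    params = {"uid": uid, "max_threads": max_threads, "max_msgs": max_msgs}
    return conn.execute(QUERY, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, sender_id INTEGER,"
    " receiver_id INTEGER, created_at TEXT)"
)
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", ROWS)

# The two most recent threads of user 2, newest message of each.
for row in inbox(conn, uid=2, max_threads=2, max_msgs=1):
    print(row)
```

For user 2 this keeps message 5 (the newest in the 3-to-2 thread) and message 3 (the only one in the 2-to-1 thread); the 1-to-2 thread is cut by the thread limit. Note that the threads here are directed, so 1-to-2 and 2-to-1 count as separate threads.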

Answer 1 (score: 2)

I'd suggest taking couling's answer and modifying it slightly so that, by using common table expressions, it effectively becomes two queries:

WITH threads (sender_id, receiver_id, latest) as (
        select sender,
               receiver,
               max(sent)
          from sof_messages
         where receiver = <user>
            or sender = <user>
         group by sender,
               receiver
         order by 3 desc        -- most recent threads first
         limit 10
 ),
 messages as (
         -- rank each message within its thread, newest first
         select m.*,
                rank() over (partition by m.sender, m.receiver order by m.sent desc) as rank
           from sof_messages m
          where (m.sender, m.receiver) in (select sender_id, receiver_id from threads))
 SELECT * from messages where rank <= 10;

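The advantage is that this gives the planner a good idea of when to use indexes: essentially, each of the three parts of the query is planned independently. (This describes PostgreSQL as it was when this was written; since PostgreSQL 12, a non-recursive, side-effect-free CTE that is referenced only once may be inlined into the outer query unless it is declared AS MATERIALIZED.)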
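A minimal sketch of the CTE version above, again on the question's sample rows. Assumptions: SQLite through Python's sqlite3 module stands in for PostgreSQL (rank() needs SQLite 3.25 or later); the question's column names (sender_id, receiver_id, created_at) replace the sof_messages test schema; and the IN filter is written as an equivalent JOIN against the threads CTE. Because the query keeps rank(), messages with the same timestamp share a rank, so a thread can return more rows than the message limit. The inbox function name is hypothetical.

```python
import sqlite3

# The question's sample rows, dates rewritten as ISO-8601 text.
ROWS = [
    (1, 1, 2, "2013-01-01"),
    (2, 1, 2, "2013-01-01"),
    (3, 2, 1, "2013-01-02"),
    (4, 3, 2, "2013-01-02"),
    (5, 3, 2, "2013-01-03"),
    (6, 5, 4, "2013-01-04"),
]

QUERY = """
WITH threads (sender_id, receiver_id, latest) AS (
    SELECT sender_id, receiver_id, max(created_at)
    FROM messages
    WHERE sender_id = :uid OR receiver_id = :uid
    GROUP BY sender_id, receiver_id
    ORDER BY 3 DESC                  -- newest threads first
    LIMIT :max_threads
),
ranked AS (
    SELECT m.*,
           rank() OVER (PARTITION BY m.sender_id, m.receiver_id
                        ORDER BY m.created_at DESC) AS rnk
    FROM messages m
    JOIN threads t
      ON t.sender_id = m.sender_id AND t.receiver_id = m.receiver_id
)
SELECT id, sender_id, receiver_id, created_at, rnk
FROM ranked
WHERE rnk <= :max_msgs
ORDER BY created_at DESC, id DESC
"""


def inbox(conn, uid, max_threads=10, max_msgs=10):
    params = {"uid": uid, "max_threads": max_threads, "max_msgs": max_msgs}
    return conn.execute(QUERY, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, sender_id INTEGER,"
    " receiver_id INTEGER, created_at TEXT)"
)
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", ROWS)

# User 1, one message per thread: thread 1->2 has two messages on the same
# day, so both share rank 1 and both are returned.
for row in inbox(conn, uid=1, max_msgs=1):
    print(row)
```

If ties must never exceed the limit, swapping rank() for row_number() (as in Answer 0) gives a strict cap.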

Answer 2 (score: 1)

I'm posting this to show what can be done.

I really don't recommend using it.

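It would be much better to run two separate queries: one that retrieves the 10 most recent threads, and one, repeated for each of those threads, that reads its 10 latest messages.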
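The two-query approach can be sketched in application code. Assumptions: Python's sqlite3 module and the question's schema and sample rows stand in for the real PostgreSQL setup; latest_threads, thread_messages and inbox are hypothetical helper names; and id DESC is an added tiebreaker. The per-thread query runs once per thread, so a call makes at most max_threads + 1 round trips.

```python
import sqlite3

# The question's sample rows, dates rewritten as ISO-8601 text.
ROWS = [
    (1, 1, 2, "2013-01-01"),
    (2, 1, 2, "2013-01-01"),
    (3, 2, 1, "2013-01-02"),
    (4, 3, 2, "2013-01-02"),
    (5, 3, 2, "2013-01-03"),
    (6, 5, 4, "2013-01-04"),
]


def latest_threads(conn, uid, max_threads=10):
    # Query 1: the user's most recent threads, newest first.
    return conn.execute(
        """SELECT sender_id, receiver_id, max(created_at) AS latest
           FROM messages
           WHERE sender_id = ? OR receiver_id = ?
           GROUP BY sender_id, receiver_id
           ORDER BY latest DESC
           LIMIT ?""",
        (uid, uid, max_threads),
    ).fetchall()


def thread_messages(conn, sender_id, receiver_id, max_msgs=10):
    # Query 2, run once per thread: that thread's newest messages.
    return conn.execute(
        """SELECT id, sender_id, receiver_id, created_at
           FROM messages
           WHERE sender_id = ? AND receiver_id = ?
           ORDER BY created_at DESC, id DESC
           LIMIT ?""",
        (sender_id, receiver_id, max_msgs),
    ).fetchall()


def inbox(conn, uid, max_threads=10, max_msgs=10):
    # Dicts keep insertion order, so threads stay newest-first.
    return {
        (s, r): thread_messages(conn, s, r, max_msgs)
        for s, r, _latest in latest_threads(conn, uid, max_threads)
    }


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, sender_id INTEGER,"
    " receiver_id INTEGER, created_at TEXT)"
)
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", ROWS)

for thread, msgs in inbox(conn, uid=2).items():
    print(thread, [m[0] for m in msgs])
```

Each query is simple enough to be served by an index on (sender_id, receiver_id, created_at), which is the main argument for this design over a single window-function query.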

However, you can achieve the goal with the rank() window function, as follows:

select * from (
      select message.*,
             rank() over (partition by message.sender, message.receiver
                              order by message.sent desc) as rank
      from sof_messages message,
           (
            select sender,
                   receiver,
                   max(sent)
              from sof_messages
             where receiver = <user>
                or sender = <user>
             group by sender,
                   receiver
             order by 3 desc      -- most recent threads first
             limit 10
           ) thread
      where message.sender = thread.sender
        and message.receiver = thread.receiver
      ) message_list
where rank <= 10

There are several different queries that achieve your goal with window functions; none of them is particularly clean.
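One such variant, sketched under an assumption that the question's expected output suggests (it counts messages from 5 to 2 and from 2 to 5 as one thread): map each message to an unordered pair of users and partition on that pair instead of on (sender_id, receiver_id). SQLite through Python's sqlite3 stands in for PostgreSQL; SQLite's two-argument min() and max() are scalar functions, and in PostgreSQL the equivalents are LEAST(sender_id, receiver_id) and GREATEST(sender_id, receiver_id). The names user_a, user_b, keyed and inbox are illustrative.

```python
import sqlite3

# The question's sample rows, dates rewritten as ISO-8601 text.
ROWS = [
    (1, 1, 2, "2013-01-01"),
    (2, 1, 2, "2013-01-01"),
    (3, 2, 1, "2013-01-02"),
    (4, 3, 2, "2013-01-02"),
    (5, 3, 2, "2013-01-03"),
    (6, 5, 4, "2013-01-04"),
]

QUERY = """
WITH keyed AS (
    -- unordered thread key: the smaller and the larger user id
    SELECT m.*,
           min(sender_id, receiver_id) AS user_a,
           max(sender_id, receiver_id) AS user_b
    FROM messages m
    WHERE sender_id = :uid OR receiver_id = :uid
),
threads AS (
    SELECT user_a, user_b
    FROM keyed
    GROUP BY user_a, user_b
    ORDER BY max(created_at) DESC
    LIMIT :max_threads
),
ranked AS (
    SELECT k.*,
           row_number() OVER (PARTITION BY k.user_a, k.user_b
                              ORDER BY k.created_at DESC, k.id DESC) AS rn
    FROM keyed k
    JOIN threads t ON t.user_a = k.user_a AND t.user_b = k.user_b
)
SELECT id, sender_id, receiver_id, created_at
FROM ranked
WHERE rn <= :max_msgs
ORDER BY created_at DESC, id DESC
"""


def inbox(conn, uid, max_threads=10, max_msgs=10):
    params = {"uid": uid, "max_threads": max_threads, "max_msgs": max_msgs}
    return conn.execute(QUERY, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, sender_id INTEGER,"
    " receiver_id INTEGER, created_at TEXT)"
)
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", ROWS)

# User 1 has a single two-way thread with user 2 (messages 1, 2 and 3).
print(inbox(conn, uid=1, max_msgs=1))
```

With this key, messages 1, 2 and 3 form one thread, so asking user 1 for one message per thread returns only message 3.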

Answer 3 (score: 1)

Creating a Thread table seems wrong because it duplicates data, but a view might help:

CREATE VIEW threads AS 
  SELECT sender_id, receiver_id, min(created_at) AS t_date
  FROM messages
  GROUP BY sender_id,receiver_id;

If a thread's date should be the date of its latest message rather than its earliest, change min(created_at) to max(created_at).

The view can then be joined back to messages with:

SELECT ... FROM messages JOIN threads USING (sender_id,receiver_id)
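A minimal sketch of the view approach, using max(created_at) so that a thread is dated by its latest message, then joining only the user's most recent threads back to messages. Assumptions: SQLite through Python's sqlite3 stands in for PostgreSQL, the question's sample rows are used, and the thread limit is applied in a derived table named recent (a hypothetical name, as is inbox). This limits the number of threads only; capping messages per thread would still need a window function as in the other answers.

```python
import sqlite3

# The question's sample rows, dates rewritten as ISO-8601 text.
ROWS = [
    (1, 1, 2, "2013-01-01"),
    (2, 1, 2, "2013-01-01"),
    (3, 2, 1, "2013-01-02"),
    (4, 3, 2, "2013-01-02"),
    (5, 3, 2, "2013-01-03"),
    (6, 5, 4, "2013-01-04"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, sender_id INTEGER,"
    " receiver_id INTEGER, created_at TEXT)"
)
conn.executemany("INSERT INTO messages VALUES (?, ?, ?, ?)", ROWS)

# The view from the answer, dated by each thread's latest message.
conn.execute("""
CREATE VIEW threads AS
  SELECT sender_id, receiver_id, max(created_at) AS t_date
  FROM messages
  GROUP BY sender_id, receiver_id
""")

# Pick the user's newest threads from the view, then join back to messages.
QUERY = """
SELECT id, sender_id, receiver_id, created_at, t_date
FROM messages
JOIN (
    SELECT sender_id, receiver_id, t_date
    FROM threads
    WHERE sender_id = :uid OR receiver_id = :uid
    ORDER BY t_date DESC
    LIMIT :max_threads
) AS recent USING (sender_id, receiver_id)
ORDER BY t_date DESC, created_at DESC, id DESC
"""


def inbox(conn, uid, max_threads=10):
    return conn.execute(QUERY, {"uid": uid, "max_threads": max_threads}).fetchall()


for row in inbox(conn, uid=2, max_threads=2):
    print(row)
```

Ordering by t_date first keeps each thread's messages together, newest thread first.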

Answer 4 (score: 0)

I haven't tested this, but it looks like you forgot a LIMIT 10 in the subquery, which would give you the 10 most recent threads:

SELECT
  *
FROM
  (SELECT DISTINCT ON
     (sender_id, receiver_id) messages.* 
   FROM
     messages 
   WHERE
     (receiver_id = 5 OR sender_id = 5)
   ORDER BY
     sender_id, receiver_id, created_at DESC
   LIMIT
     10)   
  q
ORDER BY
  created_at DESC 
LIMIT
  10
OFFSET
  0;

(I've pretty-printed the SQL so it's easier to see what's going on.)