How to group rows by the permutations of a two-column composite key

Time: 2019-01-14 07:43:17

Tags: sql database sqlite group-by

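I'm not sure the title is worded as clearly as it should be (it's the best I could come up with), but here is an example that should make things clear. I have a Chats view that is supposed to summarize the conversation history between two people. The view has the following columns: Sender, Recipient, Timestamp, LatestMessage, UnreadMessageCount.

The columns of the Chats view are all derived from the Direct_Messages table, which stores the details of the individual chat messages exchanged between users of the system. Its columns are: ID, Sender, Recipient, Body, Timestamp, TimeRead (null if the recipient has not read the message). The view's Timestamp and LatestMessage columns hold the values of the most recent direct message between the two participants (the one with the latest Timestamp, FWIW).

The crux of the problem is that the Chats view should contain only one permutation of the Sender/Recipient composite columns, namely that of the latest exchange between the two participants. For example, if Gary sent Barry a 'Hi' message and Barry then replied 'Hello', the only entry in Chats for these two should have Sender 'Barry', Recipient 'Gary', Timestamp equal to the timestamp of Barry's reply, LatestMessage 'Hello', and UnreadMessageCount equal to the number of messages the Recipient has not yet read.

I tried GROUP BY "Sender", "Recipient" OR "Recipient", "Sender", but it just returns two rows: one grouped by Barry, Gary and the other grouped by Gary, Barry.

Here is my code: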

SELECT Sender AS Sender,
       Recipient AS Recipient,
       Timestamp AS Timestamp,
       Body AS LatestMessage,
       (SUM(CASE WHEN TimeRead IS NULL THEN 1 ELSE 0 END) ) AS UnreadMessageCount
FROM Direct_Messages
GROUP BY Sender, Recipient OR Recipient, Sender
ORDER BY Timestamp DESC

Edit: here is sample data from the Direct_Messages table, along with the corresponding output of the Chats view.

From Direct_Messages

ID          Sender  Recipient   Body    Timestamp                   TimeRead
148567984   Gary    Barry       Hi      2018-12-12 23:53:39.487     2018-12-12 23:55:45
1668701120  Barry   Gary        Hello   2018-12-12 23:54:49.326     NULL

Resulting Chats

Sender  Recipient   Timestamp                 LatestMessage UnreadMessageCount
Gary    Barry       2018-12-12 23:53:39.487   Hi            0
Barry   Gary        2018-12-12 23:54:49.326   Hello         1

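3 answers:

Answer 0 (score: 1)

You can "pre-cook" your data so that the messages between each pair of users always point in the same direction.

For example, if your data is: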

Sender Recipient
A ---> B
B ---> A

you change it to:

U1     U2
B ---> A (changed)
B ---> A

Like this:

-- u1 is always the larger of the two names and u2 the smaller,
-- so A -> B and B -> A fall into the same group.
-- Timestamp and Body are not aggregated, so SQLite takes them
-- from an arbitrary row of each group.
SELECT (case when Sender > Recipient then Sender else Recipient end) AS u1,
       (case when Sender > Recipient then Recipient else Sender end) AS u2,
       Timestamp AS Timestamp,
       Body AS LatestMessage,
       (SUM(CASE WHEN TimeRead IS NULL THEN 1 ELSE 0 END) ) AS UnreadMessageCount
FROM Direct_Messages
GROUP BY 
     (case when Sender > Recipient then Sender else Recipient end), 
     (case when Sender > Recipient then Recipient else Sender end) 
ORDER BY Timestamp DESC

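Note: watch out for performance (I guess it isn't critical, since you tagged the question sqlite).

You can use a CTE to pre-cook the data and get a more readable query: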

with Direct_Messages_cooked as
(
    select
      (case when Sender > Recipient then Sender else Recipient end) AS U1,
      (case when Sender > Recipient then Recipient else Sender end) AS U2,
      *
    from Direct_Messages
)
SELECT U1 AS U1,
       U2 AS U2,
       Timestamp AS Timestamp,
       Body AS LatestMessage,
       (SUM(CASE WHEN TimeRead IS NULL THEN 1 ELSE 0 END) ) AS UnreadMessageCount
FROM Direct_Messages_cooked
GROUP BY U1, U2
ORDER BY Timestamp DESC
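To check that the normalization really collapses both directions into one group, here is a minimal sketch that runs the CTE idea against the two sample rows from the question, using Python's built-in sqlite3 module. The in-memory database, the CTE name cooked and the MessageCount column are assumptions made for illustration only; they are not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Direct_Messages ("
    "ID INTEGER PRIMARY KEY, Sender TEXT, Recipient TEXT, "
    "Body TEXT, Timestamp TEXT, TimeRead TEXT)"
)
# The two sample rows from the question.
conn.executemany(
    "INSERT INTO Direct_Messages VALUES (?, ?, ?, ?, ?, ?)",
    [
        (148567984, "Gary", "Barry", "Hi",
         "2018-12-12 23:53:39.487", "2018-12-12 23:55:45"),
        (1668701120, "Barry", "Gary", "Hello",
         "2018-12-12 23:54:49.326", None),
    ],
)

rows = conn.execute(
    """
    WITH cooked AS (
        -- U1 is the larger name, U2 the smaller, whatever the direction.
        SELECT CASE WHEN Sender > Recipient THEN Sender ELSE Recipient END AS U1,
               CASE WHEN Sender > Recipient THEN Recipient ELSE Sender END AS U2,
               TimeRead
        FROM Direct_Messages
    )
    SELECT U1, U2,
           COUNT(*) AS MessageCount,  -- hypothetical column, for illustration
           SUM(CASE WHEN TimeRead IS NULL THEN 1 ELSE 0 END) AS UnreadMessageCount
    FROM cooked
    GROUP BY U1, U2
    """
).fetchall()

print(rows)  # [('Gary', 'Barry', 2, 1)]
```

Both messages end up in a single (Gary, Barry) group. The sketch only demonstrates the grouping; it does not yet pick the latest message.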

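Answer 1 (score: 1)

You can get most of what you want by using MIN() and MAX() with multiple arguments. With multiple arguments these are scalar functions that behave like LEAST() and GREATEST() in other databases.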

-- Two-argument MIN()/MAX() are scalar here; one-argument MAX() is the aggregate.
SELECT MIN(Sender, Recipient) AS u1,
       MAX(Sender, Recipient) AS u2,
       MAX(Timestamp) AS Timestamp,
       -- Body AS LatestMessage,
       (COUNT(*) - COUNT(TimeRead)) as UnreadMessageCount
FROM Direct_Messages
GROUP BY u1, u2
ORDER BY MAX(Timestamp) DESC
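The two building blocks of this query can be tried in isolation. The following is a small sketch using Python's built-in sqlite3 module; the throwaway table t is hypothetical and exists only to show how COUNT(*) - COUNT(TimeRead) counts the NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# With two or more arguments, SQLite's MIN()/MAX() are scalar functions,
# so either ordering of a pair normalizes to the same (low, high) tuple.
pair = conn.execute("SELECT MIN('Gary', 'Barry'), MAX('Gary', 'Barry')").fetchone()
swapped = conn.execute("SELECT MIN('Barry', 'Gary'), MAX('Barry', 'Gary')").fetchone()

# COUNT(*) counts every row, COUNT(col) skips NULLs, so the difference
# is the number of rows where col IS NULL (here: unread messages).
conn.execute("CREATE TABLE t (TimeRead TEXT)")  # hypothetical scratch table
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("2018-12-12 23:55:45",), (None,)],
)
unread = conn.execute("SELECT COUNT(*) - COUNT(TimeRead) FROM t").fetchone()[0]

print(pair, swapped, unread)  # ('Barry', 'Gary') ('Barry', 'Gary') 1
```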

The challenge is getting the latest message. You can do that with conditional aggregation and an additional JOIN:

SELECT MIN(dmc.Sender, dmc.Recipient) AS u1,
       MAX(dmc.Sender, dmc.Recipient) AS u2,
       MAX(dmc.Timestamp) AS Timestamp,
       -- keep Body only for the row whose Timestamp is the pair's latest
       MAX(CASE WHEN dmc.Timestamp = dmc2.Timestamp THEN Body END) AS LatestMessage,
       (COUNT(*) - COUNT(dmc.TimeRead)) as UnreadMessageCount
FROM Direct_Messages dmc JOIN
     -- dmc2: latest Timestamp per normalized pair of users
     (SELECT MIN(Sender, Recipient) AS u1,
             MAX(Sender, Recipient) AS u2,
             MAX(Timestamp) AS Timestamp
      FROM Direct_Messages
      GROUP BY u1, u2
     ) dmc2
     ON dmc2.u1 = MIN(dmc.Sender, dmc.Recipient) AND
        dmc2.u2 = MAX(dmc.Sender, dmc.Recipient)
GROUP BY u1, u2
ORDER BY dmc2.Timestamp DESC
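Here is a sketch of the same join-plus-conditional-aggregation idea run through Python's built-in sqlite3 module. It uses the question's two sample rows plus one invented row (Gary to Alice, ID 3) so that there is more than one chat; that extra row is an assumption for illustration. The grouping columns are written as dmc2.u1 and dmc2.u2 to keep name resolution explicit, which is a small departure from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Direct_Messages ("
    "ID INTEGER PRIMARY KEY, Sender TEXT, Recipient TEXT, "
    "Body TEXT, Timestamp TEXT, TimeRead TEXT)"
)
conn.executemany(
    "INSERT INTO Direct_Messages VALUES (?, ?, ?, ?, ?, ?)",
    [
        (148567984, "Gary", "Barry", "Hi",
         "2018-12-12 23:53:39.487", "2018-12-12 23:55:45"),
        (1668701120, "Barry", "Gary", "Hello",
         "2018-12-12 23:54:49.326", None),
        # Invented row, not in the question's sample data.
        (3, "Gary", "Alice", "Lunch?", "2018-12-13 09:00:00.000", None),
    ],
)

rows = conn.execute(
    """
    SELECT dmc2.u1, dmc2.u2,
           MAX(dmc.Timestamp) AS Timestamp,
           -- Body survives only on the row carrying the pair's latest Timestamp
           MAX(CASE WHEN dmc.Timestamp = dmc2.Timestamp THEN dmc.Body END)
               AS LatestMessage,
           COUNT(*) - COUNT(dmc.TimeRead) AS UnreadMessageCount
    FROM Direct_Messages dmc
    JOIN (SELECT MIN(Sender, Recipient) AS u1,
                 MAX(Sender, Recipient) AS u2,
                 MAX(Timestamp) AS Timestamp
          FROM Direct_Messages
          GROUP BY MIN(Sender, Recipient), MAX(Sender, Recipient)) dmc2
      ON dmc2.u1 = MIN(dmc.Sender, dmc.Recipient)
     AND dmc2.u2 = MAX(dmc.Sender, dmc.Recipient)
    GROUP BY dmc2.u1, dmc2.u2
    ORDER BY MAX(dmc.Timestamp) DESC
    """
).fetchall()

for row in rows:
    print(row)
```

Note that u1 and u2 are the alphabetically ordered pair, so this result does not say who sent the latest message.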

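Answer 2 (score: 0)

Building on the insightful answers from @Gordon Linoff and @dani herrera, I managed to tweak things and arrive at a concise solution to my specific problem, although from what I can see @Gordon's answer addresses the broader scope of my original question more fully. Here is what I came up with: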

SELECT Sender AS Sender,
       Recipient AS Recipient,
       Timestamp AS Timestamp,
       Body AS LatestMessage,
       (COUNT(*) - COUNT(TimeRead)) AS UnreadMessageCount
FROM Direct_Messages
GROUP BY (SELECT MAX(Sender, Recipient)),
         (SELECT MIN(Sender, Recipient))
ORDER BY Timestamp DESC
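One caveat with bare columns such as Sender, Recipient and Body in a grouped query: unless the query contains a MIN() or MAX() aggregate, SQLite takes them from an arbitrary row of each group. A possible variant, sketched below with Python's built-in sqlite3 module, selects MAX(Timestamp) so that SQLite's documented bare-column behavior pulls the other columns from the row holding the latest Timestamp. This is SQLite-specific and is a swapped-in variation, not the query as posted; the Gary to Alice row (ID 3) is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Direct_Messages ("
    "ID INTEGER PRIMARY KEY, Sender TEXT, Recipient TEXT, "
    "Body TEXT, Timestamp TEXT, TimeRead TEXT)"
)
conn.executemany(
    "INSERT INTO Direct_Messages VALUES (?, ?, ?, ?, ?, ?)",
    [
        (148567984, "Gary", "Barry", "Hi",
         "2018-12-12 23:53:39.487", "2018-12-12 23:55:45"),
        (1668701120, "Barry", "Gary", "Hello",
         "2018-12-12 23:54:49.326", None),
        # Invented row, not in the question's sample data.
        (3, "Gary", "Alice", "Lunch?", "2018-12-13 09:00:00.000", None),
    ],
)

rows = conn.execute(
    """
    SELECT Sender, Recipient,
           -- the single MAX() makes the bare columns (Sender, Recipient, Body)
           -- come from the row with the latest Timestamp in each group
           MAX(Timestamp) AS Timestamp,
           Body AS LatestMessage,
           COUNT(*) - COUNT(TimeRead) AS UnreadMessageCount
    FROM Direct_Messages
    GROUP BY MIN(Sender, Recipient), MAX(Sender, Recipient)
    ORDER BY Timestamp DESC
    """
).fetchall()

for row in rows:
    print(row)
```

With the sample data this yields Sender 'Barry', Recipient 'Gary', LatestMessage 'Hello' for the Gary/Barry chat, which matches the single row the question asks for. UnreadMessageCount here counts unread messages in both directions of the chat.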