Erase duplicates using a substring in T-SQL

Time: 2018-06-04 15:32:55

Tags: sql sql-server tsql

I have the following table:

create table bufScan
(
 msgid int,
 conversationid nvarchar(max),
 mailbox nvarchar(max)
);

INSERT dbo.bufScan VALUES
(1,'person1@company.com','AAQkAGIwZjk4OTk4LTRkZGYtNDM5Yi04NGZlLTAzMDY1MjQ3ZjVlMgAQAPX8hFCq30h3robsMxenwt8='),
(2,'person1@company.com','AAQkAGIwZjk4OTk4LTRkZGYtNDM5Yi04NGZlLTAzMDY1MjQ3ZjVlMgAQAESPK731aUirpd0CyOIlR5I='),
(3,'person1@company.com','AAQkAGIwZjk4OTk4LTRkZGYtNDM5Yi04NGZlLTAzMDY1MjQ3ZjVlMgAQAESPK731aUirpd0CyOIlR5I='),
(4,'person1@company.com','AAQkAGIwZjk4OTk4LTRkZGYtNDM5Yi04NGZlLTAzMDY1MjQ3ZjVlMgAQAPX8hFCq30h3robsMxenwt8='),
(5,'person2@company.com','AAQkADhlYTk5MGY1LTJkOTUtNDVjNy1iNDg0LTljYjc5ODAzZTM3OQAQAPX8hFCq30h3robsMxenwt8='),
(6,'person2@company.com','AAQkADhlYTk5MGY1LTJkOTUtNDVjNy1iNDg0LTljYjc5ODAzZTM3OQAQAESPK731aUirpd0CyOIlR5I='),
(7,'person2@company.com','AAQkADhlYTk5MGY1LTJkOTUtNDVjNy1iNDg0LTljYjc5ODAzZTM3OQAQAESPK731aUirpd0CyOIlR5I='),
(8,'person2@company.com','AAQkADhlYTk5MGY1LTJkOTUtNDVjNy1iNDg0LTljYjc5ODAzZTM3OQAQAX8hFCq30h3robsMxenwt8=');

Let's assume that the leading part of each mailbox value is the entry that gives us information about a specific mailbox, and the trailing part identifies a specific conversation.

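In this example, person1 is having a conversation with person2. I want to keep all conversations from only one mailbox; otherwise it is like having every conversation twice in my data. That means I want to keep the entries for conversations AX8hFCq30h3robsMxenwt8= and AESPK731aUirpd0CyOIlR5I= from either mailbox AAQkADhlYTk5MGY1LTJkOTUtNDVjNy1iNDg0LTljYjc5ODAzZTM3OQ or mailbox AAQkAGIwZjk4OTk4LTRkZGYtNDM5Yi04NGZlLTAzMDY1MjQ3ZjVlMg. I hope that is more specific. Since I also want to keep information about all the messages in a conversation, I don't want to delete duplicates that have different message IDs within the same mailbox partition.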

2 answers:

Answer 0 (score: 1)

You can use the dense_rank() function:

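select top (1) with ties *
from dbo.bufScan t  -- "table" is a reserved word; use the table from the question
-- offset 4 skips a 3-character mailbox prefix (as in 'xxxAQA111'); adjust it for real values
order by dense_rank() over (partition by substring(mailbox, 4, len(mailbox)) 
                                          order by conversationid);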
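
To see which rows the query above keeps, it can help to expose the partition key and the rank as ordinary columns. The following is a minimal sketch rather than part of the original answer: it assumes the simplified sample data from the other answer, loaded into a table variable named @bufScan (a name chosen here for illustration), and it reuses the fixed offset of 4 from the query above.

```sql
-- Assumption: simplified sample data borrowed from the other answer,
-- held in a table variable so nothing persistent is created.
DECLARE @bufScan TABLE
(
    msgid          int,
    conversationid nvarchar(max),
    mailbox        nvarchar(max)
);

INSERT @bufScan VALUES
(1, N'mailbox1', N'xxxAQA111'),
(2, N'mailbox1', N'xxxAQA111'),
(3, N'mailbox1', N'xxxAQA222'),
(4, N'mailbox1', N'xxxAQA222'),
(5, N'mailbox2', N'yyyAQA111'),
(6, N'mailbox2', N'yyyAQA111'),
(7, N'mailbox2', N'yyyAQA222'),
(8, N'mailbox2', N'yyyAQA222');

-- Show the conversation key and the rank each row gets within it.
SELECT t.msgid, t.conversationid, t.mailbox,
       conv = SUBSTRING(t.mailbox, 4, LEN(t.mailbox)),
       rnk  = DENSE_RANK() OVER (PARTITION BY SUBSTRING(t.mailbox, 4, LEN(t.mailbox))
                                 ORDER BY t.conversationid)
FROM @bufScan AS t
ORDER BY conv, rnk, t.msgid;

-- Destructive variant: delete every row that does not belong to the
-- first-ranked mailbox of its conversation.
WITH ranked AS
(
    SELECT rnk = DENSE_RANK() OVER (PARTITION BY SUBSTRING(mailbox, 4, LEN(mailbox))
                                    ORDER BY conversationid)
    FROM @bufScan
)
DELETE FROM ranked
WHERE rnk > 1;

-- Rows that survive the delete.
SELECT msgid, conversationid, mailbox
FROM @bufScan
ORDER BY msgid;
```

With this data, msgid 1 to 4 (mailbox1) should get rnk = 1 and msgid 5 to 8 (mailbox2) should get rnk = 2, so both the TOP (1) WITH TIES query and the DELETE leave only the mailbox1 copies, with all messages of each conversation intact. Because every row of the same mailbox shares one rank, DENSE_RANK() keeps whole mailboxes together, which ROW_NUMBER() would not.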

Answer 1 (score: 0)

Your explanation is still a bit vague. I think this will work:

With this sample data:

USE tempdb
GO

create table dbo.bufScan
(
 msgid int,
 conversationid nvarchar(max),
 mailbox nvarchar(max)
 );

INSERT dbo.bufScan VALUES 
(1,'mailbox1','xxxAQA111'),
(2,'mailbox1','xxxAQA111'),
(3,'mailbox1','xxxAQA222'),
(4,'mailbox1','xxxAQA222'),
(5,'mailbox2','yyyAQA111'),
(6,'mailbox2','yyyAQA111'),
(7,'mailbox2','yyyAQA222'),
(8,'mailbox2','yyyAQA222');

You can do this:

WITH uniquify AS
(
  SELECT *, rn = ROW_NUMBER() OVER (PARTITION BY f.conv ORDER BY (SELECT NULL))
  FROM dbo.bufScan
  CROSS APPLY 
    (VALUES (SUBSTRING(mailbox, PATINDEX('%[0-9]%', mailbox), LEN(mailbox)))) f(conv)
)
SELECT msgid, conversationid, mailbox
FROM uniquify
WHERE rn <= 2;

Returns:

msgid       conversationid   mailbox
----------- ---------------- -----------
1           mailbox1         xxxAQA111
2           mailbox1         xxxAQA111
7           mailbox2         yyyAQA222
8           mailbox2         yyyAQA222

You can change rn <= 2 to rn = 1 to return:

msgid       conversationid   mailbox
----------- ---------------- -----------
1           mailbox1         xxxAQA111
7           mailbox2         yyyAQA222
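
One caveat about the query above: ORDER BY (SELECT NULL) gives ROW_NUMBER() no defined order, so which rows receive rn = 1 or rn <= 2 within a conversation is up to the engine and may differ between runs. If the goal is to keep every message of a conversation from a single mailbox, a deterministic variant could rank the mailboxes instead of numbering the rows. This is only a sketch under assumptions: the table variable @bufScan is a stand-in chosen for illustration, and the PATINDEX('%[0-9]%', ...) extraction is only meaningful for the simplified sample data, since the real IDs in the question contain digits throughout.

```sql
-- Assumption: the same simplified sample data as above, in a table variable.
DECLARE @bufScan TABLE
(
    msgid          int,
    conversationid nvarchar(max),
    mailbox        nvarchar(max)
);

INSERT @bufScan VALUES
(1, N'mailbox1', N'xxxAQA111'),
(2, N'mailbox1', N'xxxAQA111'),
(3, N'mailbox1', N'xxxAQA222'),
(4, N'mailbox1', N'xxxAQA222'),
(5, N'mailbox2', N'yyyAQA111'),
(6, N'mailbox2', N'yyyAQA111'),
(7, N'mailbox2', N'yyyAQA222'),
(8, N'mailbox2', N'yyyAQA222');

WITH ranked AS
(
    SELECT b.msgid, b.conversationid, b.mailbox,
           -- Rows from the same mailbox tie, so they share one rank per conversation.
           rnk = DENSE_RANK() OVER (PARTITION BY f.conv ORDER BY b.conversationid)
    FROM @bufScan AS b
    CROSS APPLY
        -- Conversation key: everything from the first digit onward.
        (VALUES (SUBSTRING(b.mailbox, PATINDEX('%[0-9]%', b.mailbox), LEN(b.mailbox)))) AS f(conv)
)
SELECT msgid, conversationid, mailbox
FROM ranked
WHERE rnk = 1      -- keep only the first-ranked mailbox of each conversation
ORDER BY msgid;
```

For the sample data this should return msgid 1 through 4, that is, both conversations with all of their messages, taken from mailbox1 only.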