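Delete duplicate records, keeping one per group based on priority

Time: 2015-05-15 11:39:25

Tags: sql sql-server tsql

I have a table generated by a program that I cannot modify, and it returns data like this: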

USER_ID     ACTIVE_STREET      STREET
----------- -----------        -----------------
1           1                  STREET1
1           0                  STREET1
1           0                  OTHER STREET
2           0                  OTHER USER STREET
2           0                  OTHER USER STREET
2           0                  OTHER USER STREET
2           1                  OTHER USER STREET

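I need to delete records from this table according to the following rules:

  • Each user has only one active street.
  • I must delete duplicates, but only those duplicates that have ACTIVE_STREET set to 0.

So I want to be left with only these records: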

USER_ID     ACTIVE_STREET      STREET
----------- -----------        -----------------
1           1                  STREET1
1           0                  OTHER STREET
2           1                  OTHER USER STREET

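I tried grouping, but there is no id column, so I cannot delete by id. How can I delete these duplicates without changing the structure of the original table?

Edit - based on Gordon's answer
This is very close, but there is one small difference: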

IF OBJECT_ID( 'tempdb..#MY_TMP' ) IS NOT NULL
    BEGIN
        DROP TABLE #MY_TMP;
    END;
SELECT * INTO #MY_TMP
  FROM(
      SELECT 1 AS USER_ID,
             1 AS ACTIVE_STREET,
             'STREET1' AS STREET
      UNION ALL
      SELECT 2 AS USER_ID,
             1 AS active,
             'OTHER USER STREET' AS STREET
      UNION ALL
      SELECT 1 AS USER_ID,
             0 AS active,
             'STREET1' AS STREET
      UNION ALL
      SELECT 1 AS USER_ID,
             0 AS active,
             'OTHER STREET' AS STREET
      UNION ALL
      SELECT 2 AS USER_ID,
             0 AS active,
             'OTHER USER STREET' AS STREET
     UNION ALL
      SELECT 2 AS USER_ID,
             0 AS active,
             'OTHER USER STREET 2' AS STREET ) X;


SELECT *
  FROM #MY_TMP ORDER BY USER_ID, ACTIVE_STREET desc;

SELECT * FROM (
select USER_ID, MAX(ACTIVE_STREET) AS a, STREET
from #MY_TMP
group by USER_ID, STREET ) X ORDER BY USER_ID, a desc


;with todelete as (
      select row_number() over (partition by user_id, ACTIVE_STREET
                                     order by street) as seqnum
      from #MY_TMP t
     )
delete todelete
    where seqnum > 1;

    SELECT *
  FROM #MY_TMP ORDER BY USER_ID, ACTIVE_STREET desc;

3 Answers:

Answer 0: (score: 2)

Does this do what you want?

select user_id, active_street, min(street) as street
from atable t
group by user_id, active_street;

It returns the results you specified.

If you really want to delete the rows from the table, you can use row_number():

with todelete as (
      select t.*, row_number() over (partition by user_id, active_street
                                     order by street) as seqnum
      from atable t
     )
delete todelete
    where seqnum > 1;

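Here is a SQL Fiddle demonstrating the code.

Edit:

Oops, I think I misunderstood the logic. You want to delete every row with the flag set to 0 whose street is the same as an active street for that user. If so, this is the query: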

delete t from my_tmp t
    where active_street = 0 and
          exists (select 1
                  from my_tmp t2
                  where t2.user_id = t.user_id and
                        t2.street = t.street and
                        t2.active_street = 1
                 );

Here is a fiddle for this.
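
Before running a destructive delete, it can help to preview which rows the predicate matches. The following is a minimal sketch, assuming the same my_tmp table and column names as the query above; only the DELETE is swapped for a SELECT.

```sql
-- Dry run: list the rows the DELETE above would remove.
-- Assumes the my_tmp table used in the answer; nothing is modified.
select t.user_id, t.active_street, t.street
from my_tmp t
where t.active_street = 0 and
      exists (select 1
              from my_tmp t2
              where t2.user_id = t.user_id and
                    t2.street = t.street and
                    t2.active_street = 1
             );
```

With the seven sample rows from the question, this should list the inactive STREET1 row for user 1 and the three inactive OTHER USER STREET rows for user 2, which are the rows absent from the desired result.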

Answer 1: (score: 0)

Create a temporary table, then move the data into it using GROUP BY:
insert into temptable
select USER_ID, MAX(ACTIVE_STREET), STREET
from tablename
group by USER_ID, STREET

Once that is done, delete everything from the original table and copy the rows back into it from temptable.
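
The whole round trip could be sketched roughly as follows. The names here are assumptions: `tablename` is the placeholder from the query above, `#temptable` is a hypothetical staging table created by SELECT ... INTO, and the transaction wrapper is an addition for safety. It also assumes ACTIVE_STREET is an integer column; MAX does not accept a BIT column in SQL Server, so a cast would be needed in that case.

```sql
-- Sketch only: #temptable and the transaction are assumptions,
-- and tablename is the placeholder used in the answer.
BEGIN TRANSACTION;

-- Stage one row per (USER_ID, STREET); MAX keeps the flag at 1
-- whenever any copy of that street is active.
SELECT USER_ID, MAX(ACTIVE_STREET) AS ACTIVE_STREET, STREET
INTO #temptable
FROM tablename
GROUP BY USER_ID, STREET;

-- Empty the original table without changing its structure.
DELETE FROM tablename;

-- Copy the deduplicated rows back.
INSERT INTO tablename (USER_ID, ACTIVE_STREET, STREET)
SELECT USER_ID, ACTIVE_STREET, STREET
FROM #temptable;

COMMIT TRANSACTION;

DROP TABLE #temptable;
```

Keeping the delete and the re-insert in one transaction means the table is never left empty if the copy back fails.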

Answer 2: (score: 0)

Maybe one of these variants will work for your task?

-- Create table with sample data
IF OBJECT_ID('tempdb..#MY_TMP') IS NOT NULL
    DROP TABLE #MY_TMP
;
SELECT * INTO #MY_TMP
FROM (
VALUES ( 1, 1, 'STREET1'           )
,      ( 1, 0, 'STREET1'           )
,      ( 1, 0, 'OTHER STREET'      )
,      ( 2, 0, 'OTHER USER STREET' )
,      ( 2, 0, 'OTHER USER STREET' )
,      ( 2, 0, 'OTHER USER STREET' )
,      ( 2, 1, 'OTHER USER STREET' )
) T([USER_ID], [ACTIVE_STREET], [STREET]);

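Variant using a temporary table:

1 - fill a temporary table with the required result;

2 - truncate the source table;

3 - insert the data from the temporary table back into the source table: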

IF OBJECT_ID('tempdb..#ToBeInserted') IS NOT NULL
    DROP TABLE #ToBeInserted
SELECT [USER_ID]
,      [ACTIVE_STREET]
,      [STREET]
    INTO #ToBeInserted
FROM (SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY [USER_ID], [STREET]
                                        ORDER BY [STREET],[ACTIVE_STREET] DESC)
      FROM #MY_TMP) AS T
      WHERE RN = 1

TRUNCATE TABLE #MY_TMP

INSERT INTO #MY_TMP ( [USER_ID], [ACTIVE_STREET], [STREET] )
SELECT [USER_ID]
,      [ACTIVE_STREET]
,      [STREET]
FROM #ToBeInserted;

Variant using a CTE:

WITH CTE
AS
(SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY [USER_ID],[STREET] 
                                   ORDER BY [STREET],[ACTIVE_STREET] DESC) 
FROM #MY_TMP)

DELETE CTE
WHERE RN > 1;
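
To check the result of either variant, one option is to look for any (USER_ID, STREET) pair that still appears more than once. This is a sketch that assumes the #MY_TMP sample table created above:

```sql
-- Sanity check after the cleanup: any row returned here is a
-- (USER_ID, STREET) pair that still has duplicates.
SELECT [USER_ID], [STREET], COUNT(*) AS row_count
FROM #MY_TMP
GROUP BY [USER_ID], [STREET]
HAVING COUNT(*) > 1;
```

An empty result means each user has at most one row per street left.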