Tricky MS Access SQL query to remove surplus duplicate records

Asked: 2010-10-06 15:32:13

Tags: sql performance ms-access

I have an Access table of the following form (I'm simplifying it):

ID            AutoNumber       Primary Key
SchemeName    Text (50)
SchemeNumber  Text (15)

It contains some data, for example:

ID            SchemeName           SchemeNumber
--------------------------------------------------------------------
714           Malcolm              ABC123
80            Malcolm              ABC123
96            Malcolms Scheme      ABC123
101           Malcolms Scheme      ABC123
98            Malcolms Scheme      DEF888
654           Another Scheme       BAR876
543           Whatever Scheme      KJL111
etc...

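Now, I want to remove the duplicate names under the same SchemeNumber, but keep the record that has the longest SchemeName for that scheme number. If there are duplicate records of the same length, I want to keep just one, say the one with the lowest ID (though any one will really do). From the example above I want to delete IDs 714, 80 and 101 (leaving only 96).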

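I thought this would be relatively easy to achieve, but it has turned into a nightmare! Thanks for any suggestions. I know I could loop through the records programmatically, but I would rather have a single DELETE query.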

6 answers:

Answer 0 (score: 2)

DELETE FROM Table t1
WHERE EXISTS (SELECT 1 from Table t2
             WHERE t1.SchemeNumber = t2.SchemeNumber
             AND Length(t2.SchemeName) > Length(t1.SchemeName)
)

Depending on your RDBMS, you may need a different function in place of Length (Oracle: length, MySQL: length, SQL Server: LEN, Access: Len).

Answer 1 (score: 2)

delete ShortScheme
from Scheme ShortScheme
join Scheme LongScheme
  on ShortScheme.SchemeNumber = LongScheme.SchemeNumber
  and (len(ShortScheme.SchemeName) < len(LongScheme.SchemeName) or (len(ShortScheme.SchemeName) = len(LongScheme.SchemeName) and ShortScheme.ID > LongScheme.ID))

(SQL Server style)

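Now updated to include the specified tie resolution. That said, you might get better performance by doing this in two queries: first delete the schemes with shorter names, as in the original query, then go back and delete the higher IDs wherever there is a tie in name length.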
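The two-query variant described above can be sketched as follows. This is a minimal illustration that uses Python's built-in sqlite3 module as a stand-in, so it uses SQLite's length() rather than len(), and a correlated EXISTS rather than the SQL Server DELETE ... JOIN form. The table name Scheme and the variable names are assumptions made for the example.

```python
import sqlite3

# Sample rows from the question (ID, SchemeName, SchemeNumber).
ROWS = [
    (714, "Malcolm", "ABC123"),
    (80, "Malcolm", "ABC123"),
    (96, "Malcolms Scheme", "ABC123"),
    (101, "Malcolms Scheme", "ABC123"),
    (98, "Malcolms Scheme", "DEF888"),
    (654, "Another Scheme", "BAR876"),
    (543, "Whatever Scheme", "KJL111"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Scheme (ID INTEGER PRIMARY KEY, SchemeName TEXT, SchemeNumber TEXT)"
)
conn.executemany("INSERT INTO Scheme VALUES (?, ?, ?)", ROWS)


def remaining_ids():
    """Return the IDs still present in the table, in ascending order."""
    return [row[0] for row in conn.execute("SELECT ID FROM Scheme ORDER BY ID")]


# Pass 1: delete rows for which a longer name exists under the same SchemeNumber.
conn.execute(
    """
    DELETE FROM Scheme
    WHERE EXISTS (
        SELECT 1 FROM Scheme AS longer
        WHERE longer.SchemeNumber = Scheme.SchemeNumber
          AND length(longer.SchemeName) > length(Scheme.SchemeName)
    )
    """
)
after_pass_1 = remaining_ids()

# Pass 2: among names of equal length, delete every row except the lowest ID.
conn.execute(
    """
    DELETE FROM Scheme
    WHERE EXISTS (
        SELECT 1 FROM Scheme AS tie
        WHERE tie.SchemeNumber = Scheme.SchemeNumber
          AND length(tie.SchemeName) = length(Scheme.SchemeName)
          AND tie.ID < Scheme.ID
    )
    """
)
after_pass_2 = remaining_ids()

print(after_pass_1)
print(after_pass_2)
```

After the first pass the tied rows 96 and 101 are both still present; the second pass is what removes 101.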

Answer 2 (score: 2)

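I would do this in multiple steps. Large delete operations done in a single step make me too nervous: what if you make a mistake? There is no SQL 'undo' statement.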

-- Setup the data
DROP TABLE IF EXISTS foo;
DROP TABLE IF EXISTS bar;
DROP TABLE IF EXISTS bat;
DROP TABLE IF EXISTS baz;
CREATE TABLE foo (
  id int(11) NOT NULL,
  SchemeName varchar(50),
  SchemeNumber varchar(15),
  PRIMARY KEY (id)
);

insert into foo values (714, 'Malcolm', 'ABC123' );
insert into foo values (80, 'Malcolm', 'ABC123' );
insert into foo values (96, 'Malcolms Scheme', 'ABC123' );
insert into foo values (101, 'Malcolms Scheme', 'ABC123' );
insert into foo values (98, 'Malcolms Scheme', 'DEF888' );
insert into foo values (654, 'Another Scheme ', 'BAR876' );
insert into foo values (543, 'Whatever Scheme ', 'KJL111' );

-- Find all the records that have dups, find the longest one
create table bar as
    select max(length(SchemeName)) as max_length, SchemeNumber
    from foo
    group by SchemeNumber
    having count(*) > 1;

-- Find the one we want to keep
create table bat as
    select min(a.id) as id, a.SchemeNumber
    from foo a join bar b on a.SchemeNumber = b.SchemeNumber 
       and length(a.SchemeName) = b.max_length
    group by a.SchemeNumber;

-- Select into this table all the rows to delete
create table baz as 
    select a.id from foo a join bat b on a.SchemeNumber = b.SchemeNumber 
      and a.id != b.id;

This gives you a new table containing only the records for the rows you want to delete.

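Now check these and make sure they contain only the rows you want to delete. That way, when you run the delete, you know exactly what is going to happen. It should also be pretty fast.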

Then, when you are ready, delete the rows with this command.

delete from foo where id in (select id from baz);

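This looks like more work because of the extra tables, but it is safer and probably just as fast as the other approaches. In addition, you can stop at any step and make sure the data is what you want before you do any actual deletes.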
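The stage, review, then delete workflow above can be sketched roughly as below. This is a condensed illustration, not the answer's exact script: it assumes Python's sqlite3 module as the engine and collapses the intermediate tables into a single staging table, here given the hypothetical name to_delete.

```python
import sqlite3

# Sample rows from the question (id, SchemeName, SchemeNumber).
ROWS = [
    (714, "Malcolm", "ABC123"),
    (80, "Malcolm", "ABC123"),
    (96, "Malcolms Scheme", "ABC123"),
    (101, "Malcolms Scheme", "ABC123"),
    (98, "Malcolms Scheme", "DEF888"),
    (654, "Another Scheme", "BAR876"),
    (543, "Whatever Scheme", "KJL111"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foo (id INTEGER PRIMARY KEY, SchemeName TEXT, SchemeNumber TEXT)"
)
conn.executemany("INSERT INTO foo VALUES (?, ?, ?)", ROWS)

# Step 1: stage the IDs to delete. For each SchemeNumber the keeper is the row
# with the longest name, ties broken by lowest id; every other row is staged.
conn.execute(
    """
    CREATE TABLE to_delete AS
    SELECT s.id
    FROM foo AS s
    WHERE s.id <> (
        SELECT k.id FROM foo AS k
        WHERE k.SchemeNumber = s.SchemeNumber
        ORDER BY length(k.SchemeName) DESC, k.id ASC
        LIMIT 1
    )
    """
)

# Step 2: review the staged rows before touching the real table.
candidates = [row[0] for row in conn.execute("SELECT id FROM to_delete ORDER BY id")]
print("staged for deletion:", candidates)

# Step 3: only once the staged list looks right, run the actual delete.
conn.execute("DELETE FROM foo WHERE id IN (SELECT id FROM to_delete)")
remaining = [row[0] for row in conn.execute("SELECT id FROM foo ORDER BY id")]
print("remaining:", remaining)
```

Nothing in foo changes until step 3, so the staged list can be inspected, or the staging query corrected, as often as needed.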

Answer 3 (score: 2)

See whether this query returns the rows you want to keep:

SELECT r.SchemeNumber, r.SchemeName, Min(r.ID) AS MinOfID
FROM
    (SELECT
        SchemeNumber,
        SchemeName,
        Len(SchemeName) AS name_length,
        ID
    FROM tblSchemes
    ) AS r
    INNER JOIN
    (SELECT
        SchemeNumber,
        Max(Len(SchemeName)) AS name_length
    FROM tblSchemes
    GROUP BY SchemeNumber
    ) AS w
    ON
        (r.SchemeNumber = w.SchemeNumber)
        AND (r.name_length = w.name_length)
GROUP BY r.SchemeNumber, r.SchemeName
ORDER BY r.SchemeName;

If it does, save it as qrySchemes2Keep. Then create a DELETE query to discard the rows in tblSchemes whose ID value is not found in qrySchemes2Keep.

DELETE 
FROM tblSchemes AS s
WHERE Not Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID);

Be aware that if you later use Access' query designer to make changes to the DELETE query, it may "helpfully" convert the SQL into something like this:

DELETE s.*, Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID)
FROM tblSchemes AS s
WHERE (((Exists (SELECT * FROM qrySchemes2Keep WHERE MinOfID = s.ID))=False));
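To try the keep-set approach outside Access, here is a rough sketch using Python's sqlite3 module. It assumes a SQLite view can stand in for the saved Access query qrySchemes2Keep, and it uses length() where Access uses Len(); the logic of the two statements is otherwise the same as above.

```python
import sqlite3

# Sample rows from the question (ID, SchemeName, SchemeNumber).
ROWS = [
    (714, "Malcolm", "ABC123"),
    (80, "Malcolm", "ABC123"),
    (96, "Malcolms Scheme", "ABC123"),
    (101, "Malcolms Scheme", "ABC123"),
    (98, "Malcolms Scheme", "DEF888"),
    (654, "Another Scheme", "BAR876"),
    (543, "Whatever Scheme", "KJL111"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblSchemes (ID INTEGER PRIMARY KEY, SchemeName TEXT, SchemeNumber TEXT)"
)
conn.executemany("INSERT INTO tblSchemes VALUES (?, ?, ?)", ROWS)

# A view plays the role of the saved query: one row per longest name,
# carrying the lowest ID that has that name.
conn.execute(
    """
    CREATE VIEW qrySchemes2Keep AS
    SELECT r.SchemeNumber, r.SchemeName, MIN(r.ID) AS MinOfID
    FROM tblSchemes AS r
    JOIN (
        SELECT SchemeNumber, MAX(length(SchemeName)) AS name_length
        FROM tblSchemes
        GROUP BY SchemeNumber
    ) AS w
      ON r.SchemeNumber = w.SchemeNumber
     AND length(r.SchemeName) = w.name_length
    GROUP BY r.SchemeNumber, r.SchemeName
    """
)

# Inspect the keep set first, as the answer suggests.
keep_ids = sorted(row[0] for row in conn.execute("SELECT MinOfID FROM qrySchemes2Keep"))
print("keep:", keep_ids)

# Delete every row whose ID does not appear in the keep set.
conn.execute(
    """
    DELETE FROM tblSchemes
    WHERE NOT EXISTS (
        SELECT 1 FROM qrySchemes2Keep AS k WHERE k.MinOfID = tblSchemes.ID
    )
    """
)
remaining = [row[0] for row in conn.execute("SELECT ID FROM tblSchemes ORDER BY ID")]
print("remaining:", remaining)
```

Because the keep query groups by SchemeName as well as SchemeNumber, two different names of the same maximum length under one scheme number would both be kept; the sample data does not contain that case.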

Answer 4 (score: 0)

Try this:

   Select * From Table t
   Where Len(SchemeName) <
      (Select Max(Len(Schemename))
       From Table
       Where SchemeNumber = t.SchemeNumber )
    And Id > 
      (Select Min (Id) 
       From Table
       Where SchemeNumber = t.SchemeNumber
           And SchemeName = t.SchemeName)

Or this:

   Select * From Table t
   Where Id > 
      (Select Min(Id) From Table
       Where SchemeNumber = t.SchemeNumber
         And Len(SchemeName) <
            (Select Max(Len(Schemename))
             From Table
             Where SchemeNumber = t.SchemeNumber))

If either of these selects the records that should be deleted, simply change it to a delete:

   Delete 
   From Table t
   Where Len(SchemeName) <
      (Select Max(Len(Schemename))
       From Table
       Where SchemeNumber = t.SchemeNumber )
    And Id > 
      (Select Min (Id) 
       From Table
       Where SchemeNumber = t.SchemeNumber
           And SchemeName = t.SchemeName)

Or, using the second construction:

 Delete From Table t Where Id > 
  (Select Min(Id) From Table
   Where SchemeNumber = t.SchemeNumber
     And Len(SchemeName) <
        (Select Max(Len(Schemename))
         From Table
         Where SchemeNumber = t.SchemeNumber))

Answer 5 (score: 0)

If your platform supports ranking functions and common table expressions:

with cte as (
  select row_number() 
     over (partition by SchemeNumber order by len(SchemeName) desc) as rn
  from Table)
delete from cte where rn > 1;
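The ranking idea can also be tried with Python's sqlite3 module, as a hedged sketch. It assumes the bundled SQLite is version 3.25 or later (the first release with window functions). SQLite cannot delete through a CTE, so the ranked rows are fed into an IN subquery instead, and ID is added as a second sort key to make the tie-break from the question explicit. The table name Scheme is an assumption.

```python
import sqlite3

# Sample rows from the question (ID, SchemeName, SchemeNumber).
ROWS = [
    (714, "Malcolm", "ABC123"),
    (80, "Malcolm", "ABC123"),
    (96, "Malcolms Scheme", "ABC123"),
    (101, "Malcolms Scheme", "ABC123"),
    (98, "Malcolms Scheme", "DEF888"),
    (654, "Another Scheme", "BAR876"),
    (543, "Whatever Scheme", "KJL111"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Scheme (ID INTEGER PRIMARY KEY, SchemeName TEXT, SchemeNumber TEXT)"
)
conn.executemany("INSERT INTO Scheme VALUES (?, ?, ?)", ROWS)

# Rank rows within each SchemeNumber: longest name first, lowest ID breaks ties.
# Every row ranked after the first is a surplus duplicate.
conn.execute(
    """
    DELETE FROM Scheme
    WHERE ID IN (
        SELECT ID FROM (
            SELECT ID,
                   row_number() OVER (
                       PARTITION BY SchemeNumber
                       ORDER BY length(SchemeName) DESC, ID ASC
                   ) AS rn
            FROM Scheme
        )
        WHERE rn > 1
    )
    """
)

remaining = [row[0] for row in conn.execute("SELECT ID FROM Scheme ORDER BY ID")]
print(remaining)
```

Without the second sort key, which of the tied rows gets rn = 1 is up to the engine; the question says any one of them will do, so that is a matter of preference rather than correctness.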