Question

So I have a medium-sized SQLite database with about 10.3 million rows. It contains some duplicate rows that I want to delete:

The column names are:

Keyword
Rank
URL

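The duplicates I want to remove are rows where both the keyword and the rank are the same, although the URL may differ. So I would only want the first instance of the keyword/rank pair to remain in the database and remove all subsequent matching rows.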

What is the most efficient way to go through the whole database and do this for all rows?

Answer 1

You could try something like this:

sqlite> create table my_example (keyword, rank, url);
sqlite> insert into my_example values ('aaaa', 2, 'wwww...');
sqlite> insert into my_example values ('aaaa', 2, 'wwww2..');
sqlite> insert into my_example values ('aaaa', 3, 'www2..');
sqlite> DELETE FROM my_example
   ...> WHERE rowid not in
   ...> (SELECT MIN(rowid)
   ...> FROM my_example
   ...> GROUP BY keyword, rank);
sqlite> select * from my_example;
keyword     rank        url
----------  ----------  ----------
aaaa        2           wwww...
aaaa        3           www2..
sqlite>
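
For a table with millions of rows, it may help to run the same DELETE from a script, inside a single transaction, with an index on the grouping columns. The following is a minimal sketch using Python's standard `sqlite3` module. The helper name `delete_duplicates` and the index name are hypothetical, and whether the index actually speeds things up on a given database is an assumption worth checking with `EXPLAIN QUERY PLAN`.

```python
import sqlite3


def delete_duplicates(conn, table="my_example"):
    """Keep the lowest-rowid row per (keyword, rank) pair and delete the rest.

    `table` is interpolated into the SQL, so only pass trusted names.
    Returns the number of rows deleted.
    """
    with conn:  # commit on success, roll back on error
        # Assumption: an index on the grouping columns may speed up the GROUP BY.
        conn.execute(
            f"CREATE INDEX IF NOT EXISTS idx_{table}_kw_rank "
            f"ON {table} (keyword, rank)"
        )
        cur = conn.execute(
            f"DELETE FROM {table} "
            f"WHERE rowid NOT IN ("
            f"  SELECT MIN(rowid) FROM {table} GROUP BY keyword, rank"
            f")"
        )
        return cur.rowcount


# Same sample data as the sqlite shell session above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_example (keyword, rank, url)")
conn.executemany(
    "INSERT INTO my_example VALUES (?, ?, ?)",
    [("aaaa", 2, "wwww..."), ("aaaa", 2, "wwww2.."), ("aaaa", 3, "www2..")],
)

removed = delete_duplicates(conn)
rows = conn.execute(
    "SELECT keyword, rank, url FROM my_example ORDER BY rowid"
).fetchall()
print(removed, rows)
```

On a real database you would replace `":memory:"` with the path to the file and take a backup first, since the DELETE cannot be undone once committed.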

Answer 2

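When you say "So I would only want the first instance of the keyword/rank pair to remain in the database and remove all subsequent matching rows.", you cannot guarantee that. The reason is that your table has no unique key (such as an id or create_date). So if you select again, there is no guarantee that the row that was entered first will be returned first. Setting that aside, you can do something like this, which will give you the first instance most of the time.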

DELETE FROM tbl
WHERE rowid NOT IN
(
    SELECT MIN(rowid)
    FROM tbl
    GROUP BY Keyword, Rank
);

See sqlfiddle example here
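
Because which row counts as the "first" depends on `rowid`, it can be worth previewing what the DELETE would remove before running it. Below is a minimal sketch with Python's standard `sqlite3` module. The sample rows are made up for illustration, and it assumes an ordinary rowid table named `tbl` as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (Keyword, Rank, URL)")
# Hypothetical sample data: two rows share the same Keyword/Rank pair.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [("aaaa", 2, "wwww..."), ("aaaa", 2, "wwww2.."), ("aaaa", 3, "www2..")],
)

# Dry run: same subquery as the DELETE, but only listing the affected rows.
doomed = conn.execute(
    """
    SELECT rowid, Keyword, Rank, URL
    FROM tbl
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM tbl GROUP BY Keyword, Rank
    )
    ORDER BY rowid
    """
).fetchall()

for row in doomed:
    print(row)  # each row here is what the DELETE would remove
```

If the listed rows are not the ones you expect to lose, the `MIN(rowid)` rule is not matching your notion of "first", and you would need a real ordering column to decide which row to keep.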

Efficient SQL query to delete duplicate rows
