Keep only the 200 newest records in each group

Time: 2014-09-16 15:36:53

Tags: mysql sql

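I have a table with columns like id | name | date | group ...

What I want to do is delete the old records of every group that has more than 200 records.

For example, I have a group named "shoes" with 400 records, "gift cards" with 300 records, "electronics" with 100 records, and so on.

So after running the SQL query, I want each group (shoes, gift cards, electronics, etc.) to have 200 records or fewer. The records to delete are the old ones, identified either by date or by id (auto-increment). From the "shoes" group, 200 records would be deleted: the ones that are older, or have lower ids, than the ones that are kept.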

3 answers:

Answer 0: (score: 2)

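This type of problem is a bit inconvenient in MySQL, because it does not implement SQL-99 window functions like ROW_NUMBER(). This has been a long-standing feature request, but so far it has not been implemented. MySQL is pretty much the only major RDBMS that does not have this feature.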

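Here is a solution that works in a single SQL statement and selects only the members of each group beyond the first 200. It uses a MySQL feature called user variables, which carry their values from one row to the next as the query is processed.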

DELETE f FROM foo AS f
JOIN (SELECT id, IF(@g = `group`, @rn:=@rn+1, @rn:=1) AS row_number, @g:=`group`
        FROM foo, (SELECT @g:=null, @rn:=0) _init
        ORDER BY `group`, date desc) AS r
ON f.id = r.id AND r.row_number > 200;
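
To see what the subquery computes, here is a minimal Python sketch of the same row numbering, with plain variables standing in for @g and @rn. The function name `rows_to_delete` and the sample rows are made up for illustration; this only mimics the logic and does not talk to MySQL.

```python
def rows_to_delete(rows, keep):
    """Return ids ranked beyond `keep` within their group, newest first.

    `rows` is an iterable of (id, group, date) tuples. Dates are assumed
    to be ISO strings, so plain string comparison orders them correctly.
    """
    # Equivalent of ORDER BY `group`, date DESC: sort by date descending
    # first, then stably by group so each group's rows stay newest-first.
    ordered = sorted(rows, key=lambda r: r[2], reverse=True)
    ordered.sort(key=lambda r: r[1])

    g, rn = None, 0  # play the role of @g and @rn
    doomed = []
    for row_id, group, _date in ordered:
        rn = rn + 1 if group == g else 1  # IF(@g = `group`, @rn:=@rn+1, @rn:=1)
        g = group                         # @g:=`group`
        if rn > keep:                     # r.row_number > 200 in the DELETE
            doomed.append(row_id)
    return doomed


# Hypothetical sample data: three "shoes" rows and one "electronics" row.
sample = [
    (1, "shoes", "2014-09-01"),
    (2, "shoes", "2014-09-02"),
    (3, "shoes", "2014-09-03"),
    (4, "electronics", "2014-09-01"),
]
print(rows_to_delete(sample, keep=2))  # prints [1]
```

The counter restarts at 1 whenever the group changes, so any row whose number exceeds the limit is one of the older rows of an oversized group.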

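Before running this (or anything that deletes data!), I suggest you understand how it works and test it with an equivalent SELECT, to make sure it selects the rows you want to delete.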

I tested this with a smaller data set. Here is the data when I run it without the filter:

SELECT f.id, f.`group`, r.row_number FROM foo AS f
JOIN (SELECT id, IF(@g = `group`, @rn:=@rn+1, @rn:=1) AS row_number, @g:=`group`
        FROM foo, (SELECT @g:=null, @rn:=0) _init
        ORDER BY `group`, date desc) AS r
ON f.id = r.id;

+----+--------+------------+
| id | group  | row_number |
+----+--------+------------+
|  1 |      1 |          1 |
|  2 |      1 |          2 |
|  3 |      1 |          3 |
|  5 |      1 |          4 |
| 11 |      1 |          5 |
|  4 |      2 |          1 |
| 10 |      2 |          2 |
|  8 |      2 |          3 |
|  7 |      3 |          1 |
|  6 |      3 |          2 |
| 12 |      3 |          3 |
|  9 |      4 |          1 |
+----+--------+------------+

Here is the SELECT that skips the first 2 of each group:

SELECT f.id, f.`group`, r.row_number FROM foo AS f
JOIN (SELECT id, IF(@g = `group`, @rn:=@rn+1, @rn:=1) AS row_number, @g:=`group`
        FROM foo, (SELECT @g:=null, @rn:=0) _init
        ORDER BY `group`, date desc) AS r
ON f.id = r.id AND r.row_number > 2;

+----+-------+------------+
| id | group | row_number |
+----+-------+------------+
|  3 |     1 |          3 |
|  5 |     1 |          4 |
| 11 |     1 |          5 |
|  8 |     2 |          3 |
| 12 |     3 |          3 |
+----+-------+------------+

Answer 1: (score: 1)

Run this pseudo-SQL

SELECT shoes.id FROM shoes ORDER BY Date DESC LIMIT 200

Then parse the results into an array of ids (1, 2, etc.) and call it $IDS

DELETE FROM shoes WHERE ID NOT IN ($IDS)
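
The two steps above could be sketched from application code roughly as follows. This is only an illustration under assumptions: it uses Python's built-in sqlite3 module instead of MySQL so that it runs anywhere, it assumes a single table `foo` with a `grp` column (both names hypothetical) rather than one table per group, and `keep_newest` is a made-up helper name.

```python
import sqlite3


def keep_newest(conn, group, keep):
    """Delete all rows of `group` except its `keep` newest ones.

    Returns the number of rows deleted. Assumes keep > 0.
    """
    # Step 1: collect the ids of the newest `keep` rows in this group.
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM foo WHERE grp = ? ORDER BY date DESC, id DESC LIMIT ?",
        (group, keep),
    )]
    if not ids:
        return 0  # the group has no rows, so there is nothing to trim

    # Step 2: delete every other row of the same group. One placeholder
    # per id keeps the values out of the SQL text itself.
    marks = ",".join("?" * len(ids))
    cur = conn.execute(
        f"DELETE FROM foo WHERE grp = ? AND id NOT IN ({marks})",
        [group, *ids],
    )
    conn.commit()
    return cur.rowcount


# Hypothetical demo data: five "shoes" rows and one "electronics" row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, grp TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO foo (grp, date) VALUES (?, ?)",
    [("shoes", f"2014-09-{day:02d}") for day in range(1, 6)]
    + [("electronics", "2014-09-01")],
)
deleted = keep_newest(conn, "shoes", 2)
```

A row inserted between the two statements would not be in the kept list and would be deleted, so it seems safer to run both steps inside one transaction.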

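Edit: to do this as a SQL query, there are two possible approaches.

1. DELETE FROM shoes WHERE ID NOT IN (SELECT shoes.id FROM shoes ORDER BY Date DESC LIMIT 200) - yes, you can do it this way. Be careful. As Bill suggested, run it first as SELECT * FROM shoes WHERE ID NOT IN (SELECT shoes.id FROM shoes ORDER BY Date DESC LIMIT 200) to make sure it selects the right rows [the ones you want to delete!]

2. I don't know much about DECLARE, but you could declare @IDs = SELECT shoes.id FROM shoes ORDER BY Date DESC LIMIT 200, and then DELETE FROM shoes WHERE ID NOT IN (@IDS)

Both are untested. By the way, you should use SQLFiddle to set up a mock schema, so that people who come to help can test their queries.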
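
One caveat on approach 1: MySQL has been known to reject LIMIT placed directly inside an IN subquery, and to refuse a DELETE whose subquery reads the table being deleted from. A commonly suggested workaround is to wrap the subquery in a derived table. The sketch below shows that wrapped form. It is run here against SQLite (Python's built-in sqlite3) only to check the logic, not against MySQL, and the table `foo`, column `grp`, and helper `trim_group` are hypothetical names.

```python
import sqlite3

# The inner SELECT is wrapped in a derived table ("newest") so that the
# LIMIT does not sit directly inside the IN (...) subquery.
SINGLE_STATEMENT = """
DELETE FROM foo
WHERE grp = :grp
  AND id NOT IN (
        SELECT id FROM (
            SELECT id FROM foo
            WHERE grp = :grp
            ORDER BY date DESC, id DESC
            LIMIT :keep
        ) AS newest
  )
"""


def trim_group(conn, group, keep):
    """Delete all but the `keep` newest rows of one group; return the count."""
    cur = conn.execute(SINGLE_STATEMENT, {"grp": group, "keep": keep})
    conn.commit()
    return cur.rowcount


# Hypothetical demo data: 5 "shoes", 3 "gift cards", 1 "electronics".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, grp TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO foo (grp, date) VALUES (?, ?)",
    [("shoes", f"2014-09-{day:02d}") for day in range(1, 6)]
    + [("gift cards", f"2014-09-{day:02d}") for day in range(1, 4)]
    + [("electronics", "2014-09-01")],
)

# Fetch the group names first, then trim each group in turn.
groups = [g for (g,) in conn.execute("SELECT DISTINCT grp FROM foo").fetchall()]
for g in groups:
    trim_group(conn, g, 2)
```

Running the statement once per group keeps each DELETE simple, at the cost of one round trip per group.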

Answer 2: (score: 0)

This would be a SQL Server solution

Select * from (
Select *, ROW_NUMBER() OVER (Partition By [Group] order by Date) RN
from tbl) t1
inner join (
Select [GROUP], COUNT(*) as Cnt
from tbl
group by [Group]
) a on a.[Group] = t1.[Group]
where t1.RN <= 200
and a.Cnt >= 200

Edit:

Here it is using a CTE

With CTE as 
(
    Select [GROUP], COUNT(*) as cnt
    from tbl
    group by [Group]
)

Select t1.* 
from (Select *, ROW_NUMBER() OVER (Partition By [Group] order by Date) RN  
      from tbl) t1
inner join CTE a on a.[Group] = t1.[Group]
where t1.RN <= 200 and 
      a.Cnt >= 200
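
As far as I can tell, the queries above number rows oldest-first and keep RN <= 200, so they return the 200 oldest rows of every group that has at least 200 rows. That matches the rows to delete only when a group has exactly 400 rows. A variant that numbers rows newest-first and targets rn > 200 would handle any group size. Below is a minimal sketch of that variant as a DELETE, run on SQLite (3.25 or later, which supports window functions) through Python's sqlite3 module rather than on SQL Server. The column is called `grp` here to avoid quoting the keyword, and `trim_all_groups` is a made-up name. MySQL 8.0 and later also accept ROW_NUMBER() OVER (...), so a similar statement may work there, though that is untested here.

```python
import sqlite3

# Number each group's rows newest-first, then delete everything ranked
# beyond :keep. Requires SQLite 3.25+ for ROW_NUMBER() OVER (...).
DELETE_OLD = """
DELETE FROM tbl
WHERE id IN (
    SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY grp ORDER BY date DESC, id DESC
               ) AS rn
        FROM tbl
    ) AS ranked
    WHERE rn > :keep
)
"""


def trim_all_groups(conn, keep):
    """Keep only the `keep` newest rows of every group; return rows deleted."""
    cur = conn.execute(DELETE_OLD, {"keep": keep})
    conn.commit()
    return cur.rowcount


# Hypothetical demo data: 5 "shoes", 3 "gift cards", 1 "electronics".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, grp TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO tbl (grp, date) VALUES (?, ?)",
    [("shoes", f"2014-09-{day:02d}") for day in range(1, 6)]
    + [("gift cards", f"2014-09-{day:02d}") for day in range(1, 4)]
    + [("electronics", "2014-09-01")],
)
deleted = trim_all_groups(conn, 2)
```

Groups at or below the limit are left untouched because none of their rows ever reach a row number above it, so no separate COUNT(*) join is needed.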