Question

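I'm trying to clean up some mixed legacy data. There should be a single combination of category_id, item_number and price, but there is often a duplicate group of rows that carries an earlier set of prices.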

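Is there a recommended way to query this table so that the records associated with the lowest price are eliminated? Also, this is flat CSV data, so the auto-increment id can be dropped if that makes the update easier.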

The table also has many other fields that need to be kept, but they are the same in both versions of the data.

+------+-------------+-------------+-------+
|  id  | category_id | item_number | price |
+------+-------------+-------------+-------+
| 2971 |       45567 |     5904180 | 2.76  |
| 2977 |       45567 |     5906201 | 2.76  |
| 2980 |       45567 |     5909486 | 2.76  |
| 2981 |       45567 |     5909494 | 2.76  |
| 2982 |       45567 |     5901111 | 2.76  |
| 2983 |       45567 |     5901137 | 2.76  |
| 2984 |       45567 |     5901152 | 2.76  |
| 2987 |       45567 |     5904180 | 8.07  |
| 2993 |       45567 |     5906201 | 8.07  |
| 2996 |       45567 |     5909486 | 8.07  |
| 2997 |       45567 |     5909494 | 8.07  |
| 2998 |       45567 |     5901111 | 8.07  |
| 2999 |       45567 |     5901137 | 8.07  |
| 3000 |       45567 |     5901152 | 8.07  |
+------+-------------+-------------+-------+

Answer 1

If you want to delete everything except the row with the highest price, you can do the following:

-- Delete every row whose price is below the highest price
-- in its (category_id, item_number) group.
delete tt
    from thistable tt join
         (select t2.category_id, t2.item_number, max(t2.price) as maxprice
          from thistable t2
          group by t2.category_id, t2.item_number
         ) ci
         on tt.category_id = ci.category_id and tt.item_number = ci.item_number and
            tt.price < ci.maxprice;
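
Before running a delete like this, it can help to turn the same join into a SELECT and look at which rows it would hit. The following is only a sketch of that idea: it assumes an in-memory SQLite database driven from Python's sqlite3 module (the question does not say which database is in use), loads the sample rows from the question, and uses a hypothetical helper name, preview_rows_to_delete.

```python
import sqlite3

# Sample rows copied from the question: (id, category_id, item_number, price).
ROWS = [
    (2971, 45567, 5904180, 2.76),
    (2977, 45567, 5906201, 2.76),
    (2980, 45567, 5909486, 2.76),
    (2981, 45567, 5909494, 2.76),
    (2982, 45567, 5901111, 2.76),
    (2983, 45567, 5901137, 2.76),
    (2984, 45567, 5901152, 2.76),
    (2987, 45567, 5904180, 8.07),
    (2993, 45567, 5906201, 8.07),
    (2996, 45567, 5909486, 8.07),
    (2997, 45567, 5909494, 8.07),
    (2998, 45567, 5901111, 8.07),
    (2999, 45567, 5901137, 8.07),
    (3000, 45567, 5901152, 8.07),
]

# Same join condition as the delete, written as a SELECT so nothing is modified.
PREVIEW_SQL = """
select tt.id, tt.item_number, tt.price, ci.maxprice
from thistable tt join
     (select category_id, item_number, max(price) as maxprice
      from thistable
      group by category_id, item_number
     ) ci
     on tt.category_id = ci.category_id and tt.item_number = ci.item_number and
        tt.price < ci.maxprice
order by tt.id
"""


def preview_rows_to_delete():
    """Return the rows the delete would remove (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")  # assumption: throwaway SQLite database
    conn.execute(
        "create table thistable ("
        "id integer primary key, category_id integer, "
        "item_number integer, price real)"
    )
    conn.executemany("insert into thistable values (?, ?, ?, ?)", ROWS)
    rows = conn.execute(PREVIEW_SQL).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in preview_rows_to_delete():
        print(row)
```

On the sample data this lists only the 2.76 rows, each next to the 8.07 maximum for its group, which is the set the delete is meant to remove.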

If you really want to keep the row with the highest id rather than the highest price, use id instead of price (could a price ever go down?).
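
As a rough sketch of the id-based variant, the snippet below keeps only the row with the highest id in each (category_id, item_number) group. It assumes an in-memory SQLite database driven from Python's sqlite3 module, and the helper name keep_highest_id is hypothetical. The delete is written with a NOT IN subquery, which SQLite accepts; MySQL generally refuses a subquery that reads from the table being deleted from, so there the join form with max(id) in place of max(price) is the more likely fit.

```python
import sqlite3

# Sample data from the question, rebuilt as (id, category_id, item_number, price).
ITEM_NUMBERS = [5904180, 5906201, 5909486, 5909494, 5901111, 5901137, 5901152]
OLD_IDS = [2971, 2977, 2980, 2981, 2982, 2983, 2984]  # rows priced 2.76
NEW_IDS = [2987, 2993, 2996, 2997, 2998, 2999, 3000]  # rows priced 8.07

ROWS = (
    [(i, 45567, n, 2.76) for i, n in zip(OLD_IDS, ITEM_NUMBERS)]
    + [(i, 45567, n, 8.07) for i, n in zip(NEW_IDS, ITEM_NUMBERS)]
)

# Keep one row per group: the one with the highest id. Everything else goes.
DELETE_SQL = """
delete from thistable
where id not in (select max(id)
                 from thistable
                 group by category_id, item_number)
"""


def keep_highest_id():
    """Run the delete on a scratch table and return what is left (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")  # assumption: throwaway SQLite database
    conn.execute(
        "create table thistable ("
        "id integer primary key, category_id integer, "
        "item_number integer, price real)"
    )
    conn.executemany("insert into thistable values (?, ?, ?, ?)", ROWS)
    conn.execute(DELETE_SQL)
    conn.commit()
    remaining = conn.execute(
        "select id, category_id, item_number, price from thistable order by id"
    ).fetchall()
    conn.close()
    return remaining


if __name__ == "__main__":
    for row in keep_highest_id():
        print(row)
```

On the sample data both approaches leave the same seven rows, because the higher ids also carry the higher prices. They would differ only if a later row had a lower price than an earlier one.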

How to filter out groups of similar records that differ by one field