我在使用以下设置在数据库表中查找重复项时遇到了问题:
==========================================================================
| stock_id | product_id | store_id | stock_qty | updated_at |
==========================================================================
| 9990 | 51 | 1 | 13 | 2014-10-25 16:30:01 |
| 9991 | 90 | 2 | 5 | 2014-10-25 16:30:01 |
| 9992 | 161 | 1 | 3 | 2014-10-25 16:30:01 |
| 9993 | 254 | 1 | 18 | 2014-10-25 16:30:01 |
| 9994 | 284 | 2 | 12 | 2014-10-25 16:30:01 |
| 9995 | 51 | 1 | 11 | 2014-10-25 17:30:02 |
| 9996 | 90 | 2 | 5 | 2014-10-25 17:30:02 |
| 9997 | 161 | 1 | 3 | 2014-10-25 17:30:02 |
| 9998 | 254 | 1 | 16 | 2014-10-25 17:30:02 |
| 9999 | 284 | 2 | 12 | 2014-10-25 17:30:02 |
==========================================================================
我每小时都会将股票更新导入此表,我试图找到重复的股票条目(任何具有匹配的产品ID和商店ID的行),因此我可以删除最旧的。下面的查询是我的尝试,通过比较这样的连接上的产品ID和商店ID我可以找到一组重复项:
SELECT s.`stock_id`, s.`product_id`, s.`store_id`, s.`stock_qty`, s.`updated_at`
FROM `stock` s
INNER JOIN `stock` j ON s.`product_id`=j.`product_id` AND s.`store_id`=j.`store_id`
GROUP BY `stock_id`
HAVING COUNT(*) > 1
ORDER BY s.updated_at DESC, s.product_id ASC, s.store_id ASC, s.stock_id ASC;
虽然此查询可以正常运行,但它找不到所有重复项,只有1套,这意味着如果导入错误并且直到上午都没有注意到,那么我们有可能#&# 39;留下大量重复的股票条目。我的MySQL技能遗憾地缺乏,我完全不知道如何以快速,可靠的方式查找和删除所有重复项。
欢迎任何帮助或想法。谢谢
答案 0 :(得分:1)
store_id
上的自我加入,product_id
和'年龄较大'与DISTINCT
结合使用时,应该为您提供所有存在较新版本的行:
> SHOW CREATE TABLE stock;
CREATE TABLE `stock` (
`stock_id` int(11) NOT NULL,
`product_id` int(11) DEFAULT NULL,
`store_id` int(11) DEFAULT NULL,
`stock_qty` int(11) DEFAULT NULL,
`updated_at` datetime DEFAULT NULL,
PRIMARY KEY (`stock_id`)
> select * from stock;
+----------+------------+----------+-----------+---------------------+
| stock_id | product_id | store_id | stock_qty | updated_at |
+----------+------------+----------+-----------+---------------------+
| 1 | 1 | 1 | 1 | 2001-01-01 12:00:00 |
| 2 | 2 | 2 | 1 | 2001-01-01 12:00:00 |
| 3 | 2 | 2 | 1 | 2002-01-01 12:00:00 |
+----------+------------+----------+-----------+---------------------+
> SELECT DISTINCT s1.stock_id, s1.store_id, s1.product_id, s1.updated_at
FROM stock s1 JOIN stock s2
ON s1.store_id = s2.store_id
AND s1.product_id = s2.product_id
AND s1.updated_at < s2.updated_at;
+----------+----------+------------+---------------------+
| stock_id | store_id | product_id | updated_at |
+----------+----------+------------+---------------------+
| 2 | 2 | 2 | 2001-01-01 12:00:00 |
+----------+----------+------------+---------------------+
> DELETE stock FROM stock
JOIN stock s2 ON stock.store_id = s2.store_id
AND stock.product_id = s2.product_id
AND stock.updated_at < s2.updated_at;
Query OK, 1 row affected (0.02 sec)
> select * from stock;
+----------+------------+----------+-----------+---------------------+
| stock_id | product_id | store_id | stock_qty | updated_at |
+----------+------------+----------+-----------+---------------------+
| 1 | 1 | 1 | 1 | 2001-01-01 12:00:00 |
| 3 | 2 | 2 | 1 | 2002-01-01 12:00:00 |
+----------+------------+----------+-----------+---------------------+
答案 1 :(得分:1)
或者您可以使用存储过程:
DELIMITER //
DROP PROCEDURE IF EXISTS removeDuplicates;
CREATE PROCEDURE removeDuplicates(
stockID INT
)
BEGIN
DECLARE stockToKeep INT;
DECLARE storeID INT;
DECLARE productID INT;
-- gets the store and product value
SELECT DISTINCT store_id, product_id
FROM stock
WHERE stock_id = stockID
LIMIT 1
INTO
storeID, productID;
SELECT stock_id
FROM stock
WHERE product_id = productID AND store_id = storeID
ORDER BY updated_at DESC
LIMIT 1
INTO
stockToKeep;
DELETE FROM stock
WHERE product_id = productID AND store_id = storeID
AND stock_id != stockToKeep;
END //
DELIMITER ;
然后通过游标过程为每对产品ID和存储ID调用它: DELIMITER // CREATE PROCEDURE updateTable()BEGIN DECLARE完成BOOLEAN DEFAULT FALSE; DECLARE stockID INT UNSIGNED; DECLARE cur CURSOR FOR SELECT DISTINCT stock_id FROM stock; DECLARE CONTINUE HANDLER for NOT FOUND SET done:= TRUE;
OPEN cur;
testLoop: LOOP
FETCH cur INTO stockID;
IF done THEN
LEAVE testLoop;
END IF;
CALL removeDuplicates(stockID);
END LOOP testLoop;
CLOSE cur;
END//
DELIMITER ;
然后只需调用第二个程序
CALL updateTable();
答案 2 :(得分:1)
您可以使用此查询:
DELETE st FROM stock st, stock st2
WHERE st.stock_id < st2.stock_id AND st.product_id = st2.product_id AND
st.store_id = st2.store_id;
此查询将删除具有相同product_id
和store_id
的旧记录,并保留最新记录。