I have a categories table that contains some duplicate categories, as shown below.
`Categories`
+========+============+============+
| cat_id | cat_name   | item_count |
+========+============+============+
| 1      | Category 1 | 2          |
| 2      | Category 1 | 1          |
| 3      | Category 2 | 2          |
| 4      | Category 3 | 1          |
| 5      | Category 3 | 1          |
+--------+------------+------------+
Here is the junction table that relates it to another Items table. The `item_count` in the first table is the total number of items for each `cat_id`.
`Junction`
+========+=========+
| cat_id | item_id |
+========+=========+
| 1      | 100     |
| 1      | 101     |
| 2      | 102     |
| 3      | 103     |
| 3      | 104     |
| 4      | 105     |
| 5      | 106     |
+--------+---------+
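How can I merge the items of duplicate categories into the duplicate that has the highest `item_count`? (e.g. `Category 1`).
Also, if `item_count` is the same for the duplicates, the category with the highest `cat_id` should be chosen and the others merged into that record. (e.g. `Category 3`).
Note: the duplicate records are not deleted; their `item_count` is set to `0` instead.
The expected result is below.
`Categories`
+========+============+============+
| cat_id | cat_name   | item_count |
+========+============+============+
| 1      | Category 1 | 3          |
| 2      | Category 1 | 0          |
| 3      | Category 2 | 2          |
| 4      | Category 3 | 0          |
| 5      | Category 3 | 2          |
+--------+------------+------------+
`Junction`
+========+=========+
| cat_id | item_id |
+========+=========+
| 1      | 100     |
| 1      | 101     |
| 1      | 102     |
| 3      | 103     |
| 3      | 104     |
| 5      | 105     |
| 5      | 106     |
+--------+---------+
In the result there are two duplicated names, `Category 1` and `Category 3`, which give 2 scenarios:
`cat_id` = 2 is eliminated because its `item_count` = 1 is less than the `item_count` = 2 of `cat_id` = 1.
`cat_id` = 4 is eliminated even though its `item_count` is the same as that of `cat_id` = 5, because 5 is the highest `cat_id` among the duplicates of `Category 3`.
If there is a query that can join and update both tables to resolve the duplicates, please help me.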
Answer 0 (score: 3)
This is a SELECT. You can work out how to adapt it to an UPDATE ;-)
For simplicity, I've ignored the junction table.
SELECT z.cat_id
, z.cat_name
, (z.cat_id = x.cat_id) * new_count item_count
FROM categories x
LEFT
JOIN categories y
ON y.cat_name = x.cat_name
AND (y.item_count > x.item_count OR (y.item_count = x.item_count AND y.cat_id > x.cat_id))
LEFT
JOIN
( SELECT a.cat_id, b.*
FROM categories a
JOIN
( SELECT cat_name, SUM(item_count) new_count, MAX(item_count) max_count FROM categories GROUP BY cat_name) b
ON b.cat_name = a.cat_name
) z
ON z.cat_name = x.cat_name
WHERE y.cat_id IS NULL;
+--------+------------+------------+
| cat_id | cat_name   | item_count |
+--------+------------+------------+
| 1      | Category 1 | 3          |
| 2      | Category 1 | 0          |
| 3      | Category 2 | 2          |
| 4      | Category 3 | 0          |
| 5      | Category 3 | 2          |
+--------+------------+------------+
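One way the SELECT above might be adapted to update both tables is sketched below. It is not the answerer's query: it runs against an in-memory SQLite database through Python's `sqlite3` module so that it is self-contained, and the `survivor` temp table with its `old_id`/`new_id` columns is a name invented for this illustration. It assumes, as the question states, that `item_count` always equals the number of `Junction` rows for that `cat_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (cat_id INTEGER PRIMARY KEY, cat_name TEXT, item_count INTEGER);
    CREATE TABLE junction (cat_id INTEGER, item_id INTEGER);
    INSERT INTO categories VALUES
        (1, 'Category 1', 2), (2, 'Category 1', 1), (3, 'Category 2', 2),
        (4, 'Category 3', 1), (5, 'Category 3', 1);
    INSERT INTO junction VALUES
        (1, 100), (1, 101), (2, 102), (3, 103), (3, 104), (4, 105), (5, 106);

    -- Hypothetical helper table: maps every cat_id to the duplicate that survives,
    -- i.e. the row with the highest item_count, ties broken by the highest cat_id.
    CREATE TEMP TABLE survivor AS
    SELECT c.cat_id AS old_id,
           (SELECT s.cat_id
              FROM categories s
             WHERE s.cat_name = c.cat_name
             ORDER BY s.item_count DESC, s.cat_id DESC
             LIMIT 1) AS new_id
      FROM categories c;

    -- Re-point every junction row at the surviving category.
    UPDATE junction
       SET cat_id = (SELECT new_id FROM survivor WHERE old_id = junction.cat_id);

    -- Recount from the junction table; categories that lost their items drop to 0.
    UPDATE categories
       SET item_count = (SELECT COUNT(*) FROM junction j WHERE j.cat_id = categories.cat_id);
""")

categories = conn.execute(
    "SELECT cat_id, cat_name, item_count FROM categories ORDER BY cat_id").fetchall()
junction = conn.execute(
    "SELECT cat_id, item_id FROM junction ORDER BY item_id").fetchall()
print(categories)
print(junction)
```

Because the counts are recomputed from `junction` rather than added together, rebuilding the mapping and repeating the two UPDATE statements leaves the result unchanged.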
Answer 1 (score: 1)
DELIMITER $$
DROP PROCEDURE IF EXISTS cursor_proc $$
CREATE PROCEDURE cursor_proc()
BEGIN
  -- Local variables are declared without the @ prefix (@ marks session variables)
  DECLARE v_cat_id INT;
  DECLARE v_cat_name VARCHAR(255);
  DECLARE v_item_count INT;
  DECLARE v_prev_cat_name VARCHAR(255) DEFAULT NULL;
  DECLARE v_max_item_count INT DEFAULT 0;
  DECLARE v_best_cat_id INT DEFAULT 0;
  DECLARE v_total_items INT DEFAULT 0;
  -- this flag will be set to true when the cursor reaches the end of the table
  DECLARE exit_loop BOOLEAN DEFAULT FALSE;
  -- Declare the cursor
  DECLARE categories_cursor CURSOR FOR
    SELECT cat_id, cat_name, item_count FROM Categories ORDER BY cat_name, cat_id;
  -- set exit_loop flag to true if there are no more rows
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET exit_loop = TRUE;
  -- open the cursor
  OPEN categories_cursor;
  -- start looping
  categories_loop: LOOP
    -- read the next row into the variables
    FETCH categories_cursor INTO v_cat_id, v_cat_name, v_item_count;
    -- close the cursor and exit the loop if there are no more rows
    IF exit_loop THEN
      CLOSE categories_cursor;
      LEAVE categories_loop;
    END IF;
    IF v_prev_cat_name IS NULL OR v_prev_cat_name <> v_cat_name THEN
      -- Category has changed: give the 'best' row of the previous category the total items count
      IF v_best_cat_id > 0 THEN
        UPDATE Categories SET item_count = v_total_items WHERE cat_id = v_best_cat_id;
      END IF;
      -- Reset values with the current row's values
      SET v_max_item_count = v_item_count;
      SET v_prev_cat_name = v_cat_name;
      SET v_best_cat_id = v_cat_id;
      SET v_total_items = v_item_count;
    ELSE
      -- increment the total items count
      SET v_total_items = v_total_items + v_item_count;
      -- <= so that on equal counts the later (higher) cat_id becomes the 'best'
      IF v_max_item_count <= v_item_count THEN
        -- the previous 'best' row is no longer the best of its category
        UPDATE Categories SET item_count = 0 WHERE cat_id = v_best_cat_id;
        SET v_max_item_count = v_item_count;
        SET v_best_cat_id = v_cat_id;
      ELSE
        -- else, this row is not the best of its category
        UPDATE Categories SET item_count = 0 WHERE cat_id = v_cat_id;
      END IF;
    END IF;
  END LOOP categories_loop;
  -- write the total for the last category, which never sees a category change
  IF v_best_cat_id > 0 THEN
    UPDATE Categories SET item_count = v_total_items WHERE cat_id = v_best_cat_id;
  END IF;
END $$
DELIMITER ;
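The bookkeeping that a single ordered pass like this needs can be checked outside the database. The Python function below is only an illustration (`merge_counts` and its tuple layout are made up here, not part of the answer). Two details are easy to miss in such a pass: the last group has to be written after the loop ends, and ties have to go to the higher `cat_id`.

```python
def merge_counts(rows):
    """rows: iterable of (cat_id, cat_name, item_count); returns {cat_id: new item_count}."""
    result = {}
    prev_name = None
    best_id = None      # cat_id of the best duplicate seen so far for prev_name
    best_count = 0      # item_count of that best duplicate
    total = 0           # running sum of item_count for prev_name

    # Same ordering as the cursor: by cat_name, then cat_id.
    for cat_id, cat_name, item_count in sorted(rows, key=lambda r: (r[1], r[0])):
        if cat_name != prev_name:
            if best_id is not None:
                result[best_id] = total      # flush the previous group
            prev_name, best_id, best_count, total = cat_name, cat_id, item_count, item_count
        else:
            total += item_count
            if item_count >= best_count:     # >= so that ties go to the higher cat_id
                result[best_id] = 0          # the old best is demoted
                best_id, best_count = cat_id, item_count
            else:
                result[cat_id] = 0
    if best_id is not None:
        result[best_id] = total              # flush the last group after the loop
    return result


rows = [(1, "Category 1", 2), (2, "Category 1", 1), (3, "Category 2", 2),
        (4, "Category 3", 1), (5, "Category 3", 1)]
merged = merge_counts(rows)
print(merged)
```

With the question's sample rows, `merged` maps `cat_id` 1 to 3, 3 to 2 and 5 to 2, and maps the eliminated duplicates 2 and 4 to 0.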
Answer 2 (score: 1)
It's not pretty, and it is partly taken from Strawberry's SELECT.
UPDATE categories cat,
junction jun,
(select
(z.cat_id = x.cat_id) * new_count c,
x.cat_id newcatid,
z.cat_id oldcatid
from categories x
LEFT
JOIN categories y
ON y.cat_name = x.cat_name
AND (y.item_count > x.item_count OR (y.item_count = x.item_count AND y.cat_id > x.cat_id))
LEFT
JOIN
( SELECT a.cat_id, b.*
FROM categories a
JOIN
( SELECT cat_name, SUM(item_count) new_count, MAX(item_count) max_count FROM categories GROUP BY cat_name) b
ON b.cat_name = a.cat_name
) z
ON z.cat_name = x.cat_name
WHERE
y.cat_id IS NULL) sourceX
SET cat.item_count = sourceX.c, jun.cat_id = sourceX.newcatid
WHERE cat.cat_id = jun.cat_id and cat.cat_id = sourceX.oldcatid
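Answer 3 (score: 0)
I think it's best to do what you want step by step:
First, get the data you need:
SELECT `cat_name`, MAX(`cat_id`), SUM(`item_count`) FROM `Categories` GROUP BY `cat_name`;
With this data you can check afterwards that the update was done correctly.
Then loop over the fetched rows and, for each `cat_name` (bound here to `@cat_name`), run the update: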
UPDATE Categories SET item_count =
(
  SELECT Tot FROM (
    SELECT SUM(`item_count`) AS Tot
    FROM `Categories`
    -- @cat_name is a variable, so it must not be quoted
    WHERE `cat_name` = @cat_name) AS tmp1
)
WHERE cat_id = (
  SELECT MaxId
  FROM (
    SELECT MAX(cat_id) AS MaxId
    FROM Categories
    WHERE `cat_name` = @cat_name) AS tmp2);
Note that if you run this code twice, the result will be wrong.
Finally, set the `item_count` of the other ids to 0:
UPDATE Categories SET item_count = 0
WHERE `cat_name` = @cat_name
AND cat_id <> (
  SELECT MaxId
  FROM (
    SELECT MAX(cat_id) AS MaxId
    FROM Categories
    WHERE `cat_name` = @cat_name) AS tmp2);
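The warning about running the update twice can be seen with plain arithmetic. The sketch below is an illustration only (the `sum_into_max_id` helper is hypothetical): it applies the summing step to the two `Category 1` rows from the question, then applies it again before the other row has been set to 0.

```python
def sum_into_max_id(counts):
    """counts: {cat_id: item_count} for ONE cat_name; mimics the summing update step."""
    target = max(counts)                     # the row with the highest cat_id
    updated = dict(counts)
    updated[target] = sum(counts.values())   # overwrite it with the group total
    return updated


category_1 = {1: 2, 2: 1}                    # cat_id -> item_count from the question
once = sum_into_max_id(category_1)
twice = sum_into_max_id(once)
print(once)
print(twice)
```

After the first run the target row already holds the total of 3, so the second run adds the untouched row to it again and arrives at 5. Setting the other rows to 0 before any rerun avoids this.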