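MySQL update with join query to resolve duplicate values

Asked: 2015-11-24 14:43:28

Tags: mysql foreign-keys relational-database junction-table

I have a Categories table that contains some duplicate categories, as shown below: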

`Categories`
+========+============+============+
| cat_id | cat_name   | item_count |
+========+============+============+
|      1 | Category 1 |         2  |
|      2 | Category 1 |         1  |
|      3 | Category 2 |         2  |
|      4 | Category 3 |         1  |
|      5 | Category 3 |         1  |
+--------+------------+------------+

Below is the junction table that links it to a separate Items table. The item_count in the first table is the total number of items for each cat_id.

`Junction`
+========+=========+
| cat_id | item_id |
+========+=========+
|      1 |     100 |
|      1 |     101 |
|      2 |     102 |
|      3 |     103 |
|      3 |     104 |
|      4 |     105 |
|      5 |     106 |
+--------+---------+
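To try the answers locally, the sample data above could be loaded with a script along these lines. This is only a sketch: the column types, the keys and the NOT NULL constraints are assumptions, since the question shows the rows but not the table definitions.

```sql
-- Assumed schema: types and keys are guesses, not taken from the question.
CREATE TABLE Categories (
  cat_id     INT          NOT NULL PRIMARY KEY,
  cat_name   VARCHAR(255) NOT NULL,
  item_count INT          NOT NULL DEFAULT 0
);

CREATE TABLE Junction (
  cat_id  INT NOT NULL,
  item_id INT NOT NULL,
  PRIMARY KEY (cat_id, item_id)
);

-- Rows copied from the two tables shown in the question.
INSERT INTO Categories (cat_id, cat_name, item_count) VALUES
  (1, 'Category 1', 2),
  (2, 'Category 1', 1),
  (3, 'Category 2', 2),
  (4, 'Category 3', 1),
  (5, 'Category 3', 1);

INSERT INTO Junction (cat_id, item_id) VALUES
  (1, 100), (1, 101), (2, 102), (3, 103), (3, 104), (4, 105), (5, 106);
```

Whether `Categories` and `categories` name the same table depends on the server's `lower_case_table_names` setting, so the capitalisation may need adjusting to match the query being tested.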

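How can I add or combine the items from the duplicate categories into the one with the highest item_count among the duplicates (e.g. Category 1)?

Also, if the item_count is the same for the duplicate categories, the one with the highest cat_id is chosen and the items are merged into that record (e.g. Category 3).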

  

Note: duplicate records will not be deleted; instead, their item_count will be set to 0.

The expected result is shown below.

`Categories`
+========+============+============+
| cat_id | cat_name   | item_count |
+========+============+============+
|      1 | Category 1 |         3  |
|      2 | Category 1 |         0  |
|      3 | Category 2 |         2  |
|      4 | Category 3 |         0  |
|      5 | Category 3 |         2  |
+--------+------------+------------+

`Junction`
+========+=========+
| cat_id | item_id |
+========+=========+
|      1 |     100 |
|      1 |     101 |
|      1 |     102 |
|      3 |     103 |
|      3 |     104 |
|      5 |     105 |
|      5 |     106 |
+--------+---------+

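In the result, there are two duplicated categories, Category 1 and Category 3. That gives two scenarios:

  1. cat_id = 2 is eliminated because its item_count = 1 is less than the item_count = 2 of cat_id = 1.
  2. cat_id = 4 is eliminated even though its item_count is the same as that of cat_id = 5, because 5 is the highest cat_id among the Category 3 duplicates.

If there is a query that can join and update both tables to resolve the duplicates, please help me.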

4 answers:

Answer 0 (score: 3)

Here is a SELECT. You can work out how to adapt it to an UPDATE ;-)

For simplicity, I have ignored the junction table.

SELECT z.cat_id
     , z.cat_name
     , (z.cat_id = x.cat_id) * new_count item_count
  FROM categories x 
  LEFT 
  JOIN categories y 
    ON y.cat_name = x.cat_name 
   AND (y.item_count > x.item_count OR (y.item_count = x.item_count AND y.cat_id > x.cat_id))
  LEFT
  JOIN 
     ( SELECT a.cat_id, b.*
         FROM categories a
         JOIN 
            ( SELECT cat_name, SUM(item_count) new_count, MAX(item_count) max_count FROM categories GROUP BY cat_name) b
           ON b.cat_name = a.cat_name
     ) z
    ON z.cat_name = x.cat_name
 WHERE y.cat_id IS NULL;

+--------+------------+------------+
| cat_id | cat_name   | item_count |
+--------+------------+------------+
|      1 | Category 1 |          3 |
|      2 | Category 1 |          0 |
|      3 | Category 2 |          2 |
|      4 | Category 3 |          0 |
|      5 | Category 3 |          2 |
+--------+------------+------------+
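Since the SELECT above leaves the junction table out, one way to see what would have to happen to it is to list, for every cat_id, the cat_id that survives for the same cat_name. The query below is a sketch that reuses the same "no better row exists" anti-join; the aliases `old_cat_id` and `new_cat_id` are names chosen here for illustration.

```sql
-- x is the surviving row for each cat_name: no row y with the same name has
-- a higher item_count, or an equal item_count and a higher cat_id.
-- z enumerates every row sharing that name, survivor included.
SELECT z.cat_id AS old_cat_id
     , x.cat_id AS new_cat_id
  FROM categories x
  LEFT
  JOIN categories y
    ON y.cat_name = x.cat_name
   AND (y.item_count > x.item_count
        OR (y.item_count = x.item_count AND y.cat_id > x.cat_id))
  JOIN categories z
    ON z.cat_name = x.cat_name
 WHERE y.cat_id IS NULL
 ORDER BY z.cat_id;
```

With the sample data this pairs cat_id 2 with 1 and cat_id 4 with 5, while 1, 3 and 5 map to themselves.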

Answer 1 (score: 1)

DELIMITER $$
DROP PROCEDURE IF EXISTS cursor_proc $$
CREATE PROCEDURE cursor_proc()
BEGIN
  -- Local variables; DECLARE does not accept @-prefixed names in MySQL
  DECLARE v_cat_id     INT;
  DECLARE v_cat_name   VARCHAR(255);
  DECLARE v_item_count INT;

  DECLARE v_prev_cat_name  VARCHAR(255) DEFAULT NULL;
  DECLARE v_max_item_count INT DEFAULT 0;
  DECLARE v_best_cat_id    INT DEFAULT 0;
  DECLARE v_total_items    INT DEFAULT 0;
  -- this flag is set to TRUE when the cursor reaches the end of the table
  DECLARE exit_loop BOOLEAN DEFAULT FALSE;
  -- declare the cursor
  DECLARE categories_cursor CURSOR FOR
    SELECT cat_id, cat_name, item_count FROM Categories ORDER BY cat_name, cat_id;
  -- set the exit_loop flag to TRUE if there are no more rows
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET exit_loop = TRUE;
  -- open the cursor
  OPEN categories_cursor;
  -- start looping
  categories_loop: LOOP
    -- read the next row into the variables
    FETCH categories_cursor INTO v_cat_id, v_cat_name, v_item_count;

    -- close the cursor and exit the loop once every row has been read
    IF exit_loop THEN
      CLOSE categories_cursor;
      LEAVE categories_loop;
    END IF;

    IF v_prev_cat_name IS NULL OR v_prev_cat_name <> v_cat_name THEN
      -- Category has changed: give the 'best' row of the previous category
      -- the total item count
      IF v_best_cat_id > 0 THEN
        UPDATE Categories
           SET item_count = v_total_items
         WHERE cat_id = v_best_cat_id;
      END IF;

      -- reset the running values from the current row
      SET v_max_item_count = v_item_count;
      SET v_prev_cat_name  = v_cat_name;
      SET v_best_cat_id    = v_cat_id;
      SET v_total_items    = v_item_count;
    ELSE
      -- same category: add to the running total
      SET v_total_items = v_total_items + v_item_count;

      -- rows arrive in cat_id order, so <= lets a tie go to the higher cat_id
      IF v_max_item_count <= v_item_count THEN
        -- the current row becomes the 'best'; the previous best is zeroed
        UPDATE Categories
           SET item_count = 0
         WHERE cat_id = v_best_cat_id;
        SET v_max_item_count = v_item_count;
        SET v_best_cat_id    = v_cat_id;
      ELSE
        -- else, this row is not the best of its category
        UPDATE Categories
           SET item_count = 0
         WHERE cat_id = v_cat_id;
      END IF;
    END IF;
  END LOOP categories_loop;

  -- the last category never sees a "category changed" row, so flush it here
  IF v_best_cat_id > 0 THEN
    UPDATE Categories
       SET item_count = v_total_items
     WHERE cat_id = v_best_cat_id;
  END IF;
END $$
DELIMITER ;
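A short usage sketch: once the procedure has been created, it could be called and checked as shown below. The procedure only rewrites item_count in Categories and leaves the Junction rows pointing at their original cat_id, so those would still have to be moved separately.

```sql
-- Run the cursor-based merge of item_count values
CALL cursor_proc();

-- Compare against the expected Categories table from the question
SELECT cat_id, cat_name, item_count
  FROM Categories
 ORDER BY cat_id;
```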

Answer 2 (score: 1)

It isn't pretty, and part of it comes from Strawberry's SELECT:

UPDATE categories cat, 
    junction jun,
    (select 
    (z.cat_id = x.cat_id) * new_count c,
     x.cat_id newcatid,
     z.cat_id oldcatid
    from categories x 
      LEFT 
      JOIN categories y 
        ON y.cat_name = x.cat_name 
       AND (y.item_count > x.item_count OR (y.item_count = x.item_count AND y.cat_id > x.cat_id))
      LEFT
      JOIN 
         ( SELECT a.cat_id, b.*
             FROM categories a
             JOIN 
                ( SELECT cat_name, SUM(item_count) new_count, MAX(item_count) max_count FROM categories GROUP BY cat_name) b
               ON b.cat_name = a.cat_name
         ) z
        ON z.cat_name = x.cat_name

     WHERE
     y.cat_id IS NULL) sourceX

     SET cat.item_count = sourceX.c, jun.cat_id = sourceX.newcatid
     WHERE cat.cat_id = jun.cat_id AND cat.cat_id = sourceX.oldcatid;
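As a possible alternative to the single multi-table UPDATE, the same result could be reached in two simpler statements: first repoint the junction rows to the surviving category, then recompute item_count from the junction table. This is a sketch, not a tested replacement. It assumes item_count always equals the number of junction rows for that cat_id, as the question states, and that no item is linked to two duplicates of the same category, since repointing would then collide on any unique key over (cat_id, item_id). The alias names `m`, `t`, `old_cat_id`, `new_cat_id` and `n` are chosen here for illustration.

```sql
-- Step 1: move junction rows from each duplicate to the surviving category.
-- Must run first, because the survivor is picked from the current item_count.
UPDATE junction j
  JOIN ( SELECT z.cat_id AS old_cat_id, x.cat_id AS new_cat_id
           FROM categories x
           LEFT
           JOIN categories y
             ON y.cat_name = x.cat_name
            AND (y.item_count > x.item_count
                 OR (y.item_count = x.item_count AND y.cat_id > x.cat_id))
           JOIN categories z
             ON z.cat_name = x.cat_name
          WHERE y.cat_id IS NULL
       ) m
    ON m.old_cat_id = j.cat_id
   SET j.cat_id = m.new_cat_id;

-- Step 2: recount; categories left without junction rows drop to 0.
UPDATE categories c
  LEFT
  JOIN ( SELECT cat_id, COUNT(*) AS n
           FROM junction
          GROUP BY cat_id
       ) t
    ON t.cat_id = c.cat_id
   SET c.item_count = COALESCE(t.n, 0);
```

Unlike the single statement, step 2 also updates categories that have no junction rows at all, and running both steps a second time leaves the data unchanged.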

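Answer 3 (score: 0)

I think it is best to do what you want one step at a time.

First, get the data you need:

SELECT Max(`cat_id`), sum(`item_count`) FROM `Categories` GROUP BY `cat_name`

With this data you can check afterwards whether the update was done correctly.

Then, looping over the fetched data, run the update: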

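-- @cat_name is the category being processed in the current loop iteration;
-- it is written unquoted, because '@cat_name' in quotes is a string literal
update Categories set item_count =
    (
    Select Tot FROM (
        Select sum(`item_count`) as Tot
        FROM `Categories`
        WHERE `cat_name` = @cat_name) as tmp1
    )
WHERE cat_id = (
    Select MaxId
    FROM (
        select max(cat_id) as MaxId
        FROM Categories
        WHERE `cat_name` = @cat_name) as tmp2);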

Note that if you run this statement twice, the result will be wrong.

Finally, set the item_count of the other IDs to 0:

UPDATE Categories set item_count = 0
WHERE `cat_name` = @cat_name
AND cat_id <> (
    Select MaxId
    FROM (
        select max(cat_id) as MaxId
        FROM Categories
        WHERE `cat_name` = @cat_name) as tmp2);
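The per-category loop above could probably be collapsed into one set-based statement. The sketch below keeps this answer's rule that the highest cat_id survives, which differs from the question's "highest item_count first" rule: on the sample data the total for Category 1 would land on cat_id 2 rather than cat_id 1. The alias names `t`, `max_id` and `tot` are chosen here for illustration.

```sql
-- The grouped derived table is materialized before the update starts, so
-- Categories can be read and written in the same statement.
UPDATE Categories c
  JOIN ( SELECT cat_name
              , MAX(cat_id)     AS max_id
              , SUM(item_count) AS tot
           FROM Categories
          GROUP BY cat_name
       ) t
    ON t.cat_name = c.cat_name
   SET c.item_count = IF(c.cat_id = t.max_id, t.tot, 0);
```

Because the total and the zeroing happen in a single statement, running it again produces the same values rather than inflating the sum.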