Question

I have a table "monthly" with the columns "filename", "sheetname", "project", "task", "owner", "hours" and "percentage":

+----------+-----------+---------+---------+-------+-------+------------+
| filename | sheetname | project | task    | owner | hours | percentage |
+----------+-----------+---------+---------+-------+-------+------------+
| file1    | IBM       | Website | develop | sam   | 5     | 25         |
| file1    | IBM       | website | test    | sam   | 7     | 20         |
| file1    | IBM       | support | design  | ivan  | 2     | 7          |
| file1    | DELL      | android | config  | peter | 9     | 30         |
| file2    | IBM       | Website | develop | sam   | 9     | 45         |
| file2    | DELL      | android | config  | josef | 4     | 50         |
| file2    | DELL      | android | config  | peter | 3     | 70         |
| file2    | DELL      | android | test    | mark  | 8     | 70         |
| file2    | HP        | webapp  | code    | jack  | 10    | 65         |
| file3    | IBM       | website | test    | sam   | 7     | 20         |
| file3    | HP        | webapp  | code    | jack  | 10    | 65         |
| file4    | IBM       | Website | develop | sam   | 9     | 45         |
+----------+-----------+---------+---------+-------+-------+------------+

I want to remove duplicate rows. When two rows have the same sheetname, project, task, owner, hours and percentage and differ only in filename, the second row should be removed and the first one kept.

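Example:

+----------+-----------+---------+---------+-------+-------+------------+
| file1    | IBM       | Website | develop | sam   | 5     | 25         |
| file2    | IBM       | Website | develop | sam   | 9     | 45         |
| file4    | IBM       | Website | develop | sam   | 9     | 45         |
+----------+-----------+---------+---------+-------+-------+------------+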

file1 and file2 have different values for hours and percentage, so both rows are kept. file2 and file4 have the same values in all the other columns, so the entire file4 row should be removed.
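As an illustration (not part of the original question), you can check which rows this rule treats as duplicates before deleting anything. The rule groups rows on every column except filename. This is a minimal sketch under a few assumptions. It loads the question's rows into a hypothetical scratch table named monthly_demo, so that your real "monthly" table is untouched. It takes "first" to mean the alphabetically smallest filename. The copies and kept_filename aliases are also hypothetical names. The SQL is plain and should run in SQLite, PostgreSQL and SQL Server 2016 or later.

```sql
-- Hypothetical scratch table; the columns mirror the question's "monthly" table.
DROP TABLE IF EXISTS monthly_demo;

CREATE TABLE monthly_demo (
    filename   VARCHAR(20),
    sheetname  VARCHAR(20),
    project    VARCHAR(20),
    task       VARCHAR(20),
    owner      VARCHAR(20),
    hours      INT,
    percentage INT
);

INSERT INTO monthly_demo (filename, sheetname, project, task, owner, hours, percentage)
VALUES
    ('file1', 'IBM',  'Website', 'develop', 'sam',    5, 25),
    ('file1', 'IBM',  'website', 'test',    'sam',    7, 20),
    ('file1', 'IBM',  'support', 'design',  'ivan',   2,  7),
    ('file1', 'DELL', 'android', 'config',  'peter',  9, 30),
    ('file2', 'IBM',  'Website', 'develop', 'sam',    9, 45),
    ('file2', 'DELL', 'android', 'config',  'josef',  4, 50),
    ('file2', 'DELL', 'android', 'config',  'peter',  3, 70),
    ('file2', 'DELL', 'android', 'test',    'mark',   8, 70),
    ('file2', 'HP',   'webapp',  'code',    'jack',  10, 65),
    ('file3', 'IBM',  'website', 'test',    'sam',    7, 20),
    ('file3', 'HP',   'webapp',  'code',    'jack',  10, 65),
    ('file4', 'IBM',  'Website', 'develop', 'sam',    9, 45);

-- One output row per group of rows that match on every column except filename.
-- MIN(filename) is the row the rule keeps; the other rows in the group would be deleted.
SELECT sheetname, project, task, owner, hours, percentage,
       COUNT(*)      AS copies,
       MIN(filename) AS kept_filename
FROM monthly_demo
GROUP BY sheetname, project, task, owner, hours, percentage
HAVING COUNT(*) > 1;
```

With these rows the query should report three groups. The first is the Website/develop row shared by file2 and file4. The second is the website/test row shared by file1 and file3. The third is the webapp/code row shared by file2 and file3.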

Thank you for your help.

Answer 1

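Here is how you could do it in T-SQL (SQL Server). The ROW_NUMBER() idea carries over to other databases with window functions. However, deleting through a CTE, as done here, relies on SQL Server behaviour and may need rewriting in other dialects: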

Sample data (the question's rows loaded into a temporary table):

    -- Drop the temp table if it is left over from a previous run
    IF OBJECT_ID('tempdb..#temp') IS NOT NULL
        DROP TABLE #temp;

    CREATE TABLE #temp
    (
        filename   VARCHAR(20),
        sheetname  VARCHAR(20),
        project    VARCHAR(20),
        task       VARCHAR(20),
        owner      VARCHAR(20),
        hours      INT,
        percentage INT
    );

    INSERT INTO #temp (filename, sheetname, project, task, owner, hours, percentage)
    VALUES
        ('file1', 'IBM',  'Website', 'develop', 'sam',    5, 25),
        ('file1', 'IBM',  'website', 'test',    'sam',    7, 20),
        ('file1', 'IBM',  'support', 'design',  'ivan',   2,  7),
        ('file1', 'DELL', 'android', 'config',  'peter',  9, 30),
        ('file2', 'IBM',  'Website', 'develop', 'sam',    9, 45),
        ('file2', 'DELL', 'android', 'config',  'josef',  4, 50),
        ('file2', 'DELL', 'android', 'config',  'peter',  3, 70),
        ('file2', 'DELL', 'android', 'test',    'mark',   8, 70),
        ('file2', 'HP',   'webapp',  'code',    'jack',  10, 65),
        ('file3', 'IBM',  'website', 'test',    'sam',    7, 20),
        ('file3', 'HP',   'webapp',  'code',    'jack',  10, 65),
        ('file4', 'IBM',  'Website', 'develop', 'sam',    9, 45);

To remove the duplicates, use a common table expression (CTE) with the ROW_NUMBER() window function. The filename column is deliberately left out of PARTITION BY. As a result, rows that agree on sheetname, project, task, owner, hours and percentage land in the same partition. Ordering each partition by filename gives the first file's row rn = 1, and every later copy gets rn > 1 and is deleted through the CTE. A final SELECT * FROM #temp then returns the data set without duplicates:

    ;WITH CTE AS (
        SELECT filename,
               sheetname,
               project,
               task,
               owner,
               hours,
               percentage,
               -- filename is not a partition key, so rows that differ only in filename share a partition
               ROW_NUMBER() OVER (PARTITION BY sheetname, project, task, owner, hours, percentage
                                  ORDER BY filename) AS rn
        FROM #temp
    )
    DELETE FROM CTE WHERE rn > 1;   -- keep only the first file's row in each partition

    SELECT * FROM #temp;

With the sample data this deletes three rows: file4's Website/develop row and file3's two rows (website/test and webapp/code). Nine rows remain.
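Some databases cannot delete through a CTE. For those, a correlated EXISTS delete is one common alternative. It is a different technique from the ROW_NUMBER() answer above, shown here as a sketch rather than the answerer's code. It uses a hypothetical scratch table named monthly_demo loaded with the question's rows, and it assumes "first" means the alphabetically smallest filename. It should run in SQLite, PostgreSQL and SQL Server 2016 or later. MySQL rejects a subquery that reads from the table being deleted from, so it would need a different form there.

```sql
-- Hypothetical scratch table; the columns mirror the question's "monthly" table.
DROP TABLE IF EXISTS monthly_demo;

CREATE TABLE monthly_demo (
    filename   VARCHAR(20),
    sheetname  VARCHAR(20),
    project    VARCHAR(20),
    task       VARCHAR(20),
    owner      VARCHAR(20),
    hours      INT,
    percentage INT
);

INSERT INTO monthly_demo (filename, sheetname, project, task, owner, hours, percentage)
VALUES
    ('file1', 'IBM',  'Website', 'develop', 'sam',    5, 25),
    ('file1', 'IBM',  'website', 'test',    'sam',    7, 20),
    ('file1', 'IBM',  'support', 'design',  'ivan',   2,  7),
    ('file1', 'DELL', 'android', 'config',  'peter',  9, 30),
    ('file2', 'IBM',  'Website', 'develop', 'sam',    9, 45),
    ('file2', 'DELL', 'android', 'config',  'josef',  4, 50),
    ('file2', 'DELL', 'android', 'config',  'peter',  3, 70),
    ('file2', 'DELL', 'android', 'test',    'mark',   8, 70),
    ('file2', 'HP',   'webapp',  'code',    'jack',  10, 65),
    ('file3', 'IBM',  'website', 'test',    'sam',    7, 20),
    ('file3', 'HP',   'webapp',  'code',    'jack',  10, 65),
    ('file4', 'IBM',  'Website', 'develop', 'sam',    9, 45);

-- Delete a row when another row with the same six columns
-- comes from an alphabetically earlier filename.
DELETE FROM monthly_demo
WHERE EXISTS (
    SELECT 1
    FROM monthly_demo AS earlier
    WHERE earlier.sheetname  = monthly_demo.sheetname
      AND earlier.project    = monthly_demo.project
      AND earlier.task       = monthly_demo.task
      AND earlier.owner      = monthly_demo.owner
      AND earlier.hours      = monthly_demo.hours
      AND earlier.percentage = monthly_demo.percentage
      AND earlier.filename   < monthly_demo.filename
);

SELECT * FROM monthly_demo ORDER BY filename;
```

The row with the smallest filename in each group never has an earlier match, so it is never deleted and the result is the same nine rows as the CTE version. There are two behavioural differences from ROW_NUMBER(). First, rows that are exact copies within the same filename are all kept, because neither has a strictly smaller filename. Second, a NULL in any compared column makes the equality false, so such rows are never treated as duplicates, whereas PARTITION BY groups NULLs together.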
