SQL - Multiple duplicates in some columns, where one column must be unique

Asked: 2018-09-25 14:55:08

Tags: sql sqlite duplicates unique

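I'm still new to SQL and would really appreciate some help with a project I'm working on. I'm using SQLite, in case that makes a difference.

I need to write a query that outputs a row if three of its columns match another row but one column differs.

The combination of columns 2, 3 and 4 must be repeated in another row,

BUT

the combination of columns 1, 2, 3 and 4 must not be repeated in any other row.

Sample data: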

ROW 1 : 12345 | Test1 | Something1 | And1  (I don't want this, it's a full row duplicate with row 2)

ROW 2 : 12345 | Test1 | Something1 | And1  (I don't want this, it's a full row duplicate with row 1)

ROW 3 : 12344 | Test1 | Something1 | And3  (I don't want this, it's not a full row duplicate but col 2, 3 and 4 combined doesn't exist anywhere else in the table)

ROW 4 : 12222 | Test2 | Something1 | And2  (I want this! It's not a full row duplicate and columns 2, 3 and 4 combined exists in row 9) 

ROW 5 : 12222 | Test3 | Something1 | And3

ROW 6 : 12222 | Test3 | Something1 | And3

ROW 7 : 12224 | Test3 | Something1 | And3

ROW 8 : 12222 | Test3 | Something2 | And3

ROW 9 : 12000 | Test2 | Something1 | And2

The output I want is:

12222 | Test2 | Something1 | And2

12224 | Test3 | Something1 | And3

12000 | Test2 | Something1 | And2

I hope this makes sense to someone. Thanks in advance for any help.

3 answers:

Answer 0: (score: 0)

I think you want exists:

select t.*
from t
where exists (select 1
              from t t2
              where t2.col2 = t.col2 and t2.col3 = t.col3 and t2.col4 = t.col4 and
                    t2.col1 <> t.col1
             );
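
On the sample data from the question, the query above also returns the two identical 12222 | Test3 | Something1 | And3 rows, because the 12224 row shares their columns 2 to 4. A minimal sketch of one way to drop such full-row duplicates as well, run through Python's sqlite3 module: it adds a NOT EXISTS on SQLite's implicit rowid. The table name t, the column names, the absence of NULLs and the use of an ordinary rowid table are all assumptions here, not something the question confirms.

```python
import sqlite3

# Sample rows copied from the question.
rows = [
    (12345, "Test1", "Something1", "And1"),
    (12345, "Test1", "Something1", "And1"),
    (12344, "Test1", "Something1", "And3"),
    (12222, "Test2", "Something1", "And2"),
    (12222, "Test3", "Something1", "And3"),
    (12222, "Test3", "Something1", "And3"),
    (12224, "Test3", "Something1", "And3"),
    (12222, "Test3", "Something2", "And3"),
    (12000, "Test2", "Something1", "And2"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 TEXT, col3 TEXT, col4 TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# EXISTS alone: keeps every row whose (col2, col3, col4) also occurs
# with a different col1, including rows that have an identical twin.
exists_only = conn.execute("""
    SELECT t.col1, t.col2, t.col3, t.col4
    FROM t
    WHERE EXISTS (SELECT 1 FROM t t2
                  WHERE t2.col2 = t.col2 AND t2.col3 = t.col3
                    AND t2.col4 = t.col4 AND t2.col1 <> t.col1)
    ORDER BY t.col1, t.col2
""").fetchall()

# Same condition plus NOT EXISTS: drop rows for which another physical
# row (different rowid) carries exactly the same four values.
filtered = conn.execute("""
    SELECT t.col1, t.col2, t.col3, t.col4
    FROM t
    WHERE EXISTS (SELECT 1 FROM t t2
                  WHERE t2.col2 = t.col2 AND t2.col3 = t.col3
                    AND t2.col4 = t.col4 AND t2.col1 <> t.col1)
      AND NOT EXISTS (SELECT 1 FROM t t3
                      WHERE t3.col1 = t.col1 AND t3.col2 = t.col2
                        AND t3.col3 = t.col3 AND t3.col4 = t.col4
                        AND t3.rowid <> t.rowid)
    ORDER BY t.col1
""").fetchall()

print(len(exists_only), "rows with EXISTS only")
for row in filtered:
    print(row)
```

If the real table is declared WITHOUT ROWID, the rowid comparison is unavailable. In that case a count over all four columns (GROUP BY col1, col2, col3, col4 HAVING COUNT(*) = 1) could play the same role.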

Answer 1: (score: 0)

We can join to a subquery that identifies the duplicate groups, and use it to restrict the result:

SELECT t1.*
FROM yourTable t1
INNER JOIN
(
    SELECT col2, col3, col4
    FROM yourTable
    GROUP BY col2, col3, col4
    HAVING COUNT(*) = COUNT(DISTINCT col1) AND COUNT(*) > 1
) t2
    ON t1.col2 = t2.col2 AND t1.col3 = t2.col3 AND t1.col4 = t2.col4;
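
On the question's sample data, the HAVING clause above rejects the whole Test3 | Something1 | And3 group (three rows but only two distinct col1 values), so the 12224 row is not returned. A hedged sketch of a variation that keeps it, checked through Python's sqlite3 module: one subquery picks the full rows that occur exactly once, another picks the (col2, col3, col4) groups with more than one distinct col1. The table name yourTable comes from the query above; treating col1 as an integer and assuming no NULLs are my own assumptions.

```python
import sqlite3

# Sample rows copied from the question.
rows = [
    (12345, "Test1", "Something1", "And1"),
    (12345, "Test1", "Something1", "And1"),
    (12344, "Test1", "Something1", "And3"),
    (12222, "Test2", "Something1", "And2"),
    (12222, "Test3", "Something1", "And3"),
    (12222, "Test3", "Something1", "And3"),
    (12224, "Test3", "Something1", "And3"),
    (12222, "Test3", "Something2", "And3"),
    (12000, "Test2", "Something1", "And2"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (col1 INTEGER, col2 TEXT, col3 TEXT, col4 TEXT)"
)
conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?, ?)", rows)

# The join from the answer, unchanged apart from the explicit column list.
original = conn.execute("""
    SELECT t1.col1, t1.col2, t1.col3, t1.col4
    FROM yourTable t1
    INNER JOIN (
        SELECT col2, col3, col4
        FROM yourTable
        GROUP BY col2, col3, col4
        HAVING COUNT(*) = COUNT(DISTINCT col1) AND COUNT(*) > 1
    ) t2
        ON t1.col2 = t2.col2 AND t1.col3 = t2.col3 AND t1.col4 = t2.col4
    ORDER BY t1.col1
""").fetchall()

# Variation: u = full rows that occur exactly once,
#            g = (col2, col3, col4) groups seen with several col1 values.
variation = conn.execute("""
    SELECT u.col1, u.col2, u.col3, u.col4
    FROM (
        SELECT col1, col2, col3, col4
        FROM yourTable
        GROUP BY col1, col2, col3, col4
        HAVING COUNT(*) = 1
    ) u
    INNER JOIN (
        SELECT col2, col3, col4
        FROM yourTable
        GROUP BY col2, col3, col4
        HAVING COUNT(DISTINCT col1) > 1
    ) g
        ON u.col2 = g.col2 AND u.col3 = g.col3 AND u.col4 = g.col4
    ORDER BY u.col1
""").fetchall()

print("original: ", original)
print("variation:", variation)
```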


Answer 2: (score: 0)

Try this:

SELECT
    col1,
    col2,
    col3,
    col4
FROM (
    SELECT
        *,
        LEAD(valid, 1, 1) OVER (PARTITION BY col2, col3, col4 ORDER BY col1) AS valid_next,
        LEAD(invalid, 1, 1) OVER (PARTITION BY col2, col3, col4 ORDER BY col1) AS invalid_next
    FROM (
        SELECT
            *,
            ROW_NUMBER() OVER (PARTITION BY col2, col3, col4 ORDER BY col1) AS valid,
            ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY col1) AS invalid
        FROM tb1
    ) x
) y
WHERE valid <> valid_next AND invalid = invalid_next
ORDER BY col1;

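The logic here is to create two columns (valid and invalid) that count occurrences of 1) rows sharing the three columns col2, col3 and col4, and 2) rows sharing all four columns. LEAD is then used to compare each row's counters with those of the next row in the same partition. If valid changes, we know the three-column combination is repeated; otherwise the row is unique for the partitioned columns.

Output table: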

+--------+--------+-------------+------+
| col1   | col2   |    col3     | col4 |
+--------+--------+-------------+------+
| 12000  | Test2  | Something1  | And2 |
| 12222  | Test2  | Something1  | And2 |
| 12224  | Test3  | Something1  | And3 |
+--------+--------+-------------+------+
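
To make the valid and invalid counters concrete, here is a small sketch that runs only the innermost SELECT on the question's sample data through Python's sqlite3 module and records the highest counter reached in each partition. It assumes the table is named tb1 as in the query, that col1 is an integer, and that the linked SQLite library is 3.25.0 or later, since that is the release that introduced window functions.

```python
import sqlite3

# Sample rows copied from the question.
rows = [
    (12345, "Test1", "Something1", "And1"),
    (12345, "Test1", "Something1", "And1"),
    (12344, "Test1", "Something1", "And3"),
    (12222, "Test2", "Something1", "And2"),
    (12222, "Test3", "Something1", "And3"),
    (12222, "Test3", "Something1", "And3"),
    (12224, "Test3", "Something1", "And3"),
    (12222, "Test3", "Something2", "And3"),
    (12000, "Test2", "Something1", "And2"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tb1 (col1 INTEGER, col2 TEXT, col3 TEXT, col4 TEXT)")
conn.executemany("INSERT INTO tb1 VALUES (?, ?, ?, ?)", rows)

# Innermost SELECT of the answer: number the rows within each
# three-column partition (valid) and each four-column partition (invalid).
inner_query = """
    SELECT col1, col2, col3, col4,
           ROW_NUMBER() OVER (PARTITION BY col2, col3, col4
                              ORDER BY col1) AS valid,
           ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4
                              ORDER BY col1) AS invalid
    FROM tb1
"""

# The highest counter in a partition equals the number of rows in it.
max_valid = {}    # keyed by (col2, col3, col4)
max_invalid = {}  # keyed by (col1, col2, col3, col4)
for col1, col2, col3, col4, valid, invalid in conn.execute(inner_query):
    key3 = (col2, col3, col4)
    key4 = (col1, col2, col3, col4)
    max_valid[key3] = max(max_valid.get(key3, 0), valid)
    max_invalid[key4] = max(max_invalid.get(key4, 0), invalid)

for key3, count in sorted(max_valid.items()):
    print(key3, "rows sharing col2..col4:", count)
for key4, count in sorted(max_invalid.items()):
    print(key4, "identical rows:", count)
```

A row the question asks for sits in a three-column partition whose valid counter goes above 1 while its own four-column partition never gets past invalid = 1.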