Finding connected subsets in a database table

Date: 2013-03-20 11:45:55

Tags: sql duplicate-removal

In my table, I have some records that match each other:

644432 738987
738987 644432
..
854313 871860
854313 874411
871860 854313
871860 874411
874411 854313
874411 871860

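For example, 644432 matches 738987, and 738987 matches 644432 (obviously). For me they are the same, so I must get one, and only one, of them (644432 or 738987, it doesn't matter which).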

Another example: 854313 matches 871860 and 874411 (that's why I have 6 records).

In the end I must get only two records. How can I do that?

Sorry for my English; please tell me if my question is unclear.

For the example, here is code to populate the table:

DECLARE @DataTable TABLE (ColA  INT, ColB  INT)
insert into @DataTable  values 
(644432,    738987),
(738987,    644432),
(854313,    871860),
(854313,    874411),
(871860,    854313),
(871860,    874411),
(874411,    854313),
(874411,    871860)
select * from @DataTable
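The result asked for here is the set of connected components of a graph whose edges are the rows. A minimal Python sketch, assuming the pairs are already in memory as a list of tuples (PAIRS, find, union and groups are illustrative names, not from the question), uses a union-find (disjoint-set) structure and keeps the smallest ID of each component as its representative:

```python
# Union-find (disjoint-set) sketch: every connected group of IDs gets one
# representative, chosen here as the smallest ID in the group.
PAIRS = [
    (644432, 738987), (738987, 644432),
    (854313, 871860), (854313, 874411),
    (871860, 854313), (871860, 874411),
    (874411, 854313), (874411, 871860),
]


def find(parent: dict[int, int], x: int) -> int:
    """Return the root of x's group, compressing the path on the way."""
    parent.setdefault(x, x)
    root = x
    while parent[root] != root:
        root = parent[root]
    while parent[x] != root:  # point every node on the path straight at the root
        nxt = parent[x]
        parent[x] = root
        x = nxt
    return root


def union(parent: dict[int, int], a: int, b: int) -> None:
    """Merge the groups of a and b, keeping the smaller root as the leader."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def groups(pairs: list[tuple[int, int]]) -> dict[int, list[int]]:
    """Map each group's smallest ID to the sorted list of its members."""
    parent: dict[int, int] = {}
    for a, b in pairs:
        union(parent, a, b)
    result: dict[int, list[int]] = {}
    for x in list(parent):
        result.setdefault(find(parent, x), []).append(x)
    return {leader: sorted(members) for leader, members in result.items()}


if __name__ == "__main__":
    print(groups(PAIRS))
```

On the sample data this yields exactly two groups: 644432 with 738987, and 854313 with 871860 and 874411.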

5 answers:

Answer 0 (score: 1)

Assuming this is a table named DataTable with two columns, ColA and ColB, you can do this:

select distinct Smallest, Largest from
(
  -- put the smaller value of each pair in Smallest and the larger one in Largest
  select case when ColA > ColB then ColB else ColA end as Smallest,
  case when ColA > ColB then ColA else ColB end as Largest
  from DataTable
) minmax

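This uses an inner select to reorder the values so that the smaller one is always in the first column and the larger one in the second. The outer select then just pulls out the distinct pairs.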
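To see what this query returns on the question's data, here is a hedged sketch that runs the same SQL against an in-memory SQLite database from Python. SQLite stands in for SQL Server, the ORDER BY is added only to make the output deterministic, and normalized_pairs and PAIRS are illustrative names.

```python
import sqlite3

PAIRS = [
    (644432, 738987), (738987, 644432),
    (854313, 871860), (854313, 874411),
    (871860, 854313), (871860, 874411),
    (874411, 854313), (874411, 871860),
]

# The answer's query; ORDER BY added only so the result order is stable.
QUERY = """
select distinct Smallest, Largest from (
  select case when ColA > ColB then ColB else ColA end as Smallest,
         case when ColA > ColB then ColA else ColB end as Largest
  from DataTable
) minmax
order by Smallest, Largest
"""


def normalized_pairs(pairs: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Load the pairs into a throwaway SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table DataTable (ColA integer, ColB integer)")
        conn.executemany("insert into DataTable values (?, ?)", pairs)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in normalized_pairs(PAIRS):
        print(row)
```

On the sample data it returns four rows, not two: the mirrored duplicates disappear, but the three pairs inside the 854313 / 871860 / 874411 cluster are different pairs, so all of them survive.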

Answer 1 (score: 0)

Try this:

select min(col1) as col1, max(col2) as col2
from (select col1 + col2 as indicator, col1, col2 from table1) t group by indicator

Refer to the SQL Fiddle here.

Note: this does not work if two different pairs have the same sum.
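A small Python sketch of the sum-as-key idea, assuming the rows are in memory (dedupe_by_sum and PAIRS are illustrative names). Like the query, it keeps one row per distinct col1 + col2; the second call shows the caveat from the note, where the unrelated pairs (1, 4) and (2, 3) collapse into one row because both sum to 5.

```python
PAIRS = [
    (644432, 738987), (738987, 644432),
    (854313, 871860), (854313, 874411),
    (871860, 854313), (871860, 874411),
    (874411, 854313), (874411, 871860),
]


def dedupe_by_sum(pairs: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Keep one (col1, col2) row per distinct col1 + col2, like GROUP BY indicator."""
    kept: dict[int, tuple[int, int]] = {}
    for col1, col2 in pairs:
        # The first row seen for each sum wins; later rows with that sum are dropped.
        kept.setdefault(col1 + col2, (col1, col2))
    return list(kept.values())


if __name__ == "__main__":
    print(dedupe_by_sum(PAIRS))
    print(dedupe_by_sum([(1, 4), (2, 3)]))  # different pairs, same sum
```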

Answer 2 (score: 0)

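-- Oracle syntax. Rows that have a mirror (b, a): keep every other joined row.
-- (This assumes mirrored rows come out next to each other, which ROWNUM does not guarantee.)
select col1, col2 from (select a.col1 col1, a.col2 col2, rownum rn from tbl a, tbl b
  where a.col1 = b.col2 and a.col2 = b.col1) where mod(rn, 2) <> 0
union
-- Rows without a mirror are kept as they are. Columns are compared directly,
-- since concatenation (col1||col2) gives false matches such as '1'||'23' = '12'||'3'.
select a.col1 col1, a.col2 col2 from tbl a left outer join tbl b on
  a.col1 = b.col2 and a.col2 = b.col1 where b.col1 is null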
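The same split in Python, as a sketch only: rows that have a mirror (b, a) are reduced to one copy, and rows without a mirror are kept. Instead of the ROWNUM-parity trick it uses a deterministic rule (keep the copy whose first value is not larger), so this is a swapped-in choice rather than the answer's exact behavior; dedupe_mirrors and PAIRS are illustrative names.

```python
PAIRS = [
    (644432, 738987), (738987, 644432),
    (854313, 871860), (854313, 874411),
    (871860, 854313), (871860, 874411),
    (874411, 854313), (874411, 871860),
]


def dedupe_mirrors(pairs: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Keep one row of each mirrored pair (a, b)/(b, a) plus every unmirrored row."""
    rows = set(pairs)
    # Of two mirrored rows keep the one with a <= b (a self-pair (a, a) mirrors itself).
    mirrored = {(a, b) for a, b in rows if (b, a) in rows and a <= b}
    # Rows whose mirror is absent are kept exactly as they are.
    unmirrored = {(a, b) for a, b in rows if (b, a) not in rows}
    return sorted(mirrored | unmirrored)


if __name__ == "__main__":
    print(dedupe_mirrors(PAIRS))
    print(dedupe_mirrors([(1, 2), (2, 1), (5, 1)]))
```

Like the query, this removes only mirrored duplicates, so the three-member cluster still comes back as three rows.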

Answer 3 (score: 0)

A recursive query to find the connected sets. For each chain, the item with the lowest number is reported as the "group leader".

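The query first orders the members of each pair, then follows the chains of connected components. It will not work correctly if a cluster has more than one starting point (but it does avoid cycles).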

This syntax is for PostgreSQL; for Microsoft SQL Server you should omit the RECURSIVE keyword, and for Oracle you should use CONNECT BY and PRIOR. YMMV.

DROP SCHEMA IF EXISTS tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;

CREATE TABLE pairs (ONE INTEGER NOT NULL, two INTEGER NOT NULL
        , PRIMARY KEY (one, two)
        );
INSERT INTO pairs(one, two) values
(644432,738987) ,(738987,644432)
,(854313,871860) ,(854313,874411) ,(871860,854313) ,(871860,874411) ,(874411,854313) ,(874411,871860)
        ;

WITH RECURSIVE rope AS (
        WITH opair AS (
                SELECT LEAST(one, two) AS one
                , GREATEST(one, two) AS two
                FROM pairs
                )
        SELECT o.one AS top
        , o.one AS one
        , o.two AS two
        FROM opair o
        WHERE NOT EXISTS ( SELECT * FROM opair x WHERE x.two=o.one)
        UNION ALL
        SELECT k.top AS top
        , p.one AS one
        , p.two AS two
        FROM opair p
        JOIN rope k ON k.two = p.one
        )
SELECT DISTINCT top
        , COUNT(*) AS N_members
FROM rope
GROUP BY top
ORDER BY top
        ;

Result:

CREATE TABLE
INSERT 0 8
  top   | n_members 
--------+-----------
 644432 |         2
 854313 |         8
(2 rows)
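To check what the recursive query computes, here is a Python simulation of the same steps (an illustrative sketch; chain_leaders and PAIRS are not part of the answer). It keeps duplicate rows as SQL does and carries the leader (top) forward unchanged at every step. It reproduces the result above and makes two properties visible: N_members counts the rows the recursion produces, not distinct IDs (the three-ID cluster reports 8), and a cluster with two starting points, such as the pairs (1, 3) and (2, 3), gets two leaders. It assumes no row pairs an ID with itself, which could make the recursion loop forever, just as in SQL.

```python
from collections import Counter

PAIRS = [
    (644432, 738987), (738987, 644432),
    (854313, 871860), (854313, 874411),
    (871860, 854313), (871860, 874411),
    (874411, 854313), (874411, 871860),
]


def chain_leaders(pairs: list[tuple[int, int]]) -> dict[int, int]:
    """Simulate the recursive CTE and return {top: number of rope rows}."""
    # opair: each pair ordered as (smaller, larger); duplicates are kept, as in SQL.
    opair = [(min(a, b), max(a, b)) for a, b in pairs]
    twos = {two for _, two in opair}
    # Anchor rows: pairs whose smaller ID never appears as anyone's larger ID.
    level = [(one, one, two) for one, two in opair if one not in twos]
    rope = list(level)
    while level:
        # Recursive step: extend every chain end k.two with the pairs starting there.
        level = [
            (top, p_one, p_two)
            for top, _, k_two in level
            for p_one, p_two in opair
            if p_one == k_two
        ]
        rope.extend(level)
    # Equivalent of GROUP BY top ... ORDER BY top.
    return dict(sorted(Counter(top for top, _, _ in rope).items()))


if __name__ == "__main__":
    print(chain_leaders(PAIRS))
    print(chain_leaders([(1, 3), (2, 3)]))  # one cluster, two starting points
```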

Answer 4 (score: 0)

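OK, the example below follows links one level deep. It could be cleaned up a lot with a stored procedure, or by building the query from code, which would make it easier to add more levels of linking.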

-- Set up an example table
create table DataTable
(
    A int,
    B int
)
GO

insert into DataTable values(644432,738987)
insert into DataTable values(738987,644432)
insert into DataTable values(854313,871860)
insert into DataTable values(854313,874411)
insert into DataTable values(871860,854313)
insert into DataTable values(871860,874411)
insert into DataTable values(874411,854313)
insert into DataTable values(874411,871860)
GO

-- Strip out initial duplicates
select distinct A,B into Pass1
from
(
  select case when A > B then B else A end as A,
  case when A > B then A else B end as B
  from DataTable
) minmax

-- Create a copy that we will update with links between values
select * into Pass2 from Pass1 order by A

-- Where a pair A-B links to a pair B-C, extend A-B to A-C
update Pass2 set B=x.NewB from
(
  select L.A as OldA,L.B as OldB, R.B as NewB
  from Pass1 L
  inner join Pass1 R on L.B = R.A
) x
where Pass2.A=x.OldA and Pass2.B=x.OldB

-- ...and rewrite the linked pair B-C as A-C
update Pass2 set A=x.NewA from
(
  select L.B as OldA, R.B as OldB, L.A as NewA
  from Pass1 L
  inner join Pass1 R on L.B = R.A
) x
where Pass2.A=x.OldA and Pass2.B=x.OldB

-- Dedupe any newly created duplicates
select distinct A,B
from
(
  select case when A > B then B else A end as A,
  case when A > B then A else B end as B
  from Pass2
) minmax
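The answer notes that more levels of linking could be added by generating the query from code. A hedged Python sketch of that idea uses iterated minimum-label propagation, a swapped-in technique rather than the answer's UPDATE statements: each pass lets every ID adopt the smallest label among its direct partners (following links exactly one level deeper), and passes repeat until nothing changes. propagate_min_labels and PAIRS are illustrative names.

```python
PAIRS = [
    (644432, 738987), (738987, 644432),
    (854313, 871860), (854313, 874411),
    (871860, 854313), (871860, 874411),
    (874411, 854313), (874411, 871860),
]


def propagate_min_labels(
    pairs: list[tuple[int, int]],
) -> tuple[dict[int, int], int]:
    """Return ({id: smallest id in its group}, number of passes that changed a label)."""
    labels = {x: x for pair in pairs for x in pair}
    passes = 0
    while True:
        # One pass follows every link exactly one level, reading only old labels.
        new = dict(labels)
        for a, b in pairs:
            low = min(labels[a], labels[b])
            new[a] = min(new[a], low)
            new[b] = min(new[b], low)
        if new == labels:  # nothing changed: every ID carries its group's minimum
            return labels, passes
        labels, passes = new, passes + 1


if __name__ == "__main__":
    labels, passes = propagate_min_labels(PAIRS)
    print(sorted(set(labels.values())), passes)
```

On the sample data a single pass settles every label, consistent with one level of linking being enough there; a chain such as (1, 2), (2, 3), (3, 4) needs three passes, so a fixed number of levels is not enough in general.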