Merge 2 tables, ignoring duplicates

Time: 2013-03-13 15:54:04

Tags: oracle merge duplicates

I'm building a dictionary that maps names to genders, so I have a main table, say:

**name_dict a**
name   gender
=======================
jhon   male
jane   female
anna   female

and a source data table that contains "duplicates", by which I mean the same name with different genders, like this:

**name_source b**
name      gender
=======================
cameron   male
cameron   female
anna      female
travis    male

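I want to merge these two tables under these conditions:

  1. Ignore anna (done by the merge condition a.name = b.name)
  2. Ignore the cameron entries (this is where I'm stuck)

How can I write the merge to get this result?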

    name      gender
    ----------------
    jhon      male
    jane      female
    anna      female
    travis    male
    

I'd really appreciate your help and suggestions!

EDIT: So, this is what I came up with after some inspiration:

    merge into name_dictionary x using (
        select a.name, a.gender
        from name_source a,
             (select name, count(1)
              from name_source
              group by name
              having count(1) > 1
              order by count(1)) b
        where a.name = b.name
    ) y
    on (x.name = y.name)
    when not matched then
        insert (name, gender)
        values (y.name, y.gender)
    

Then I thought, let's put our friend Thomas Tschernich's suggestion to the test, for which I used:

    insert into name_dictionary
        select name,gender
        from name_source t1
        where
            (t1.name, t1.gender) not in (
                select name, gender from name_dictionary
            )
            and
            (t1.name, t1.gender) not in (
                select t2.name, t2.gender
                from name_source t2
                join name_source t3 on (t2.name = t3.name and t2.gender != t3.gender)
            );
    

Then I ran them against each other and got:

    QUERY      EXEC TIME      FINAL ROWS   PLAN DATA
    merge      2 secs         96,070       MERGE STATEMENT ALL_ROWS Cost: 253 Bytes: 46,752 Cardinality: 974
    c-Insert   killed (31m)   ¿?           INSERT STATEMENT ALL_ROWS Cost: 24,656,135 Bytes: 1,051,700 Cardinality: 105,170
    

This is the information on the tables I used:

    Table             Initial Rows   Observations
    name_dictionary   3,097          The ones already inserted
    name_source       101,205        The ones I want to filter and add to name_dictionary
    

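(I couldn't format this properly; I hope it's readable.) In any case, I hope you can tell me whether this is right or whether I'm missing something. Thanks a lot!

New finding: if I remove the order by from the merge, the cost goes up to 298.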
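
For comparison, here is a minimal sketch of a merge that matches the stated goal. As posted, the `having count(1) > 1` filter in the merge above appears to keep names that occur more than once in name_source, which is the opposite of ignoring the cameron entries. The sketch swaps in a different filter, `count(distinct gender) = 1`, so only names with a single gender survive. The column types and the sample rows are assumptions taken from the example tables, not from the real 101,205-row data, and no timing has been measured for it.

```sql
-- Assumed table definitions; the real column types are not given in the question.
create table name_dictionary (name varchar2(50), gender varchar2(10));
create table name_source     (name varchar2(50), gender varchar2(10));

-- Sample rows copied from the question.
insert into name_dictionary values ('jhon', 'male');
insert into name_dictionary values ('jane', 'female');
insert into name_dictionary values ('anna', 'female');
insert into name_source values ('cameron', 'male');
insert into name_source values ('cameron', 'female');
insert into name_source values ('anna', 'female');
insert into name_source values ('travis', 'male');

merge into name_dictionary x
using (
    select name,
           min(gender) as gender          -- only one distinct gender is left, so min() just picks it
    from name_source
    group by name
    having count(distinct gender) = 1     -- drops names listed with more than one gender (cameron)
) y
on (x.name = y.name)                      -- names already in the dictionary (anna) count as matched
when not matched then
    insert (name, gender)
    values (y.name, y.gender);
```

Grouping by name also collapses exact duplicate rows in name_source into one, which may or may not be what is wanted for the real data.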

1 answer:

Answer 0: (score: 1)

Using two separate inserts is probably easier than a merge. First, insert all entries from table a, like this:

insert into name_new select * from name_dict

Then do a conditional insert for your second table, like this:

insert into name_new
    select *
    from name_source t1
    where
        (t1.name, t1.gender) not in (
            select name, gender from name_new
        )
        and
        (t1.name, t1.gender) not in (
            select t2.name, t2.gender
            from name_source t2
            join name_source t3 on (t2.name = t3.name and t2.gender != t3.gender)
        )

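The first part of the where clause handles the anna case (rows that are already in the new table), and the second part filters out names that appear with both genders.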
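
The same two filters can also be written with `not exists` instead of `not in`. This is a sketch under assumptions, not a measured improvement: it assumes name and gender are never NULL (with NULLs, `not in` and `not exists` can return different rows), and the column types and sample rows below are made up from the example in the question.

```sql
-- Assumed table definitions; only the two tables this statement touches.
create table name_new    (name varchar2(50), gender varchar2(10));
create table name_source (name varchar2(50), gender varchar2(10));

-- name_new as it would look after the first insert from name_dict.
insert into name_new values ('jhon', 'male');
insert into name_new values ('jane', 'female');
insert into name_new values ('anna', 'female');
insert into name_source values ('cameron', 'male');
insert into name_source values ('cameron', 'female');
insert into name_source values ('anna', 'female');
insert into name_source values ('travis', 'male');

insert into name_new (name, gender)
select t1.name, t1.gender
from name_source t1
where not exists (                 -- skip rows already present (the anna case)
          select 1
          from name_new n
          where n.name = t1.name
            and n.gender = t1.gender
      )
  and not exists (                 -- skip names that also appear with another gender (cameron)
          select 1
          from name_source t2
          where t2.name = t1.name
            and t2.gender <> t1.gender
      );
```

With the sample rows, only travis passes both checks. Whether this form runs faster than the `not in` version on the full tables depends on the optimizer and available indexes, so it is worth comparing the execution plans before relying on it.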