Finding unique matches across 2 separate databases

Time: 2010-08-05 18:50:28

Tags: sql

I have 2 databases with the same structure but different data. Both are SQL 2005.

I am trying to find out which people in database A also exist in database B. My best chance of a match is to compare FirstName and LastName.

I just want to bring back a list of:

DatabaseA.Person DatabaseB.Person

Where:

1. I want all records from DatabaseA, even if there is no match in DatabaseB.
2. I only want a record from DatabaseB when the FirstName / LastName matches exactly one record in DatabaseB.

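I have written a query that uses grouping, but because I need to see more data than just FirstName and LastName, I cannot bring those columns back without grouping by them as well, and that gives me a lot of duplicates. What kind of query should I use? Do I need a cursor?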

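Here is my current query, which kind of works, except that I get results for duplicates in DatabaseB. All I want to know about DatabaseB is whether the FirstName / LastName matches one distinct record and no others. My goal is a list of people I know to be the same person in the 2 databases, so that I can build a dictionary of department-code mappings between the employee lists.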

select
count(DatabaseAEmployee.id) as matchcount
, DatabaseAPerson.id as DatabaseAPersonid
, DatabaseAEmployee.DeptCode DatabaseADeptCode
, DatabaseAPerson.firstname as DatabaseAfirst
, DatabaseAPerson.lastname as DatabaseAlast
, DatabaseBPerson.id as DatabaseBPersonid
, DatabaseBEmployee.DeptCode as DatabaseBDeptCode
, DatabaseBPerson.firstname as DatabaseBfirst
, DatabaseBPerson.lastname as DatabaseBlast
, DatabaseAPerson.ssn as DatabaseAssn
, DatabaseBPerson.ssn as DatabaseBssn
, DatabaseAPerson.dateofbirth as DatabaseAdob
, DatabaseBPerson.dateofbirth as DatabaseBdob

FROM [DatabaseA].[dbo].Employee DatabaseAEmployee
LEFT OUTER JOIN [DatabaseA].[dbo].Person DatabaseAPerson 
 ON DatabaseAPerson.id = DatabaseAEmployee.id
LEFT OUTER JOIN [DatabaseB].[dbo].Person DatabaseBPerson
 ON 
 DatabaseAPerson.firstname = DatabaseBPerson.firstname 
 AND
 DatabaseAPerson.lastname = DatabaseBPerson.lastname 
LEFT OUTER JOIN [DatabaseB].[dbo].Employee DatabaseBEmployee 
 on DatabaseBEmployee.id = DatabaseBPerson.id
group by 
DatabaseAPerson.firstname
, DatabaseAPerson.lastname
, DatabaseAPerson.id
, DatabaseAEmployee.DeptCode
, DatabaseBPerson.id
, DatabaseBEmployee.DeptCode
, DatabaseBPerson.firstname
, DatabaseBPerson.lastname
, DatabaseBPerson.ssn
, DatabaseAPerson.ssn
, DatabaseBPerson.dateofbirth
, DatabaseAPerson.dateofbirth

Here is what I am trying now, but I am getting duplicates on the left side:

with UniqueMatchedPersons (Id, FirstName, LastName)
as (
select 
    p2.ID, p2.FirstName, p2.LastName
from 
    [DatabaseA].[dbo].[Employee] p1 
INNER JOIN [DatabaseA].[dbo].[Person] p2 on p1.id = p2.id
    inner join [DatabaseB].[dbo].[Person] p3
        on p2.FirstName = p3.FirstName and p2.LastName = p3.LastName
INNER JOIN  [DatabaseB].[dbo].[Employee] p4
on p3.id = p4.id

group by p2.ID, p2.FirstName, p2.LastName
having count(p2.ID) = 1

)

select p1.*, p2.*
from DatabaseA.dbo.Person p1
inner join UniqueMatchedPersons on p1.ID = UniqueMatchedPersons.ID
left outer join DatabaseB.dbo.Person p2 
    on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName

4 answers:

Answer 0: (score: 2)

Try this:

SELECT id,FirstName,Lastname 
FROM   dba.Persons
UNION
SELECT b.id,b.FirstName,b.LastName 
FROM   dbb.Persons as b
INNER JOIN dba.Persons as a
ON b.FirstName = a.FirstName AND b.LastName = a.LastName

If you want everyone from A and only those people from B who have no match (which makes more sense to me), I would use this:

SELECT id,FirstName,Lastname 
FROM dba.Persons
UNION
SELECT b.id,b.FirstName,b.LastName 
FROM dbb.Persons as b
LEFT OUTER JOIN dba.Persons as a
ON b.FirstName = a.FirstName AND b.LastName = a.LastName
WHERE a.id is null

Answer 1: (score: 2)

Try something like:

Select dta.LastName, dta.FirstName, dta.[otherColumns], dtb.LastName, dtb.FirstName,
    dtb.[otherColumns]
From [databaseA].[table] as dta
LEFT OUTER JOIN [databaseB].[table] as dtb
    on dta.Lastname = dtb.LastName and dta.FirstName = dtb.FirstName

That should get you: 1) everyone in table A, and 2) everyone in table B who has a last name / first name match in table A.
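As a rough illustration of what this join returns, here is a minimal Python sketch that uses SQLite in place of SQL Server 2005. The two attached in-memory databases, the three-column Person table, and all rows are made-up assumptions; on SQL Server the tables would be addressed with three-part names such as [DatabaseA].[dbo].[Person]. A name that occurs twice in B produces two rows for the same person in A, which is the duplication the question describes.

```python
import sqlite3

# SQLite stands in for SQL Server; two attached in-memory databases play
# the roles of DatabaseA and DatabaseB. All rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS DatabaseA")
conn.execute("ATTACH DATABASE ':memory:' AS DatabaseB")
for db in ("DatabaseA", "DatabaseB"):
    conn.execute(
        f"CREATE TABLE {db}.Person "
        "(id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)"
    )

conn.executemany(
    "INSERT INTO DatabaseA.Person VALUES (?, ?, ?)",
    [(1, "Ann", "Lee"), (2, "Bob", "Ray"), (3, "Cy", "Fox")],
)
conn.executemany(
    "INSERT INTO DatabaseB.Person VALUES (?, ?, ?)",
    [(10, "Ann", "Lee"), (20, "Bob", "Ray"), (21, "Bob", "Ray")],
)

# Plain LEFT OUTER JOIN on the name columns, as in the query above.
rows = conn.execute("""
    SELECT a.id, b.id
    FROM DatabaseA.Person AS a
    LEFT OUTER JOIN DatabaseB.Person AS b
      ON a.FirstName = b.FirstName AND a.LastName = b.LastName
    ORDER BY a.id, b.id
""").fetchall()

# Person 2 (Bob Ray) shows up once per matching row in B.
print(rows)
```

With this sample data, person 1 pairs with one row from B, person 2 pairs with two, and person 3 comes back with NULL on the B side.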

Answer 2: (score: 2)

Works in SQL Server (at least it should):

SELECT
    A.*
    ,   B.*
FROM
    DatabaseA.dbo.Person A
    LEFT JOIN DatabaseB.dbo.Person B 
        ON A.FirstName = B.FirstName AND A.LastName = B.LastName

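Edit: You mention that you are getting duplicates from DatabaseB and that you only need to match on first and last name. But you are also requesting other data (besides first/last name), and that is where the problem lies. If you use DISTINCT, you can only request those columns.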

Answer 3: (score: 2)

Using Transact-SQL, the following untested query should let you see only the unique matches:

select 
    p1.ID, p1.FirstName, p1.LastName 
from 
    [DatabaseA].[dbo].[Persons] p1 
    left outer join [DatabaseB].[dbo].[Persons] p2 
        on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName

group by p1.ID, p1.FirstName, p1.LastName

having count(p1.ID) = 1

If you are using SQL Server, you can wrap this in a common table expression, which you can then join against.

with UniqueMatchedPersons (Id, FirstName, LastName)
as (
    --query in previous code snippet 
)
select persons.*
from Persons
inner join UniqueMatchedPersons on Persons.ID = UniqueMatchedPersons.ID

Update:

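If you want to select fields from both tables, you only need to re-specify the original join condition that evaluated the name match; this works because duplicate matches on the left side of the join have already been filtered out by the HAVING aggregate condition.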

Change the select part of the snippet above to read as follows, and you can select fields from either side of the join:

select p1.*, p2.*
from [DatabaseA].[dbo].[Persons] p1 
inner join UniqueMatchedPersons on p1.ID = UniqueMatchedPersons.ID
left outer join [DatabaseB].[dbo].[Persons] p2 
    on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName
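One thing that may be worth checking in the HAVING filter is which rows it keeps. The sketch below (Python with SQLite standing in for SQL Server; the attached in-memory databases, the simplified Person table, and the rows are all made-up assumptions) suggests that count(p1.ID) = 1 over a left outer join keeps people with exactly one match and also people with no match at all, because an unmatched row still counts once. Counting a column from the right side, such as p2.ID, tells the two cases apart.

```python
import sqlite3

# SQLite stands in for SQL Server; sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS DatabaseA")
conn.execute("ATTACH DATABASE ':memory:' AS DatabaseB")
for db in ("DatabaseA", "DatabaseB"):
    conn.execute(
        f"CREATE TABLE {db}.Person "
        "(id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)"
    )
conn.executemany(
    "INSERT INTO DatabaseA.Person VALUES (?, ?, ?)",
    [(1, "Ann", "Lee"), (2, "Bob", "Ray"), (3, "Cy", "Fox")],
)
conn.executemany(
    "INSERT INTO DatabaseB.Person VALUES (?, ?, ?)",
    [(10, "Ann", "Lee"), (20, "Bob", "Ray"), (21, "Bob", "Ray")],
)

# Same shape as the grouped query: one group per person in A,
# kept only when the join produced exactly one row for that person.
kept = conn.execute("""
    SELECT p1.id,
           COUNT(p1.id) AS joined_rows,  -- rows produced by the join
           COUNT(p2.id) AS b_matches     -- NULLs from the outer join are not counted
    FROM DatabaseA.Person AS p1
    LEFT OUTER JOIN DatabaseB.Person AS p2
      ON p1.FirstName = p2.FirstName AND p1.LastName = p2.LastName
    GROUP BY p1.id
    HAVING COUNT(p1.id) = 1
    ORDER BY p1.id
""").fetchall()

print(kept)
```

Here person 2 is dropped (two matches), while persons 1 and 3 both survive; only the b_matches column shows that person 3 has no counterpart in B.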

Update 2:

To filter out duplicates on the left side (which also cause duplicates on the right side), you have to remove the grouping on [DatabaseA].[dbo].[Persons].[ID].

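When I refer to duplicates, I mean names in adjacent rows that are identical character for character, padding included. If you have diacritic variants of first and last names, the result of the name comparison is governed by the database collation (unless you explicitly declare a collation on the join expression). Likewise, if the names vary in spacing, padding, or punctuation, you may need to consider an approach to name matching other than the plain equality operator.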
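To make the collation and padding point concrete, here is a small sketch in Python with SQLite standing in for SQL Server. The databases, table layout, and rows are made-up assumptions, and TRIM with COLLATE NOCASE is SQLite syntax; a rough T-SQL counterpart would be LTRIM(RTRIM(...)) together with an explicit COLLATE clause on the join expression, with the collation name chosen to suit your data.

```python
import sqlite3

# SQLite stands in for SQL Server; sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS DatabaseA")
conn.execute("ATTACH DATABASE ':memory:' AS DatabaseB")
for db in ("DatabaseA", "DatabaseB"):
    conn.execute(
        f"CREATE TABLE {db}.Person "
        "(id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)"
    )
conn.execute("INSERT INTO DatabaseA.Person VALUES (1, 'Ann', 'Lee')")
# Same person in B, but with stray padding and different casing.
conn.execute("INSERT INTO DatabaseB.Person VALUES (10, ' ann ', 'LEE')")

# Plain equality: the padded, differently cased name does not match.
strict = conn.execute("""
    SELECT COUNT(*)
    FROM DatabaseA.Person AS a
    JOIN DatabaseB.Person AS b
      ON a.FirstName = b.FirstName AND a.LastName = b.LastName
""").fetchone()[0]

# Trim both sides and compare case-insensitively.
relaxed = conn.execute("""
    SELECT COUNT(*)
    FROM DatabaseA.Person AS a
    JOIN DatabaseB.Person AS b
      ON TRIM(a.FirstName) = TRIM(b.FirstName) COLLATE NOCASE
     AND TRIM(a.LastName)  = TRIM(b.LastName)  COLLATE NOCASE
""").fetchone()[0]

print(strict, relaxed)
```

The strict join finds no match for this pair, while the relaxed join finds one. Punctuation and diacritic variants would need further normalization that this sketch does not attempt.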

Try the following:

with UniqueMatchedPersons (FirstName, LastName)
as (
select 
    p1.FirstName, p1.LastName
from 
  [DatabaseA].[dbo].[Person] p1
  left outer join [DatabaseB].[dbo].[Person] p2
        on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName

group by p1.FirstName, p1.LastName
having count(p1.FirstName) = 1
)

select p1.*, p2.*, e1.*, e2.*
from [DatabaseA].[dbo].[Person] p1
inner join UniqueMatchedPersons ump 
      on p1.FirstName = ump.FirstName and p1.LastName = ump.LastName
left outer join [DatabaseB].[dbo].[Person] p2 
      on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName
inner join [DatabaseA].[dbo].[Employee] e1 on p1.ID = e1.ID
inner join [DatabaseB].[dbo].[Employee] e2 on e2.ID = p2.ID

order by p1.id asc
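If the goal is exactly the two requirements in the question (every row from A, plus the B row only when the name is unique within B), one alternative to the query above is to count names inside DatabaseB first and then left join A to that result. This is a different technique from the grouped join, sketched here in Python with SQLite standing in for SQL Server; the CTE name UniqueB, the simplified Person table, and the rows are hypothetical. The WITH clause is also available on SQL Server 2005, where the tables would be written as [DatabaseA].[dbo].[Person] and [DatabaseB].[dbo].[Person].

```python
import sqlite3

# SQLite stands in for SQL Server; sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS DatabaseA")
conn.execute("ATTACH DATABASE ':memory:' AS DatabaseB")
for db in ("DatabaseA", "DatabaseB"):
    conn.execute(
        f"CREATE TABLE {db}.Person "
        "(id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)"
    )
conn.executemany(
    "INSERT INTO DatabaseA.Person VALUES (?, ?, ?)",
    [(1, "Ann", "Lee"), (2, "Bob", "Ray"), (3, "Cy", "Fox")],
)
conn.executemany(
    "INSERT INTO DatabaseB.Person VALUES (?, ?, ?)",
    [(10, "Ann", "Lee"), (20, "Bob", "Ray"), (21, "Bob", "Ray")],
)

result = conn.execute("""
    WITH UniqueB AS (
        -- Names that occur exactly once in B; MIN(id) is then that one row's id.
        SELECT FirstName, LastName, MIN(id) AS id
        FROM DatabaseB.Person
        GROUP BY FirstName, LastName
        HAVING COUNT(*) = 1
    )
    SELECT a.id, a.FirstName, a.LastName, u.id AS b_id
    FROM DatabaseA.Person AS a
    LEFT OUTER JOIN UniqueB AS u
      ON a.FirstName = u.FirstName AND a.LastName = u.LastName
    ORDER BY a.id
""").fetchall()

print(result)
```

Every person from A appears once; b_id is filled only for Ann Lee, whose name is unique in B, and stays NULL for Bob Ray (ambiguous in B) and Cy Fox (absent from B). If names can also repeat within A, a similar uniqueness filter on A would be needed, and the Employee tables can be joined on afterwards by id.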