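I have been trying to join three tables in SQL Server, using a concatenated field as my unique identifier, but I noticed that the join returns duplicate records.
The goal is to join B and C to A.
The result has 48&#3932 records.
Here is an excerpt of my query: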
SELECT
    a.xxxx, a.yyyy, b.sdsd, c.dffgg
FROM
    [table A] a
LEFT JOIN
    [table B] b ON a.pkey = b.pkey
LEFT JOIN
    [table C] c ON a.pkey = c.pkey;
Table A:
PeriodRef OfficeCode OfficeDesc TaskServLineCode TaskServLineDesc ServLineDiv PartnerCode PartnerName ManagerCode ManagerName BillerCode BillerName ClientCode ClientName BusCatCode BusCatDesc GroupCode GroupDesc TaskCode TaskDesc TaskDateOpen TaskDateTerminate InvNumber InvDate LTDInv LTDFee LTDVat LTDCn LTDRec LTDPLFC YTDInv YTDFee YTDVat YTDCn YTDRec YTDPLFC PTDInv PTDFee PTDVat PTDCn PTDRec PTDPLFC CBal BalCurr Bal30 Bal60 Bal90 Bal120 Bal150 Bal180 CM Provision PM Provision CM Provision movement Start CY Provision YTD Provision movement
201710 1 LAGOS A100 e a AAA xcv rg vgg AOA iyh erd2 tggtt yue jd kdk weeer INV Invoice NULL NULL 5yj 00:00.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
201710 1 LAGOS A100 e a AAA cbvc rfgt ghh ZZZZZ ssf 34ef etg assw kjkl kdk jdkjf INV Invoice NULL NULL 6uuj 00:00.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
201710 1 LAGOS A100 e a AAA zcvv ffbb ddg EOK adf 23df sss asd ieel kdk dghjg;js CT07 sff 00:00.0 00:00.0 56 00:00.0 0 4837500 237500 0 5075000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
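Tables B and C have the same schema. The only difference between them is the period.
PS: These tables have no unique identifier, which is why I concatenated some columns to get one. Thanks, everyone.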
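Answer 0 (score: 0)
This is a fairly vague question ("How do you deal with duplicate records in a join?"), so here is a very generic answer (which may be what you are looking for, or may at least get you started):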
WITH UniqueKeys AS (
    -- UNION already removes duplicates across the three tables
    SELECT DISTINCT pkey FROM [table A]
    UNION
    SELECT DISTINCT pkey FROM [table B]
    UNION
    SELECT DISTINCT pkey FROM [table C])
SELECT
    u.pkey,
    CASE WHEN a.pkey IS NOT NULL THEN 1 ELSE 0 END AS in_a,
    CASE WHEN b.pkey IS NOT NULL THEN 1 ELSE 0 END AS in_b,
    CASE WHEN c.pkey IS NOT NULL THEN 1 ELSE 0 END AS in_c
FROM
    UniqueKeys u
    LEFT JOIN [table A] a ON a.pkey = u.pkey
    LEFT JOIN [table B] b ON b.pkey = u.pkey
    LEFT JOIN [table C] c ON c.pkey = u.pkey;
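The query above is my basic approach when I am dealing with data that may have duplicate or "missing" keys across multiple tables.
It gives you a list showing whether each key exists in table A, B or C.
I imagine you will want to extend it. For example, you could add a constraint so that a key is listed only if it is duplicated in one of the source tables.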
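One way to add that constraint, as a minimal sketch: group each table by key and keep only the keys that occur more than once. The table and column names are the ones from the question; the `source_table` label and the `instances` alias are introduced here purely for illustration.

```sql
-- List only the keys that are duplicated in at least one source table,
-- with the table they are duplicated in and how many times they occur.
SELECT 'A' AS source_table, pkey, COUNT(*) AS instances
FROM [table A]
GROUP BY pkey
HAVING COUNT(*) > 1
UNION ALL
SELECT 'B', pkey, COUNT(*)
FROM [table B]
GROUP BY pkey
HAVING COUNT(*) > 1
UNION ALL
SELECT 'C', pkey, COUNT(*)
FROM [table C]
GROUP BY pkey
HAVING COUNT(*) > 1
ORDER BY pkey, source_table;
```

An empty result would mean the concatenated key is unique in all three tables, in which case the duplicates must come from somewhere else.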
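If duplicates do exist, for example pkey "XYZ123" appears four times in table A, you may want to change the basic query to GROUP BY u.pkey and take the MAX() of each CASE expression. You could even count the number of instances by making it a SUM(), but then you need to avoid "multiplying up" the results: a key that occurs twice in A and three times in B would otherwise produce six joined rows.
So your query would now look like this: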
WITH UniqueKeys AS (
    SELECT DISTINCT pkey FROM [table A]
    UNION
    SELECT DISTINCT pkey FROM [table B]
    UNION
    SELECT DISTINCT pkey FROM [table C])
SELECT
    u.pkey,
    SUM(CASE WHEN a.pkey IS NOT NULL THEN a.instances ELSE 0 END) AS in_a,
    SUM(CASE WHEN b.pkey IS NOT NULL THEN b.instances ELSE 0 END) AS in_b,
    SUM(CASE WHEN c.pkey IS NOT NULL THEN c.instances ELSE 0 END) AS in_c
FROM
    UniqueKeys u
    -- Each derived table is aggregated to one row per key before joining,
    -- so the joins cannot multiply the counts. (A derived table in a JOIN
    -- cannot reference u.pkey, hence GROUP BY pkey instead of a WHERE.)
    LEFT JOIN (SELECT pkey, COUNT(*) AS instances FROM [table A] GROUP BY pkey) a ON a.pkey = u.pkey
    LEFT JOIN (SELECT pkey, COUNT(*) AS instances FROM [table B] GROUP BY pkey) b ON b.pkey = u.pkey
    LEFT JOIN (SELECT pkey, COUNT(*) AS instances FROM [table C] GROUP BY pkey) c ON c.pkey = u.pkey
GROUP BY
    u.pkey;
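The same "reduce to one row per key before joining" idea can be applied to the query in the question. The following is only a sketch under a strong assumption: that when a key is duplicated in B or C, any single row for that key is acceptable. The CTE names `b_one` and `c_one` and the `rn` column are hypothetical; `xxxx`, `yyyy`, `sdsd` and `dffgg` are the placeholder columns from the question.

```sql
-- Assumption: for a duplicated key, any one row per key is good enough.
-- ORDER BY (SELECT NULL) makes the choice arbitrary; replace it with a real
-- ordering column if one specific row should win.
WITH b_one AS (
    SELECT pkey, sdsd,
           ROW_NUMBER() OVER (PARTITION BY pkey ORDER BY (SELECT NULL)) AS rn
    FROM [table B]
),
c_one AS (
    SELECT pkey, dffgg,
           ROW_NUMBER() OVER (PARTITION BY pkey ORDER BY (SELECT NULL)) AS rn
    FROM [table C]
)
SELECT a.xxxx, a.yyyy, b.sdsd, c.dffgg
FROM [table A] a
LEFT JOIN b_one b ON b.pkey = a.pkey AND b.rn = 1  -- at most one B row per key
LEFT JOIN c_one c ON c.pkey = a.pkey AND c.rn = 1; -- at most one C row per key
```

With this shape the result has exactly as many rows as table A. If the duplicated rows in B or C carry different values that all matter, aggregating them (SUM, MAX and so on) is probably a better fit than picking one row.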
Answer 1 (score: 0)
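The question is not clear, but see whether this helps. However you generate the identifier, the join will return duplicates whenever a key matches more than one row in a joined table.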
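A small self-contained illustration of that effect, using table variables with made-up data (the variable names, key value and numbers are all hypothetical, chosen only to mirror the shape of the query in the question):

```sql
DECLARE @a TABLE (pkey varchar(10), xxxx int);
DECLARE @b TABLE (pkey varchar(10), sdsd int);
DECLARE @c TABLE (pkey varchar(10), dffgg int);

INSERT INTO @a VALUES ('K1', 1);                                 -- key occurs once
INSERT INTO @b VALUES ('K1', 10), ('K1', 20);                    -- key occurs twice
INSERT INTO @c VALUES ('K1', 100), ('K1', 200), ('K1', 300);     -- key occurs three times

SELECT a.pkey, a.xxxx, b.sdsd, c.dffgg
FROM @a a
LEFT JOIN @b b ON b.pkey = a.pkey
LEFT JOIN @c c ON c.pkey = a.pkey;
-- 1 row in @a x 2 matches in @b x 3 matches in @c = 6 rows returned
```

Every combination of matching B and C rows is returned, so the single row from A appears six times.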
Please check the counts for the three tables:
select count(pkey) from [table A]
select count(distinct pkey) from [table A]
select count(pkey) from [table B]
select count(distinct pkey) from [table B]
select count(pkey) from [table C]
select count(distinct pkey) from [table C]
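If the count and the distinct count differ for table B or table C, the key is duplicated in that table, so the join to table A returns multiple rows for every such key.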
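If it is more convenient to compare the numbers side by side, the six checks can be folded into one result set. This is just a sketch; the column aliases `source_table`, `total_keys`, `distinct_keys` and `surplus_rows` are invented here.

```sql
-- One row per table: total key count, distinct key count, and the difference.
-- surplus_rows > 0 means at least one key is duplicated in that table.
SELECT 'A' AS source_table,
       COUNT(pkey)                         AS total_keys,
       COUNT(DISTINCT pkey)                AS distinct_keys,
       COUNT(pkey) - COUNT(DISTINCT pkey)  AS surplus_rows
FROM [table A]
UNION ALL
SELECT 'B', COUNT(pkey), COUNT(DISTINCT pkey), COUNT(pkey) - COUNT(DISTINCT pkey)
FROM [table B]
UNION ALL
SELECT 'C', COUNT(pkey), COUNT(DISTINCT pkey), COUNT(pkey) - COUNT(DISTINCT pkey)
FROM [table C];
```

Note that COUNT(pkey) ignores rows where pkey is NULL, so rows with a NULL key are not reflected in any of these numbers.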