I have a view, cnst_prsn_nm. I want to find records that share the same cnst_mstr_id and the same last name but have different first names. So in Teradata SQL I wrote:
SELECT TOP 20 prsn_nm_a.cnst_mstr_id FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a
INNER JOIN arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b
ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id
GROUP BY prsn_nm_a.cnst_mstr_id
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm
Then, for those records' cnst_mstr_ids, I want to look them up in another table, cnst_mstr. Basically I want to check where the left join IS NULL:
LEFT JOIN arc_mdm_vws.bzal_cnst_mstr mstr_new
ON prsn_nm_a.cnst_mstr_id = mstr_new.new_cnst_mstr_id
WHERE mstr_new.new_cnst_mstr_id IS NULL
So my query is basically:
SELECT TOP 20 prsn_nm_a.cnst_mstr_id FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a
INNER JOIN arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b
ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id
GROUP BY prsn_nm_a.cnst_mstr_id
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm
LEFT JOIN arc_mdm_vws.bzal_cnst_mstr mstr_new
ON prsn_nm_a.cnst_mstr_id = mstr_new.new_cnst_mstr_id
WHERE mstr_new.new_cnst_mstr_id IS NULL
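But that has two WHERE clauses, and a LEFT JOIN cannot come directly after HAVING either. If there is a filter tied to the grouping, how do I do a left join after the GROUP BY and HAVING clauses?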
Answer 0 (score: 2)
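The clauses of a SQL statement always appear in a fixed order: first SELECT, then FROM, then the JOINs, then WHERE, then GROUP BY, then HAVING. You cannot deviate from that order, and you neither need nor can have a second WHERE clause. Put all the conditions you need into your single WHERE clause: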
SELECT TOP 20 prsn_nm_a.cnst_mstr_id
FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a
INNER JOIN arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b
ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id
LEFT JOIN arc_mdm_vws.bzal_cnst_mstr mstr_new
ON prsn_nm_a.cnst_mstr_id = mstr_new.new_cnst_mstr_id
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm
AND mstr_new.new_cnst_mstr_id IS NULL
GROUP BY prsn_nm_a.cnst_mstr_id
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1
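If you want to convince yourself that the single-WHERE form behaves as intended, here is a minimal sketch on toy data. It runs in SQLite through Python's sqlite3 module as a stand-in for Teradata, so TOP is left out (SQLite does not support it), and the table names, column names, and rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Teradata (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prsn_nm (cnst_mstr_id INTEGER, first_nm TEXT, last_nm TEXT);
CREATE TABLE mstr (new_cnst_mstr_id INTEGER);
INSERT INTO prsn_nm VALUES
  (1, 'Ann',  'Lee'),   -- id 1: one last name, two first names
  (1, 'Anna', 'Lee'),
  (2, 'Bob',  'Ray'),   -- id 2: also qualifies, but is present in mstr
  (2, 'Rob',  'Ray'),
  (3, 'Cy',   'Poe'),   -- id 3: two first names, but two last names
  (3, 'Si',   'Roe'),
  (4, 'Di',   'Fox');   -- id 4: single row, no differing first name
INSERT INTO mstr VALUES (2);
""")

# Clause order: FROM, JOINs, one WHERE, GROUP BY, HAVING.
query = """
SELECT a.cnst_mstr_id
FROM prsn_nm a
INNER JOIN prsn_nm b ON a.cnst_mstr_id = b.cnst_mstr_id
LEFT JOIN mstr m ON a.cnst_mstr_id = m.new_cnst_mstr_id
WHERE a.first_nm <> b.first_nm
  AND m.new_cnst_mstr_id IS NULL
GROUP BY a.cnst_mstr_id
HAVING COUNT(DISTINCT a.last_nm) = 1
ORDER BY a.cnst_mstr_id
"""
result = [row[0] for row in conn.execute(query)]
print(result)  # prints [1]
```

Both WHERE conditions filter joined rows before grouping, which is why id 2 (matched in mstr) is already gone by the time HAVING is evaluated.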
Answer 1 (score: 1)
Your original query is not valid (WHERE comes before GROUP BY), so let me assume you meant:
SELECT TOP 20 prsn_nm_a.cnst_mstr_id
FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a INNER JOIN
arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b
ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm
GROUP BY prsn_nm_a.cnst_mstr_id
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1;
A left join that keeps only the unmatched rows is equivalent to using NOT EXISTS, so you can do the following:
SELECT TOP 20 prsn_nm_a.cnst_mstr_id
FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a INNER JOIN
arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b
ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm
GROUP BY prsn_nm_a.cnst_mstr_id
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1 AND
NOT EXISTS (SELECT 1
FROM arc_mdm_vws.bzal_cnst_mstr mstr_new
WHERE prsn_nm_a.cnst_mstr_id = mstr_new.new_cnst_mstr_id
);
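As a quick sanity check of that equivalence, the sketch below runs both anti-join forms on toy data. It uses SQLite through Python's sqlite3 module rather than Teradata, and the tables ids and mstr and their rows are hypothetical. The equivalence relies on the IS NULL test being on the join column itself, which is never NULL for a matched row.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Teradata (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ids (cnst_mstr_id INTEGER);
CREATE TABLE mstr (new_cnst_mstr_id INTEGER);
INSERT INTO ids VALUES (1), (2), (3);
INSERT INTO mstr VALUES (2), (2);  -- duplicate match on purpose
""")

# Form 1: LEFT JOIN, then keep only rows with no match.
left_join_null = [r[0] for r in conn.execute("""
SELECT i.cnst_mstr_id
FROM ids i
LEFT JOIN mstr m ON i.cnst_mstr_id = m.new_cnst_mstr_id
WHERE m.new_cnst_mstr_id IS NULL
ORDER BY i.cnst_mstr_id
""")]

# Form 2: correlated NOT EXISTS.
not_exists = [r[0] for r in conn.execute("""
SELECT i.cnst_mstr_id
FROM ids i
WHERE NOT EXISTS (SELECT 1
                  FROM mstr m
                  WHERE i.cnst_mstr_id = m.new_cnst_mstr_id)
ORDER BY i.cnst_mstr_id
""")]

print(left_join_null, not_exists)  # prints [1, 3] [1, 3]
```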
Answer 2 (score: 1)
Your task can be written like this, without a self-join:
SELECT *
FROM
(
SELECT TOP 20 -- why TOP?
cnst_mstr_id, bz_cnst_prsn_last_nm
FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a
GROUP BY cnst_mstr_id, bz_cnst_prsn_last_nm -- same customer & name
HAVING COUNT(DISTINCT bz_cnst_prsn_first_nm) > 1 -- different first_names
) AS prsn_nm
WHERE NOT EXISTS
(
SELECT *
FROM arc_mdm_vws.bzal_cnst_mstr mstr_new
WHERE prsn_nm.cnst_mstr_id = mstr_new.new_cnst_mstr_id
)
Depending on the existing indexes, this may be faster than the self-join.
As Gordon already mentioned, LEFT JOIN ... IS NULL is the same as NOT EXISTS, and in Teradata the latter is usually more efficient.
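One thing worth checking before swapping the self-join for the GROUP BY version: the two are not guaranteed to return the same ids. The sketch below compares them on toy data, again in SQLite through Python's sqlite3 module as a stand-in for Teradata, with hypothetical table and column names and without the NOT EXISTS filter. The self-join form requires an id to have a single last name across all of its qualifying rows, while the GROUP BY form looks at each (id, last name) pair separately.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Teradata (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE prsn_nm (cnst_mstr_id INTEGER, first_nm TEXT, last_nm TEXT);
INSERT INTO prsn_nm VALUES
  (1, 'Ann',  'Lee'),  (1, 'Anna', 'Lee'),
  (2, 'Bob',  'Ray'),  (2, 'Rob',  'Ray'),
  (3, 'Cy',   'Poe'),  (3, 'Si',   'Roe'),
  (5, 'Al',   'Kim'),  (5, 'Alan', 'Kim'),  (5, 'Al', 'Park');
""")

# Self-join form: differing first names, exactly one last name per id.
self_join = [r[0] for r in conn.execute("""
SELECT a.cnst_mstr_id
FROM prsn_nm a
INNER JOIN prsn_nm b ON a.cnst_mstr_id = b.cnst_mstr_id
WHERE a.first_nm <> b.first_nm
GROUP BY a.cnst_mstr_id
HAVING COUNT(DISTINCT a.last_nm) = 1
ORDER BY a.cnst_mstr_id
""")]

# GROUP BY form: per (id, last name), more than one distinct first name.
group_by = [r[0] for r in conn.execute("""
SELECT DISTINCT g.cnst_mstr_id
FROM (SELECT cnst_mstr_id, last_nm
      FROM prsn_nm
      GROUP BY cnst_mstr_id, last_nm
      HAVING COUNT(DISTINCT first_nm) > 1) AS g
ORDER BY g.cnst_mstr_id
""")]

print(self_join)  # prints [1, 2]
print(group_by)   # prints [1, 2, 5]
```

Id 5 has two first names under 'Kim' plus an extra 'Park' row, so only the GROUP BY form reports it. Which behaviour you want depends on how you read "same last name".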