Left join after GROUP BY and HAVING

Date: 2016-03-18 11:05:23

Tags: sql group-by teradata

I have a view, cnst_prsn_nm. I want to find the records that share the same cnst_mstr_id and the same last name but have different first names. So in Teradata SQL I wrote:

SELECT  TOP 20 prsn_nm_a.cnst_mstr_id  FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a
INNER JOIN arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b
    ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id
GROUP BY prsn_nm_a.cnst_mstr_id
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm

Then, for those records' cnst_mstr_ids, I want to look them up in another table, cnst_mstr. Basically I want to find the rows where the left join IS NULL:
LEFT JOIN arc_mdm_vws.bzal_cnst_mstr mstr_new
    ON prsn_nm_a.cnst_mstr_id = mstr_new.new_cnst_mstr_id
WHERE mstr_new.new_cnst_mstr_id IS NULL

So my query is basically:

SELECT  TOP 20 prsn_nm_a.cnst_mstr_id  FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a
INNER JOIN arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b
    ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id
GROUP BY prsn_nm_a.cnst_mstr_id
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm
LEFT JOIN arc_mdm_vws.bzal_cnst_mstr mstr_new
    ON prsn_nm_a.cnst_mstr_id = mstr_new.new_cnst_mstr_id
WHERE mstr_new.new_cnst_mstr_id IS NULL

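But this has two WHERE clauses, and a LEFT JOIN cannot come directly after HAVING either. How can I do a left join after the GROUP BY and HAVING clauses when there is a filter tied to the grouping?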

3 answers:

Answer 0: (score: 2)

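The clauses of a SQL statement always come in a specific order: first SELECT, then FROM, then the JOINs, then WHERE, then GROUP BY, then HAVING. You cannot deviate from that order, and you neither need nor can have a second WHERE clause. Put all of the conditions you need into your single WHERE clause.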

SELECT  TOP 20 prsn_nm_a.cnst_mstr_id  
FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a
INNER JOIN arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b
    ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id
LEFT JOIN arc_mdm_vws.bzal_cnst_mstr mstr_new
    ON prsn_nm_a.cnst_mstr_id = mstr_new.new_cnst_mstr_id
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm
  AND mstr_new.new_cnst_mstr_id IS NULL
GROUP BY prsn_nm_a.cnst_mstr_id
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1
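
To try that clause order on a small dataset without a Teradata system, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database from Python. The table names prsn_nm and mstr, the shortened column names, and the sample rows are all made up for illustration, and TOP is left out because SQLite does not support it.

```python
import sqlite3

def ids_missing_from_mstr():
    """Run the single-WHERE form of the query on toy data (SQLite stand-in)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE prsn_nm (cnst_mstr_id INTEGER, first_nm TEXT, last_nm TEXT);
        CREATE TABLE mstr (new_cnst_mstr_id INTEGER);
        INSERT INTO prsn_nm VALUES
            (1, 'Ann', 'Lee'), (1, 'Anna', 'Lee'),  -- same last name, different first names
            (2, 'Bob', 'Ray'), (2, 'Rob', 'Ray'),   -- same, but id 2 exists in mstr
            (3, 'Cy',  'Fox'),                      -- only one first name
            (4, 'Di',  'Kim'), (4, 'Dee', 'Kay');   -- two different last names
        INSERT INTO mstr VALUES (2);
    """)
    rows = con.execute("""
        SELECT a.cnst_mstr_id
        FROM prsn_nm a
        INNER JOIN prsn_nm b
            ON a.cnst_mstr_id = b.cnst_mstr_id
        LEFT JOIN mstr m
            ON a.cnst_mstr_id = m.new_cnst_mstr_id
        WHERE a.first_nm <> b.first_nm          -- row-level filter
          AND m.new_cnst_mstr_id IS NULL        -- "no match in mstr" filter
        GROUP BY a.cnst_mstr_id
        HAVING COUNT(DISTINCT a.last_nm) = 1    -- group-level filter
        ORDER BY a.cnst_mstr_id
    """).fetchall()
    con.close()
    return rows

print(ids_missing_from_mstr())  # [(1,)]
```

Both joins are written before the one WHERE clause, and the grouping filter stays in HAVING, so nothing has to be joined after the grouping.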

Answer 1: (score: 1)

Your original query is not valid (WHERE has to come before GROUP BY). Let me assume you meant:

SELECT  TOP 20 prsn_nm_a.cnst_mstr_id
FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a INNER JOIN
     arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b
     ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm
GROUP BY prsn_nm_a.cnst_mstr_id
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1;

A left join that keeps only the unmatched rows is equivalent to using NOT EXISTS, so you can do the following:

SELECT TOP 20 prsn_nm_a.cnst_mstr_id
FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a INNER JOIN
     arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_b
     ON prsn_nm_a.cnst_mstr_id = prsn_nm_b.cnst_mstr_id
WHERE prsn_nm_a.bz_cnst_prsn_first_nm <> prsn_nm_b.bz_cnst_prsn_first_nm
GROUP BY prsn_nm_a.cnst_mstr_id
HAVING COUNT(DISTINCT prsn_nm_a.bz_cnst_prsn_last_nm) = 1 AND
       NOT EXISTS (SELECT 1
                   FROM arc_mdm_vws.bzal_cnst_mstr mstr_new
                   WHERE prsn_nm_a.cnst_mstr_id = mstr_new.new_cnst_mstr_id
                  );
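
The equivalence between LEFT JOIN ... IS NULL and NOT EXISTS can be checked on toy data. The sketch below again uses SQLite from Python as a stand-in for Teradata; the candidates table, the mstr table, and their contents are hypothetical.

```python
import sqlite3

def anti_join_both_ways():
    """Return the unmatched ids twice: via LEFT JOIN ... IS NULL and via NOT EXISTS."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE candidates (cnst_mstr_id INTEGER);
        CREATE TABLE mstr (new_cnst_mstr_id INTEGER);
        INSERT INTO candidates VALUES (1), (2), (3);
        INSERT INTO mstr VALUES (2), (2);  -- id 2 is present (twice)
    """)
    via_left_join = con.execute("""
        SELECT c.cnst_mstr_id
        FROM candidates c
        LEFT JOIN mstr m
            ON c.cnst_mstr_id = m.new_cnst_mstr_id
        WHERE m.new_cnst_mstr_id IS NULL
        ORDER BY c.cnst_mstr_id
    """).fetchall()
    via_not_exists = con.execute("""
        SELECT c.cnst_mstr_id
        FROM candidates c
        WHERE NOT EXISTS (SELECT 1
                          FROM mstr m
                          WHERE c.cnst_mstr_id = m.new_cnst_mstr_id)
        ORDER BY c.cnst_mstr_id
    """).fetchall()
    con.close()
    return via_left_join, via_not_exists

left_join_rows, not_exists_rows = anti_join_both_ways()
print(left_join_rows == not_exists_rows, left_join_rows)  # True [(1,), (3,)]
```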

Answer 2: (score: 1)

Your task can be written like this, without a self join:

SELECT *
FROM
 (
   SELECT TOP 20 -- why TOP?
      cnst_mstr_id, bz_cnst_prsn_last_nm
   FROM arc_mdm_vws.bz_cnst_prsn_nm prsn_nm_a
   GROUP BY cnst_mstr_id, bz_cnst_prsn_last_nm      -- same customer & name
   HAVING COUNT(DISTINCT bz_cnst_prsn_first_nm) > 1 -- different first_names
 ) AS prsn_nm
WHERE NOT EXISTS 
 (
   SELECT * 
   FROM arc_mdm_vws.bzal_cnst_mstr mstr_new
   WHERE prsn_nm.cnst_mstr_id = mstr_new.new_cnst_mstr_id
 )

Depending on the existing indexes, this may be faster than the self join.

As Gordon already mentioned, LEFT JOIN ... IS NULL is the same as NOT EXISTS, and in Teradata the latter is usually more efficient.
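
For anyone who wants to experiment with the grouped form from this answer, here is a small sketch on made-up data, using SQLite from Python rather than Teradata. The tables prsn_nm and mstr, the shortened column names, and the rows are assumptions for illustration only, and TOP is omitted because SQLite lacks it.

```python
import sqlite3

def grouped_without_self_join():
    """Group by id and last name, keep groups with several first names,
    then drop ids that already exist in mstr."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE prsn_nm (cnst_mstr_id INTEGER, first_nm TEXT, last_nm TEXT);
        CREATE TABLE mstr (new_cnst_mstr_id INTEGER);
        INSERT INTO prsn_nm VALUES
            (1, 'Ann', 'Lee'), (1, 'Anna', 'Lee'),
            (2, 'Bob', 'Ray'), (2, 'Rob', 'Ray'),
            (3, 'Cy',  'Fox'),
            (4, 'Di',  'Kim'), (4, 'Dee', 'Kay');
        INSERT INTO mstr VALUES (2);
    """)
    rows = con.execute("""
        SELECT p.cnst_mstr_id, p.last_nm
        FROM (
            SELECT cnst_mstr_id, last_nm
            FROM prsn_nm
            GROUP BY cnst_mstr_id, last_nm          -- same id and last name
            HAVING COUNT(DISTINCT first_nm) > 1     -- different first names
        ) AS p
        WHERE NOT EXISTS (SELECT 1
                          FROM mstr m
                          WHERE p.cnst_mstr_id = m.new_cnst_mstr_id)
        ORDER BY p.cnst_mstr_id
    """).fetchall()
    con.close()
    return rows

print(grouped_without_self_join())  # [(1, 'Lee')]
```

Here the grouping happens inside the derived table, so the outer query is free to apply the NOT EXISTS filter afterwards.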