Allow NULLs / de-duplication in a multi-table join? T-SQL

Posted: 2015-08-18 22:21:43

Tags: sql ssis sql-server-2012

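I'd like to know whether there is a way in SSIS or T-SQL (SQL Server 2012) to easily return non-duplicated data when doing a multi-table join, de-duplicating per column rather than per row.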

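I'm trying to denormalize/flatten a bunch of data to load into a warehouse, and I'm ending up with a lot of duplicated data. I'm hoping there is a rollup/aggregate function or a design concept I'm missing that would help when combining multiple tables into a single destination.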

Example

For example, say I have three tables: CUSTOMERS, CUSTOMER_ADDRESSES and CUSTOMER_ACCOUNTS. The tables and their data look like this:

CUSTOMERS

CUST_ID   NAME
1         Burton Guster

CUSTOMER_ADDRESSES

CUST_ID   ADDR_SEQ    ADDRESS
1         1           123 Awesome St
1         2           456 Fake St

CUSTOMER_ACCOUNTS

CUST_ID   ACCT_SEQ    ACCT_TYPE    ACCOUNT_OPEN_DT
1         1           TAP          1/1/1989
1         2           PHARMA       1/1/2010

I join them with a query like this:

SELECT a.CUST_ID, a.NAME, b.ADDRESS, c.ACCT_TYPE, c.ACCOUNT_OPEN_DT
FROM CUSTOMERS a
JOIN CUSTOMER_ADDRESSES b on a.CUST_ID = b.CUST_ID
JOIN CUSTOMER_ACCOUNTS c on a.CUST_ID = c.CUST_ID

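As expected, every address row is joined to every account row for the same customer (2 addresses × 2 accounts = 4 rows), so my output looks like this: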

ID  NAME            ADDRESS         ACCT_TYPE   ACCT_OPEN_DT
1   Burton Guster   123 Awesome St  TAP         1/1/1989
1   Burton Guster   123 Awesome St  PHARMA      1/1/2010
1   Burton Guster   456 Fake St     TAP         1/1/1989
1   Burton Guster   456 Fake St     PHARMA      1/1/2010
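The row explosion can be reproduced outside the database. Below is a minimal Python sketch, not T-SQL, that mimics the two INNER JOINs over in-memory copies of the sample tables. The list names and the `inner_join_all` helper are hypothetical. Each customer contributes (number of addresses) × (number of accounts) rows.

```python
# Hypothetical in-memory copies of the three sample tables from the question.
customers = [(1, "Burton Guster")]                       # (CUST_ID, NAME)
addresses = [(1, 1, "123 Awesome St"),                   # (CUST_ID, ADDR_SEQ, ADDRESS)
             (1, 2, "456 Fake St")]
accounts = [(1, 1, "TAP", "1/1/1989"),                   # (CUST_ID, ACCT_SEQ, ACCT_TYPE, ACCOUNT_OPEN_DT)
            (1, 2, "PHARMA", "1/1/2010")]


def inner_join_all(customers, addresses, accounts):
    """Mimic the two INNER JOINs on CUST_ID: every address of a customer
    pairs with every account of that customer."""
    rows = []
    for cust_id, name in customers:
        for a_cust, _addr_seq, address in addresses:
            if a_cust != cust_id:
                continue
            for c_cust, _acct_seq, acct_type, open_dt in accounts:
                if c_cust == cust_id:
                    rows.append((cust_id, name, address, acct_type, open_dt))
    return rows


for row in inner_join_all(customers, addresses, accounts):
    print(row)
```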

Is there a way to get something like this instead?

ID  NAME            ADDRESS         ACCT_TYPE   ACCT_OPEN_DT
1   Burton Guster   123 Awesome St  TAP         1/1/1989
1   NULL            456 Fake St     PHARMA      1/1/2010

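The goal is to group each column so that each distinct value is returned only once per column, with the overall set grouped by customer ID.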
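One way to read this requirement is as a post-processing step over the already-joined rows, similar to what a script step in an ETL pipeline might do. The Python sketch below is an illustration under stated assumptions, not an SSIS or SQL Server API. It groups rows by CUST_ID, keeps each distinct value once per column group in first-seen order, and pads shorter groups with `None` (NULL). The `GROUPS` layout is an assumption: ACCT_TYPE and ACCT_OPEN_DT are de-duplicated as one pair so that an account's type and date stay on the same row.

```python
from itertools import zip_longest

# The four joined rows from the output above:
# (CUST_ID, NAME, ADDRESS, ACCT_TYPE, ACCT_OPEN_DT)
joined = [
    (1, "Burton Guster", "123 Awesome St", "TAP", "1/1/1989"),
    (1, "Burton Guster", "123 Awesome St", "PHARMA", "1/1/2010"),
    (1, "Burton Guster", "456 Fake St", "TAP", "1/1/1989"),
    (1, "Burton Guster", "456 Fake St", "PHARMA", "1/1/2010"),
]

# Assumed column groups (indexes into each row): NAME, ADDRESS, and the
# (ACCT_TYPE, ACCT_OPEN_DT) pair, which must stay together.
GROUPS = [(1,), (2,), (3, 4)]


def collapse(rows, groups=GROUPS):
    """Per customer, keep each distinct value of every column group once,
    in first-seen order, padding shorter groups with None (NULL)."""
    by_cust = {}
    for row in rows:
        by_cust.setdefault(row[0], []).append(row)
    out = []
    for cust_id, cust_rows in by_cust.items():
        columns = []
        for group in groups:
            seen = []
            for r in cust_rows:
                value = tuple(r[i] for i in group)
                if value not in seen:
                    seen.append(value)
            columns.append(seen)
        # Line the distinct values up side by side; missing slots become None.
        for parts in zip_longest(*columns):
            flat = [cust_id]
            for group, part in zip(groups, parts):
                flat.extend(part if part is not None else (None,) * len(group))
            out.append(tuple(flat))
    return out


for row in collapse(joined):
    print(row)
```

This approach still has to compute the full cross product first. "First-seen order" is also only meaningful if the joined rows arrive in a defined order, which SQL guarantees only with an ORDER BY. Two genuinely identical values, such as the same address stored under two ADDR_SEQ numbers, would also collapse into one.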

Thanks

1 answer:

Answer (score: 0)

Sure, it can be done, although it's a bit awkward... :-)

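You can use ROW_NUMBER() to compute, separately within each table, a running row number per customer. You can then use those row numbers, together with CUST_ID, to line the data up side by side: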

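;WITH custCTE AS (
  -- Each customer occupies only row 1, so NAME appears once per customer
  SELECT CUST_ID, NAME, 1 AS CUST_ROW_N
  FROM CUSTOMERS
),
addrCTE AS (
  -- Number each customer's addresses 1, 2, 3, ... in ADDR_SEQ order
  SELECT CUST_ID, ADDRESS,
         ROW_NUMBER() OVER (PARTITION BY CUST_ID ORDER BY ADDR_SEQ) AS CUST_ROW_N
  FROM CUSTOMER_ADDRESSES
),
acctCTE AS (
  -- Number each customer's accounts 1, 2, 3, ... in ACCT_SEQ order
  SELECT CUST_ID, ACCT_TYPE, ACCOUNT_OPEN_DT,
         ROW_NUMBER() OVER (PARTITION BY CUST_ID ORDER BY ACCT_SEQ) AS CUST_ROW_N
  FROM CUSTOMER_ACCOUNTS
)
SELECT COALESCE(a.CUST_ID, b.CUST_ID, c.CUST_ID) AS CUST_ID,
       a.NAME, b.ADDRESS, c.ACCT_TYPE, c.ACCOUNT_OPEN_DT
FROM custCTE a
FULL JOIN addrCTE b
  ON a.CUST_ID = b.CUST_ID AND a.CUST_ROW_N = b.CUST_ROW_N
-- Attach each account to whichever side (address or customer) exists at that row number
FULL JOIN acctCTE c
  ON (b.CUST_ID = c.CUST_ID AND b.CUST_ROW_N = c.CUST_ROW_N)
  OR (a.CUST_ID = c.CUST_ID AND a.CUST_ROW_N = c.CUST_ROW_N)
ORDER BY COALESCE(a.CUST_ID, b.CUST_ID, c.CUST_ID),
         COALESCE(b.CUST_ROW_N, c.CUST_ROW_N, a.CUST_ROW_N);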

Here is the SQLFiddle.
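The alignment idea is not specific to SQL. The Python sketch below restates it in a language-neutral way; it is an illustration under assumptions and not part of the original answer. The hypothetical `number_rows` helper plays the role of `ROW_NUMBER() OVER (PARTITION BY CUST_ID ORDER BY ...)`, and the hypothetical `align` performs the full outer join on (CUST_ID, row number). The tables are the same in-memory sample data as in the question.

```python
from collections import defaultdict

# In-memory copies of the sample tables (question data).
customers = [(1, "Burton Guster")]
addresses = [(1, 1, "123 Awesome St"), (1, 2, "456 Fake St")]
accounts = [(1, 1, "TAP", "1/1/1989"), (1, 2, "PHARMA", "1/1/2010")]


def number_rows(rows, key_idx, seq_idx):
    """Like ROW_NUMBER() OVER (PARTITION BY key ORDER BY seq):
    returns {(cust_id, n): row} with n starting at 1 for each customer."""
    by_cust = defaultdict(list)
    for row in rows:
        by_cust[row[key_idx]].append(row)
    numbered = {}
    for cust_id, cust_rows in by_cust.items():
        ordered = sorted(cust_rows, key=lambda r: r[seq_idx])
        for n, row in enumerate(ordered, start=1):
            numbered[(cust_id, n)] = row
    return numbered


def align(customers, addresses, accounts):
    """Full outer join of the numbered sets on (CUST_ID, row number)."""
    cust = {(c[0], 1): c for c in customers}   # NAME only occupies row 1
    addr = number_rows(addresses, 0, 1)
    acct = number_rows(accounts, 0, 1)
    out = []
    for key in sorted(set(cust) | set(addr) | set(acct)):
        c, a, t = cust.get(key), addr.get(key), acct.get(key)
        out.append((
            key[0],
            c[1] if c else None,   # NAME
            a[2] if a else None,   # ADDRESS
            t[2] if t else None,   # ACCT_TYPE
            t[3] if t else None,   # ACCOUNT_OPEN_DT
        ))
    return out


for row in align(customers, addresses, accounts):
    print(row)
```

Taking the union of all (CUST_ID, row number) keys directly avoids the chained FULL JOIN. In the SQL version, the OR condition exists so that an account row can attach to whichever of the customer or address rows is present at that row number.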