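I'd like to know whether there is an easy way, in SSIS or T-SQL (SQL Server 2012), to return non-repeating data (per column, not per row) when joining multiple tables.
I'm trying to denormalize/flatten a bunch of data for loading into a warehouse, and I'm ending up with a lot of repeated data to clean up. I'm hoping there is a rollup/aggregation function, or a design concept I'm missing, that can help me when merging several tables into one destination.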
Example
Let's say, for example, that I have three tables: CUSTOMERS, CUSTOMER_ADDRESSES and CUSTOMER_ACCOUNTS. The tables and their data look like this:
CUSTOMERS
CUST_ID  NAME
1        Burton Guster
CUSTOMER_ADDRESSES
CUST_ID  ADDR_SEQ  ADDRESS
1        1         123 Awesome St
1        2         456 Fake St
CUSTOMER_ACCOUNTS
CUST_ID  ACCT_SEQ  ACCT_TYPE  ACCOUNT_OPEN_DT
1        1         TAP        1/1/1989
1        2         PHARMA     1/1/2010
I join them with a query like this:
SELECT a.CUST_ID, a.NAME, b.ADDRESS, c.ACCT_TYPE, c.ACCOUNT_OPEN_DT
FROM CUSTOMERS a
JOIN CUSTOMER_ADDRESSES b on a.CUST_ID = b.CUST_ID
JOIN CUSTOMER_ACCOUNTS c on a.CUST_ID = c.CUST_ID
Obviously, each address row joins to each account row for the same customer, so, as expected, my output looks like this:
ID NAME ADDRESS ACCT_TYPE ACCT_OPEN_DT
1 Burton Guster 123 Awesome St TAP 1/1/1989
1 Burton Guster 123 Awesome St PHARMA 1/1/2010
1 Burton Guster 456 Fake St TAP 1/1/1989
1 Burton Guster 456 Fake St PHARMA 1/1/2010
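To make the row multiplication concrete, here is a small Python emulation of the two inner joins (Python is used only because it is easy to run; `inner_join_rows` and the tuple layouts are hypothetical stand-ins for the tables above, not part of the question). Within one customer, every address is paired with every account, which is why 2 addresses and 2 accounts yield 2 × 2 = 4 rows:

```python
from itertools import product


def inner_join_rows(customers, addresses, accounts):
    """Emulate the two inner joins on CUST_ID: within a customer, every
    address row is paired with every account row (a Cartesian product)."""
    rows = []
    for cust_id, name in customers:
        cust_addrs = [a for a in addresses if a[0] == cust_id]
        cust_accts = [c for c in accounts if c[0] == cust_id]
        for (_, _, address), (_, _, acct_type, open_dt) in product(cust_addrs, cust_accts):
            rows.append((cust_id, name, address, acct_type, open_dt))
    return rows


# Sample rows from the question, as tuples (hypothetical layout):
# (CUST_ID, NAME), (CUST_ID, ADDR_SEQ, ADDRESS),
# (CUST_ID, ACCT_SEQ, ACCT_TYPE, ACCOUNT_OPEN_DT)
customers = [(1, "Burton Guster")]
addresses = [(1, 1, "123 Awesome St"), (1, 2, "456 Fake St")]
accounts = [(1, 1, "TAP", "1/1/1989"), (1, 2, "PHARMA", "1/1/2010")]

for row in inner_join_rows(customers, addresses, accounts):
    print(row)
```

The four printed tuples follow the same order as the four output rows shown above.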
Is there a way for me to get something like this instead?
ID NAME ADDRESS ACCT_TYPE ACCT_OPEN_DT
1 Burton Guster 123 Awesome St TAP 1/1/1989
1 NULL 456 Fake St PHARMA 1/1/2010
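The goal is to group each column so that each distinct value is returned only once per column. The larger set would be grouped by customer ID.
Thanks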
Answer 0 (score: 0)
Sure, it can be done, although it's a bit awkward... :-)
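You can use ROW_NUMBER() to get a running row number per customer from each table separately. You can then use those row numbers to line the data up with each other: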
;WITH custCTE AS (
SELECT CUST_ID, NAME, 1 AS CUST_ROW_N
FROM CUSTOMERS
),
addrCTE AS (
SELECT CUST_ID, ADDRESS, ROW_NUMBER() OVER(PARTITION BY CUST_ID ORDER BY ADDR_SEQ) CUST_ROW_N
FROM CUSTOMER_ADDRESSES
),
acctCTE AS (
SELECT CUST_ID, ACCT_TYPE, ACCOUNT_OPEN_DT, ROW_NUMBER() OVER(PARTITION BY CUST_ID ORDER BY ACCT_SEQ) CUST_ROW_N
FROM CUSTOMER_ACCOUNTS
)
SELECT COALESCE(a.CUST_ID, b.CUST_ID, c.CUST_ID) AS CUST_ID,
       a.NAME, b.ADDRESS, c.ACCT_TYPE, c.ACCOUNT_OPEN_DT
FROM custCTE a
FULL JOIN addrCTE b
    ON a.CUST_ID = b.CUST_ID AND a.CUST_ROW_N = b.CUST_ROW_N
-- align the n-th account with the n-th address, or with the customer row when there is no address row
FULL JOIN acctCTE c
    ON (b.CUST_ID = c.CUST_ID AND b.CUST_ROW_N = c.CUST_ROW_N)
    OR (a.CUST_ID = c.CUST_ID AND a.CUST_ROW_N = c.CUST_ROW_N);
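Conceptually, the ROW_NUMBER() approach "zips" each customer's addresses and accounts side by side instead of cross-multiplying them. A rough Python emulation of that alignment, assuming in-memory tuples shaped like the sample tables (`aligned_rows` and its arguments are hypothetical, not part of the answer), uses `itertools.zip_longest` to pad the shorter side with `None`, the counterpart of the NULLs the FULL JOINs produce. Unlike the SQL, it only emits rows for customers that exist in CUSTOMERS:

```python
from itertools import zip_longest


def aligned_rows(customers, addresses, accounts):
    """Pair the n-th address with the n-th account of each customer, padding
    the shorter side with None (NULL). NAME appears only on the first row,
    because the customer table contributes a single row (CUST_ROW_N = 1)."""
    rows = []
    for cust_id, name in customers:
        # Sorting by the *_SEQ column plays the role of
        # ORDER BY ADDR_SEQ / ACCT_SEQ inside ROW_NUMBER().
        cust_addrs = sorted((a for a in addresses if a[0] == cust_id), key=lambda a: a[1])
        cust_accts = sorted((c for c in accounts if c[0] == cust_id), key=lambda c: c[1])
        for row_name, addr, acct in zip_longest([name], cust_addrs, cust_accts):
            rows.append((
                cust_id,
                row_name,
                addr[2] if addr else None,
                acct[2] if acct else None,
                acct[3] if acct else None,
            ))
    return rows


customers = [(1, "Burton Guster")]
addresses = [(1, 1, "123 Awesome St"), (1, 2, "456 Fake St")]
accounts = [(1, 1, "TAP", "1/1/1989"), (1, 2, "PHARMA", "1/1/2010")]

for row in aligned_rows(customers, addresses, accounts):
    print(row)
```

With the sample data this prints the two rows of the desired output, with None in place of NULL. If a customer has more accounts than addresses, the extra accounts simply get None for NAME and ADDRESS, mirroring the unmatched side of a FULL JOIN.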