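I have two tables. One is a Reference table, used to rank priorities, and the other is a Customer table. The Reference table assigns a priority to each column of the Customer table, so that each column of a single customer can follow a different source order.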
Reference table:
---------------------------------------
| Priority | Attribute | sourceID |
---------------------------------------
| 1 | EMAIL | 1 |
| 2 | EMAIL | 2 |
| 3 | EMAIL | 3 |
| 2 | NAME | 1 |
| 1 | NAME | 2 |
| 3 | NAME | 3 |
---------------------------------------
Customer table:
-----------------------------------------------------------------------
| CustomerID | Name | Email | SourceID | Date |
-----------------------------------------------------------------------
| 1 | John | NULL | 1 | 03/01/2017 |
| 1 | NULL | John@email.com | 3 | 01/01/2017 |
| 1 | J | J.Smith@email.com | 2 | 02/01/2017 |
-----------------------------------------------------------------------
Result:
---------------------------------------------
| CustomerID | Name | Email |
---------------------------------------------
| 1 | John | J.Smith@email.com |
---------------------------------------------
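To make the intended rule concrete, here is a minimal Python sketch (an illustration only, not part of the original query): for each attribute, take the first non-null value, ordered by that attribute's priority and then by date. The names `PRIORITY`, `CUSTOMERS`, `pick` and `resolve` are hypothetical, and the dates are assumed to be dd/mm/yyyy.

```python
from datetime import date

# (attribute, sourceID) -> priority, copied from the Reference table above.
PRIORITY = {
    ("EMAIL", 1): 1, ("EMAIL", 2): 2, ("EMAIL", 3): 3,
    ("NAME", 1): 2, ("NAME", 2): 1, ("NAME", 3): 3,
}

# Rows copied from the Customer table above (dates assumed to be dd/mm/yyyy).
CUSTOMERS = [
    {"CustomerID": 1, "NAME": "John", "EMAIL": None,
     "SourceID": 1, "Date": date(2017, 1, 3)},
    {"CustomerID": 1, "NAME": None, "EMAIL": "John@email.com",
     "SourceID": 3, "Date": date(2017, 1, 1)},
    {"CustomerID": 1, "NAME": "J", "EMAIL": "J.Smith@email.com",
     "SourceID": 2, "Date": date(2017, 1, 2)},
]


def pick(rows, attribute):
    """Return the first non-null value of `attribute`, by priority then date."""
    candidates = [r for r in rows if r[attribute] is not None]
    if not candidates:
        return None
    best = min(
        candidates,
        key=lambda r: (PRIORITY[(attribute, r["SourceID"])], r["Date"]),
    )
    return best[attribute]


def resolve(rows):
    """Group rows by customer and resolve each attribute independently."""
    by_customer = {}
    for r in rows:
        by_customer.setdefault(r["CustomerID"], []).append(r)
    return {
        cid: {"Name": pick(group, "NAME"), "Email": pick(group, "EMAIL")}
        for cid, group in by_customer.items()
    }


resolved = resolve(CUSTOMERS)
print(resolved)
```

With the rows exactly as printed, NAME has priority 1 for source 2, so this sketch picks `J` for the name rather than `John`; the email it picks matches the result shown above.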
Currently I am using the following query to do this:
SELECT DISTINCT
       FIRST_VALUE(c.Name IGNORE NULLS)
           OVER (PARTITION BY c.customerID
                 ORDER BY r.PRIORITY, c.DATE
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS NAME,
       FIRST_VALUE(c.Email IGNORE NULLS)
           OVER (PARTITION BY c.customerID
                 ORDER BY r.PRIORITY, c.DATE
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS EMAIL
FROM Customer c
JOIN reference r ON c.sourceID = r.sourceID;
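However, this does not take into account the different attribute for each column. I need to add some kind of filter to each PARTITION BY clause.
Can anyone help me with this?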
Answer 0 (score: 2)
One approach is to unpivot the customer's attributes into rows, one row per attribute, and then recombine them:
SELECT DISTINCT ca.customerId,
       -- Partition by customer only, so both columns land on the same output row.
       -- IGNORE NULLS with a full frame skips rows of the other attribute and NULL values.
       FIRST_VALUE(CASE WHEN ca.attribute = 'NAME' THEN ca.val END IGNORE NULLS) OVER
           (PARTITION BY ca.customerId ORDER BY r.priority, ca.date
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS name,
       FIRST_VALUE(CASE WHEN ca.attribute = 'EMAIL' THEN ca.val END IGNORE NULLS) OVER
           (PARTITION BY ca.customerId ORDER BY r.priority, ca.date
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS email
FROM ((SELECT customerId, 'NAME' AS attribute, name AS val, sourceId, date
       FROM customer c
      ) UNION ALL
      (SELECT customerId, 'EMAIL' AS attribute, email AS val, sourceId, date
       FROM customer c
      )
     ) ca JOIN
     reference r
     ON r.sourceId = ca.sourceId AND r.attribute = ca.attribute;
Note that this uses SELECT DISTINCT rather than GROUP BY. I don't think Netezza has first_value() as an aggregate function, so this construct works around that.
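For anyone who wants to try the unpivot-and-recombine idea without a Netezza instance, here is a rough sketch that runs a variant of it in SQLite from Python. Several things here are assumptions rather than part of the answer: SQLite (3.25 or later, for window functions) stands in for Netezza, the dates are stored as ISO text, and because SQLite has no IGNORE NULLS, the sketch filters NULL values out during the unpivot and uses ROW_NUMBER() plus a conditional MAX() with GROUP BY instead of first_value() with SELECT DISTINCT.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Netezza.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reference (priority INTEGER, attribute TEXT, sourceId INTEGER);
CREATE TABLE customer (customerId INTEGER, name TEXT, email TEXT,
                       sourceId INTEGER, date TEXT);
INSERT INTO reference VALUES
  (1, 'EMAIL', 1), (2, 'EMAIL', 2), (3, 'EMAIL', 3),
  (2, 'NAME', 1),  (1, 'NAME', 2),  (3, 'NAME', 3);
-- Dates stored as ISO text (dd/mm/yyyy assumed); only used as a tie-breaker.
INSERT INTO customer VALUES
  (1, 'John', NULL, 1, '2017-01-03'),
  (1, NULL, 'John@email.com', 3, '2017-01-01'),
  (1, 'J', 'J.Smith@email.com', 2, '2017-01-02');
""")

QUERY = """
WITH ca AS (
    -- Unpivot: one row per (customer, attribute, source), NULL values dropped.
    SELECT customerId, 'NAME' AS attribute, name AS val, sourceId, date
    FROM customer WHERE name IS NOT NULL
    UNION ALL
    SELECT customerId, 'EMAIL' AS attribute, email AS val, sourceId, date
    FROM customer WHERE email IS NOT NULL
),
ranked AS (
    -- Rank each attribute's candidates by that attribute's own priority.
    SELECT ca.customerId, ca.attribute, ca.val,
           ROW_NUMBER() OVER (PARTITION BY ca.customerId, ca.attribute
                              ORDER BY r.priority, ca.date) AS rn
    FROM ca
    JOIN reference r
      ON r.sourceId = ca.sourceId AND r.attribute = ca.attribute
)
-- Recombine: keep the top-ranked value per attribute, one row per customer.
SELECT customerId,
       MAX(CASE WHEN attribute = 'NAME'  THEN val END) AS name,
       MAX(CASE WHEN attribute = 'EMAIL' THEN val END) AS email
FROM ranked
WHERE rn = 1
GROUP BY customerId
ORDER BY customerId
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

The key point is the join condition `r.attribute = ca.attribute`, which is what lets each column follow its own priority order. With the sample rows as printed in the question, NAME has priority 1 for source 2, so the name comes back as `J`.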