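Using a filter condition on 'PARTITION BY'

Time: 2017-07-21 09:56:56

Tags: sql netezza window-functions

I have two tables: a Reference table, which ranks sources by priority, and a Customer table. The Reference table assigns a priority to each column (attribute) of the Customer table, so the columns of a single customer can each follow a different source ordering.

Reference table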

---------------------------------------
| Priority |   Attribute |  sourceID  |
---------------------------------------
|   1      |     EMAIL   |      1     |
|   2      |     EMAIL   |      2     |
|   3      |     EMAIL   |      3     |
|   2      |     NAME    |      1     |
|   1      |     NAME    |      2     |
|   3      |     NAME    |      3     |
---------------------------------------

Customer table

-----------------------------------------------------------------------
| CustomerID |  Name   |       Email        |  SourceID |     Date    |
-----------------------------------------------------------------------
|    1       |  John   |       NULL         |     1     |  03/01/2017 |
|    1       |  NULL   |   John@email.com   |     3     |  01/01/2017 |
|    1       |   J     |  J.Smith@email.com |     2     |  02/01/2017 |
-----------------------------------------------------------------------

Desired result

---------------------------------------------
| CustomerID   |  Name  |       Email       |
---------------------------------------------
|      1       |  John  | J.Smith@email.com |
---------------------------------------------

Currently I am using the following query to do this:

SELECT DISTINCT
       FIRST_VALUE(c.Name IGNORE NULLS)
           OVER (PARTITION BY c.customerID
                 ORDER BY r.PRIORITY, c.DATE
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS NAME,
       FIRST_VALUE(c.Email IGNORE NULLS)
           OVER (PARTITION BY c.customerID
                 ORDER BY r.PRIORITY, c.DATE
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS EMAIL
FROM Customer c
  JOIN reference r ON c.sourceID = r.sourceID;

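However, this does not take into account the different attribute for each column: the join matches every Reference row for a source, regardless of attribute. I need to add some kind of filter to each PARTITION BY clause.

Can anyone help me with this?

1 Answer:

Answer 0: (score: 2)

One approach is to unpivot the customer's attributes into a single column, one row per attribute, and then recombine them: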

SELECT DISTINCT ca.customerId,
       -- Only NAME rows give a non-NULL value here; IGNORE NULLS skips the rest.
       FIRST_VALUE(CASE WHEN ca.attribute = 'NAME' THEN ca.val END IGNORE NULLS) OVER
           (PARTITION BY ca.customerId ORDER BY r.priority, ca.date
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS name,
       -- Same idea for EMAIL rows.
       FIRST_VALUE(CASE WHEN ca.attribute = 'EMAIL' THEN ca.val END IGNORE NULLS) OVER
           (PARTITION BY ca.customerId ORDER BY r.priority, ca.date
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS email
FROM ((SELECT customerId, 'NAME' AS attribute, name AS val, sourceId, date
       FROM customer c
      ) UNION ALL
      (SELECT customerId, 'EMAIL' AS attribute, email AS val, sourceId, date
       FROM customer c
      )
     ) ca JOIN
     reference r
     ON r.sourceId = ca.sourceId AND r.attribute = ca.attribute;

Note that this uses SELECT DISTINCT rather than GROUP BY. I don't think Netezza has a first_value() aggregate function, so this construct works around that.
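
If a GROUP BY form is preferred, one possible variant is to rank the non-NULL values per customer and attribute with ROW_NUMBER() and then pivot the top-ranked rows back into columns. The following is only a sketch for checking the logic locally: it assumes an in-memory SQLite database (3.25 or newer) as a stand-in for Netezza, loads the sample rows from the question, and assumes the dates are MM/DD/YYYY (stored here as ISO strings). The CTE names `ca` and `ranked` are illustrative.

```python
import sqlite3

# Assumption: SQLite stands in for Netezza; window functions need SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reference (priority INTEGER, attribute TEXT, sourceId INTEGER);
INSERT INTO reference VALUES
  (1, 'EMAIL', 1), (2, 'EMAIL', 2), (3, 'EMAIL', 3),
  (2, 'NAME',  1), (1, 'NAME',  2), (3, 'NAME',  3);

CREATE TABLE customer (customerId INTEGER, name TEXT, email TEXT,
                       sourceId INTEGER, date TEXT);
INSERT INTO customer VALUES
  (1, 'John', NULL,                1, '2017-03-01'),
  (1, NULL,   'John@email.com',    3, '2017-01-01'),
  (1, 'J',    'J.Smith@email.com', 2, '2017-02-01');
""")

QUERY = """
WITH ca AS (
    -- Unpivot: one row per (customer, attribute, source).
    SELECT customerId, 'NAME' AS attribute, name AS val, sourceId, date
    FROM customer
    UNION ALL
    SELECT customerId, 'EMAIL' AS attribute, email AS val, sourceId, date
    FROM customer
),
ranked AS (
    -- Rank the non-NULL values of each attribute by that attribute's priority.
    SELECT ca.customerId, ca.attribute, ca.val,
           ROW_NUMBER() OVER (PARTITION BY ca.customerId, ca.attribute
                              ORDER BY r.priority, ca.date) AS rn
    FROM ca
    JOIN reference r
      ON r.sourceId = ca.sourceId AND r.attribute = ca.attribute
    WHERE ca.val IS NOT NULL
)
-- Pivot the best row of each attribute back into columns.
SELECT customerId,
       MAX(CASE WHEN attribute = 'NAME'  THEN val END) AS name,
       MAX(CASE WHEN attribute = 'EMAIL' THEN val END) AS email
FROM ranked
WHERE rn = 1
GROUP BY customerId
ORDER BY customerId;
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, 'J', 'J.Smith@email.com')]
```

With the Reference rows exactly as listed in the question, sourceID 2 holds NAME priority 1, so this sketch picks 'J' for the name; swapping the two NAME priorities for sourceID 1 and 2 would yield 'John'. The email comes from sourceID 2 because the higher-priority sourceID 1 row has a NULL email and is filtered out before ranking.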