How to centralize a complex window query that is reused, but with different partitions each time

Asked: 2014-04-28 22:29:12

Tags: php postgresql postgresql-9.1 plpgsql

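I have some range-collapsing logic (based on http://wiki.postgresql.org/wiki/Range_aggregation) that I want to reuse across various different column partitions.

Right now I do this in PHP. I have a function similar to the following, which returns the query I want to run with the relevant columns substituted in: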

function getIntervalsQueryForPartition($partitions = array())
{
// ... there is some validation logic here, not relevant to question

$cols = implode(', ', $partitions) . ' ';

return <<<SQL
SELECT $cols, MIN(start_date) start_date, MAX(end_date) end_date
FROM (
  SELECT $cols, start_date, end_date,
    MAX(new_start) OVER (
      PARTITION BY $cols
      ORDER BY start_date, end_date
    ) AS left_edge
  FROM (
    SELECT $cols, start_date, end_date,
    CASE WHEN GREATEST(
        MIN(start_date) OVER (
          PARTITION BY $cols
          ORDER BY start_date, end_date
          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ),
        start_date - INTERVAL '90 days'
    ) < (
    MAX(end_date) OVER (
        PARTITION BY $cols
        ORDER BY start_date, end_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
      )
    )
    THEN NULL
    ELSE start_date
    END AS new_start
    FROM product_activity
  ) s1
) s2
GROUP BY $cols, left_edge
SQL;
}

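Ultimately there are many different column partitions on product_activity over which I want to run the same windowing and aggregation. Obviously I don't want to copy and paste the query everywhere with slightly different partitions, hence the PHP function above.

How can I achieve the same abstraction entirely in Postgres? Can this be done with a stored procedure? I want the DBAs to be able to call this query for different partitions without having to copy and paste it and then edit all 7 places where the columns are specified.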

1 answer:

Answer 0: (score: 1)

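You can write a function just as you did in PHP. Because of certain PL/pgSQL limitations, the simplest option is to write a function that takes a single text parameter and returns setof record.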

create or replace function func (cols text)
returns setof record language plpgsql as $$
begin
    return query execute format (
        'SELECT %s, MIN(start_date) start_date, MAX(end_date) end_date
        FROM (
          SELECT %s, start_date, end_date,
            MAX(new_start) OVER (
              PARTITION BY %s
              ORDER BY start_date, end_date
            ) AS left_edge
          FROM (
            SELECT %s, start_date, end_date,
            CASE WHEN GREATEST(
                MIN(start_date) OVER (
                  PARTITION BY %s
                  ORDER BY start_date, end_date
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                ),
                start_date - INTERVAL ''90 days''
            ) < (
            MAX(end_date) OVER (
                PARTITION BY %s
                ORDER BY start_date, end_date
                ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
              )
            )
            THEN NULL
            ELSE start_date
            END AS new_start
            FROM product_activity
          ) s1
        ) s2
        GROUP BY %s, left_edge',
        cols, cols, cols, cols, cols, cols, cols);
end $$;

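The only drawback of this approach is the way you have to call the function: because it returns setof record, every call must supply an explicit column definition list for the result.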

select * from func('a1, a2')
as (a1 int, a2 int, start_date date, end_date date);

select * from func('a1, a3, a5')
as (a1 int, a3 int, a5 int, start_date date, end_date date);
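
Since func splices the cols text into the query verbatim, anything a caller passes ends up in the executed SQL. One possible hardening, sketched below, is to accept the columns as a text[] and quote each element with quote_ident before building the list. The name func_quoted is hypothetical and not part of the answer above; the sketch also uses the positional %1$s specifier of format so the column list is passed only once. It assumes, as the calls above do, that start_date and end_date are of type date.

```sql
-- Hypothetical variant of func: func_quoted is an illustrative name only.
create or replace function func_quoted (cols text[])
returns setof record language plpgsql as $$
declare
    col_list text;
begin
    -- Quote every column name as an identifier, keeping the caller's order.
    select string_agg(quote_ident(cols[i]), ', ' order by i)
    into col_list
    from generate_subscripts(cols, 1) as i;

    -- A NULL or empty array would otherwise produce invalid SQL.
    if col_list is null then
        raise exception 'at least one partition column is required';
    end if;

    -- %1$s reuses the first argument at every position.
    return query execute format(
        'SELECT %1$s, MIN(start_date) start_date, MAX(end_date) end_date
        FROM (
          SELECT %1$s, start_date, end_date,
            MAX(new_start) OVER (
              PARTITION BY %1$s
              ORDER BY start_date, end_date
            ) AS left_edge
          FROM (
            SELECT %1$s, start_date, end_date,
            CASE WHEN GREATEST(
                MIN(start_date) OVER (
                  PARTITION BY %1$s
                  ORDER BY start_date, end_date
                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                ),
                start_date - INTERVAL ''90 days''
            ) < (
            MAX(end_date) OVER (
                PARTITION BY %1$s
                ORDER BY start_date, end_date
                ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
              )
            )
            THEN NULL
            ELSE start_date
            END AS new_start
            FROM product_activity
          ) s1
        ) s2
        GROUP BY %1$s, left_edge',
        col_list);
end $$;

-- The call still needs a column definition list, as with func:
select * from func_quoted(array['a1', 'a2'])
as (a1 int, a2 int, start_date date, end_date date);
```

The trade-off is that only plain column names can be passed this way; an expression such as lower(a1) would be quoted as a single identifier and rejected, which may or may not be what you want.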