I have a table in Postgres that looks like this:
# select * from p;
id | value
----+-------
1 | 100
2 |
3 |
4 |
5 |
6 |
7 |
8 | 200
9 |
(9 rows)
I would like to query it so that it looks like this:
# select * from p;
id | value | new_value
----+-------+----------
1 | 100 |
2 | | 100
3 | | 100
4 | | 100
5 | | 100
6 | | 100
7 | | 100
8 | 200 | 100
9 | | 200
(9 rows)
I can already do this with a subquery in the SELECT, but in my real data I have 20k rows or more and it gets slow.
Can this be done with a window function? I would love to use lag(), but it doesn't seem to support the IGNORE NULLS option.
select id, value, lag(value, 1) over (order by id) as new_value from p;
id | value | new_value
----+-------+-----------
1 | 100 |
2 | | 100
3 | |
4 | |
5 | |
6 | |
7 | |
8 | 200 |
9 | | 200
(9 rows)
Answer 0 (score: 68)
I found this answer for SQL Server that also works in Postgres. Having never done it this way before, I think the technique is quite clever. Essentially, he creates a custom partition for the window function by using a CASE expression inside a nested query that increments a running sum when the value is not null and leaves it alone otherwise. This lets every null section be labeled with the same number as the preceding non-null value. Here is the query:
SELECT
    id, value, value_partition,
    first_value(value) over (partition by value_partition order by id)
FROM (
    SELECT
        id,
        value,
        sum(case when value is null then 0 else 1 end) over (order by id) as value_partition
    FROM p
    ORDER BY id ASC
) as q
The results:
id | value | value_partition | first_value
----+-------+-----------------+-------------
1 | 100 | 1 | 100
2 | | 1 | 100
3 | | 1 | 100
4 | | 1 | 100
5 | | 1 | 100
6 | | 1 | 100
7 | | 1 | 100
8 | 200 | 2 | 200
9 | | 2 | 200
(9 rows)
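For what it's worth, the same partitions can be built a little more tersely by letting count() do the NULL-skipping instead of the CASE expression. This is just a sketch of the same idea against the same table p:
-- Same idea, sketched with count(value), which already ignores NULLs,
-- so the CASE expression is not needed.
SELECT
    id, value,
    first_value(value) over (partition by value_partition order by id) as new_value
FROM (
    SELECT
        id,
        value,
        count(value) over (order by id) as value_partition
    FROM p
) as q
ORDER BY id;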
Answer 1 (score: 5)
You can create a custom aggregate function in Postgres. Here is an example for the int type:
CREATE FUNCTION coalesce_agg_sfunc(state int, value int) RETURNS int AS
$$
SELECT coalesce(value, state);
$$ LANGUAGE SQL;
CREATE AGGREGATE coalesce_agg(int) (
    SFUNC = coalesce_agg_sfunc,
    STYPE = int
);
Then query as usual:
SELECT *, coalesce_agg(b) over w, sum(b) over w FROM y
WINDOW w AS (ORDER BY a);
a b coalesce_agg sum
- - ------------ ---
a 0 0 0
b ∅ 0 0
c 2 2 2
d 3 3 5
e ∅ 3 5
f 5 5 10
(6 rows)
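The table y is not shown in the answer; a minimal setup consistent with the output above might look like this (the column names a and b come from the output, the rest is an assumption):
-- hypothetical sample data matching the output shown above
CREATE TABLE y (a text, b int);
INSERT INTO y (a, b) VALUES
    ('a', 0), ('b', NULL), ('c', 2),
    ('d', 3), ('e', NULL), ('f', 5);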
Answer 2 (score: 2)
Well, I can't guarantee this is the most efficient way, but it works:
SELECT id, value, (
SELECT p2.value
FROM p p2
WHERE p2.value IS NOT NULL AND p2.id <= p1.id
ORDER BY p2.id DESC
LIMIT 1
) AS new_value
FROM p p1 ORDER BY id;
An index like the following can improve the subquery on large datasets:
CREATE INDEX idx_p_idvalue_nonnull ON p (id, value) WHERE value IS NOT NULL;
Assuming value is sparse (i.e. there are lots of nulls), it will perform well.
Answer 3 (score: 0)
You can use a last() aggregate together with FILTER to get what you need (at least on PG 9.4):
WITH base AS (
    SELECT 1 AS id, 100 AS val
    UNION ALL
    SELECT 2 AS id, null AS val
    UNION ALL
    SELECT 3 AS id, null AS val
    UNION ALL
    SELECT 4 AS id, null AS val
    UNION ALL
    SELECT 5 AS id, 200 AS val
    UNION ALL
    SELECT 6 AS id, null AS val
    UNION ALL
    SELECT 7 AS id, null AS val
)
SELECT id, val,
       last(val) FILTER (WHERE val IS NOT NULL)
           OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS new_val
FROM base
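Note that last() is not a built-in Postgres aggregate. Below is a minimal sketch of one way to define it (similar in spirit to the first/last aggregates on the PostgreSQL wiki); the function name last_agg_sfunc is made up here, and it assumes nothing equivalent is already installed:
-- Hypothetical definition of a generic "last" aggregate (not built in).
CREATE FUNCTION last_agg_sfunc(anyelement, anyelement) RETURNS anyelement AS
$$
    SELECT $2;  -- always keep the most recent input
$$ LANGUAGE SQL IMMUTABLE STRICT;

CREATE AGGREGATE last (anyelement) (
    SFUNC = last_agg_sfunc,
    STYPE = anyelement
);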
Answer 4 (score: 0)
In my case I needed to maintain a running balance across non-trading days, which are only weekends, with the occasional three-day weekend when there is a non-trading holiday.
If there are only a few null days, the problem can be solved with a CASE expression and a series of LAG window functions:
SELECT
CASE
WHEN balance IS NULL THEN
-- A non-null balance must be found within the first 3 preceding rows
CASE
WHEN LAG(balance, 1) OVER () IS NOT NULL
THEN LAG(balance, 1) OVER ()
WHEN LAG(balance, 2) OVER () IS NOT NULL
THEN LAG(balance, 2) OVER ()
WHEN LAG(balance, 3) OVER () IS NOT NULL
THEN LAG(balance, 3) OVER ()
END
ELSE balance
END
FROM daily_data;
Not practical for the unbounded problem, but a fine solution for small gaps. Just add more "WHEN LAG(, x) ..." clauses if necessary. I was fortunate that I only needed to do this for one column, and this approach got me to my goal.
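If it helps, the same three-step cascade can be collapsed into a single COALESCE with a named window. This is only a sketch, since daily_data and its ordering column are not shown in the answer:
-- Sketch: same cascade as above, written with COALESCE.
-- In practice the window should specify ORDER BY on the date column.
SELECT COALESCE(
           balance,
           LAG(balance, 1) OVER w,
           LAG(balance, 2) OVER w,
           LAG(balance, 3) OVER w
       ) AS filled_balance
FROM daily_data
WINDOW w AS ();  -- assumption: add ORDER BY <date column> here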
Answer 5 (score: 0)
with p (id, value) as (
values (1, 100),
(2, null),
(3, null),
(4, null),
(5, null),
(6, null),
(7, null),
(8, 200),
(9, null))
select *
, (json_agg(value) filter (where value notnull) over (order by id) ->> -1)::int
from p
;
Here we use an aggregate function with the FILTER option: json_agg collects the non-null values seen so far, and ->> -1 extracts the last element of that array.
Answer 6 (score: 0)
Another possibility is to build the partitions with a running sum:
WITH CTE_Data (Company, ValueDate, Amount) AS (
    SELECT 'Company', '2021-05-01', 1000 UNION
    SELECT 'Company', '2021-05-02', 1250 UNION
    SELECT 'Company', '2021-05-03', NULL UNION
    SELECT 'Company', '2021-05-04', NULL UNION
    SELECT 'Company', '2021-05-05', 7500 UNION
    SELECT 'Company', '2021-05-06', NULL UNION
    SELECT 'Company', '2021-05-07', 3200 UNION
    SELECT 'Company', '2021-05-08', 3400 UNION
    SELECT 'Company', '2021-05-09', NULL UNION
    SELECT 'Company', '2021-05-10', 7800
)
SELECT
     d.Company
    ,d.ValueDate
    ,d.Amount
    ,d.Partition
    -- each partition contains exactly one non-null amount, so the sum fills the gaps
    ,SUM(d.Amount) OVER (PARTITION BY d.Company, d.Partition) AS Missing
FROM (
    SELECT
         d.Company
        ,d.ValueDate
        ,d.Amount
        -- running count of non-null amounts defines the partition
        ,SUM(CASE WHEN d.Amount IS NULL THEN 0 ELSE 1 END) OVER (PARTITION BY d.Company ORDER BY d.ValueDate) AS Partition
    FROM CTE_Data AS d
) AS d