I have a requirement: I need to know when sum(value) reaches a particular point, and then calculate the duration. Below is a sample table.
create table sample (dt timestamp, value real);
insert into sample values
 ('2019-01-20 00:29:43', 0.29)
,('2019-01-20 00:35:06', 0.31)
,('2019-01-20 00:35:50', 0.41)
,('2019-01-20 00:36:32', 0.26)
,('2019-01-20 00:37:20', 0.33)
,('2019-01-20 00:41:30', 0.42)
,('2019-01-20 00:42:28', 0.35)
,('2019-01-20 00:43:14', 0.52)
,('2019-01-20 00:44:18', 0.25);
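Starting from each row, I need to compute the cumulative sum over that row and the rows that follow it, to see when sum(value) goes above 1.0. That may take just 1 row or n rows. Once that row is reached, I need to calculate the time difference between the current row and the row at which sum(value) went above 1.0.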
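Basically, I want output in the following format.
For the first row, the cumulative sum(value) reaches the bound at the third row.
For the second row, it reaches the bound at the fifth row, and so on.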
| dt                  | value | sum(value) | time_at_sum(value)_1 | Duration |
|---------------------+-------+------------+----------------------+----------|
| 2019-01-20 00:29:43 | 0.29  | 1.01       | 2019-01-20 00:35:50  | 00:06:07 |
| 2019-01-20 00:35:06 | 0.31  | 1.31       | 2019-01-20 00:37:20  | 00:02:14 |
| 2019-01-20 00:35:50 | 0.41  | 1.00       | 2019-01-20 00:37:20  | 00:01:30 |
| 2019-01-20 00:36:32 | 0.26  | 1.01       | 2019-01-20 00:41:30  | 00:04:58 |
| 2019-01-20 00:37:20 | 0.33  | 1.10       | 2019-01-20 00:42:28  | 00:05:08 |
| 2019-01-20 00:41:30 | 0.42  | 1.29       | 2019-01-20 00:43:14  | 00:01:44 |
| 2019-01-20 00:42:28 | 0.35  | 1.12       | 2019-01-20 00:44:18  | 00:01:50 |
| 2019-01-20 00:43:14 | 0.52  | NULL       | -                    | -        |
| 2019-01-20 00:44:18 | 0.25  | NULL       | -                    | -        |
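To pin down the requirement, here is a brute-force reference sketch in Python. It is not part of the question, and the function name `naive_time_to_reach` is hypothetical.

Two choices are deliberate:
- It tests `>=`, because the expected output treats row 3's sum of exactly 1.00 as reaching the bound.
- It uses `Decimal` instead of PostgreSQL's `real` to avoid rounding noise.

It reproduces the table above, which makes it handy for checking the set-based answers.

```python
from datetime import datetime
from decimal import Decimal

# Sample rows from the question. Decimal (instead of float or PostgreSQL real)
# keeps sums such as 0.41 + 0.26 + 0.33 exactly equal to 1.00.
SAMPLE = [
    ("2019-01-20 00:29:43", "0.29"),
    ("2019-01-20 00:35:06", "0.31"),
    ("2019-01-20 00:35:50", "0.41"),
    ("2019-01-20 00:36:32", "0.26"),
    ("2019-01-20 00:37:20", "0.33"),
    ("2019-01-20 00:41:30", "0.42"),
    ("2019-01-20 00:42:28", "0.35"),
    ("2019-01-20 00:43:14", "0.52"),
    ("2019-01-20 00:44:18", "0.25"),
]


def naive_time_to_reach(rows, bound=Decimal("1.0")):
    """Brute-force reference: for each row, add up value from that row onward
    until the running total is >= bound. O(n^2) in the worst case."""
    parsed = [(datetime.fromisoformat(dt), Decimal(v)) for dt, v in rows]
    result = []
    for i, (start_dt, start_val) in enumerate(parsed):
        total = Decimal(0)
        hit = (None, None, None)  # sum, time_at_sum, duration if never reached
        for dt, val in parsed[i:]:
            total += val
            if total >= bound:
                hit = (total, dt, dt - start_dt)
                break
        result.append((start_dt, start_val) + hit)
    return result


for row in naive_time_to_reach(SAMPLE):
    print(row)
```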
Does anyone have ideas or pointers on how to meet this requirement?
Answer 0 (score: 4)
WITH tmp AS (
  SELECT *
       -- sum of value from the current row through the last row
       , sum(value) OVER (ORDER BY dt rows between current row and unbounded following) as forward_sum
  FROM sample
  ORDER BY dt)
SELECT t1.dt, t1.value
     , (t2.value + t1.forward_sum - t2.forward_sum) as "sum(value)"
     , t2.dt as "time_at_sum(value)_1"
     , t2.dt - t1.dt as "Duration"
FROM tmp t1
LEFT JOIN LATERAL (
  SELECT *
  FROM tmp t
  WHERE t1.forward_sum - t.forward_sum < 1                   -- rows t1 .. t-1 still sum to less than 1
    AND (t.value + t1.forward_sum - t.forward_sum) >= 0.999  -- adding row t reaches 1 (0.999 leaves slack for rounding in real)
  ORDER BY dt DESC
  LIMIT 1
) t2
ON TRUE
ORDER BY t1.dt;                                              -- guarantee chronological output
yields
| dt | value | sum(value) | time_at_sum(value)_1 | Duration |
|---------------------+-------+------------+----------------------+----------|
| 2019-01-20 00:29:43 | 0.29 | 1.01 | 2019-01-20 00:35:50 | 00:06:07 |
| 2019-01-20 00:35:06 | 0.31 | 1.31 | 2019-01-20 00:37:20 | 00:02:14 |
| 2019-01-20 00:35:50 | 0.41 | 1 | 2019-01-20 00:37:20 | 00:01:30 |
| 2019-01-20 00:36:32 | 0.26 | 1.01 | 2019-01-20 00:41:30 | 00:04:58 |
| 2019-01-20 00:37:20 | 0.33 | 1.1 | 2019-01-20 00:42:28 | 00:05:08 |
| 2019-01-20 00:41:30 | 0.42 | 1.29 | 2019-01-20 00:43:14 | 00:01:44 |
| 2019-01-20 00:42:28 | 0.35 | 1.12 | 2019-01-20 00:44:18 | 00:01:50 |
| 2019-01-20 00:43:14 | 0.52 | | | |
| 2019-01-20 00:44:18 | 0.25 | | | |
First, compute a cumulative sum over the value column that runs from the current row to the end of the table:
SELECT *
, sum(value) OVER (ORDER BY dt rows between current row and unbounded following) as forward_sum
FROM sample
ORDER BY dt
which produces
| dt | value | forward_sum |
|---------------------+-------+-------------|
| 2019-01-20 00:29:43 | 0.29 | 3.14 |
| 2019-01-20 00:35:06 | 0.31 | 2.85 |
| 2019-01-20 00:35:50 | 0.41 | 2.54 |
| 2019-01-20 00:36:32 | 0.26 | 2.13 |
| 2019-01-20 00:37:20 | 0.33 | 1.87 |
| 2019-01-20 00:41:30 | 0.42 | 1.54 |
| 2019-01-20 00:42:28 | 0.35 | 1.12 |
| 2019-01-20 00:43:14 | 0.52 | 0.77 |
| 2019-01-20 00:44:18 | 0.25 | 0.25 |
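Note that subtracting two forward_sum values gives a partial sum of value: t1.forward_sum - t.forward_sum is the sum of value over the rows from t1 up to, but not including, t.
For example,
0.29 + 0.31 + 0.41 = 3.14 - 2.13
So differences of forward_sum play the key role, and we will compare these differences with 1. We will want to use a condition such as: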
t1.forward_sum - t.forward_sum < 1
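The identity can be checked with a short Python sketch. It is an illustration only, not part of the answer, and `forward_sums` and `partial_sum` are hypothetical helpers.
- `forward_sums` builds the same column as the window function.
- `partial_sum(fs, i, j)` returns the sum of rows `i` through `j - 1`, for `i <= j < len(values)`.

```python
from decimal import Decimal

# value column of the sample table, in dt order
VALUES = [Decimal(v) for v in
          ["0.29", "0.31", "0.41", "0.26", "0.33", "0.42", "0.35", "0.52", "0.25"]]


def forward_sums(values):
    """fs[i] = values[i] + values[i+1] + ... + values[-1], i.e. the same quantity
    as sum(value) OVER (ORDER BY dt ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)."""
    sums = []
    running = Decimal(0)
    for v in reversed(values):
        running += v
        sums.append(running)
    sums.reverse()
    return sums


def partial_sum(fs, i, j):
    """Sum of values[i:j] (rows i .. j-1, row j excluded) from two forward sums."""
    return fs[i] - fs[j]


fs = forward_sums(VALUES)
print(fs[0], fs[3], partial_sum(fs, 0, 3))  # 3.14 2.13 1.01
```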
Let's see what happens if we use a LEFT JOIN LATERAL. The key to understanding LEFT JOIN LATERAL is that the subquery on the right side of the LATERAL join has to be evaluated once for each row of the table on the left:
WITH tmp AS (
SELECT *
, sum(value) OVER (ORDER BY dt rows between current row and unbounded following) as forward_sum
FROM sample
ORDER BY dt)
SELECT t1.*, t2.*
FROM tmp t1
LEFT JOIN LATERAL (
SELECT *
FROM tmp t
WHERE t1.forward_sum - t.forward_sum < 1
ORDER BY dt DESC
LIMIT 1
) t2
ON TRUE
yields
| dt | value | forward_sum | dt | value | forward_sum |
|---------------------+-------+-------------+---------------------+-------+-------------|
| 2019-01-20 00:29:43 | 0.29 | 3.14 | 2019-01-20 00:35:50 | 0.41 | 2.54 |
| 2019-01-20 00:35:06 | 0.31 | 2.85 | 2019-01-20 00:37:20 | 0.33 | 1.87 |
| 2019-01-20 00:35:50 | 0.41 | 2.54 | 2019-01-20 00:37:20 | 0.33 | 1.87 |
| 2019-01-20 00:36:32 | 0.26 | 2.13 | 2019-01-20 00:41:30 | 0.42 | 1.54 |
| 2019-01-20 00:37:20 | 0.33 | 1.87 | 2019-01-20 00:42:28 | 0.35 | 1.12 |
| 2019-01-20 00:41:30 | 0.42 | 1.54 | 2019-01-20 00:43:14 | 0.52 | 0.77 |
| 2019-01-20 00:42:28 | 0.35 | 1.12 | 2019-01-20 00:44:18 | 0.25 | 0.25 |
| 2019-01-20 00:43:14 | 0.52 | 0.77 | 2019-01-20 00:44:18 | 0.25 | 0.25 |
| 2019-01-20 00:44:18 | 0.25 | 0.25 | 2019-01-20 00:44:18 | 0.25 | 0.25 |
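Notice that this join condition already picks out the desired dates. All that remains is to compose the right value expressions to get the desired columns sum(value) and time_at_sum(value)_1. The last two rows still match a row here only because nothing yet checks that the bound is actually reached. The second condition in the final query, (t.value + t1.forward_sum - t.forward_sum) >= 0.999, removes those matches, so the LEFT JOIN returns NULLs for them.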
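For readers less familiar with LATERAL, the following Python sketch mirrors the final query row by row. It is hypothetical and for illustration only.
- For each `t1`, it scans candidate rows newest-first.
- It keeps the first candidate that satisfies both conditions of the subquery, the equivalent of `ORDER BY dt DESC LIMIT 1`.
- If nothing matches, it falls back to `None`, as the LEFT JOIN does.

Because it uses exact `Decimal` arithmetic, it compares against 1 directly rather than 0.999. The name `lateral_join` is an assumption of this sketch.

```python
from datetime import datetime
from decimal import Decimal

RAW = [
    ("2019-01-20 00:29:43", "0.29"), ("2019-01-20 00:35:06", "0.31"),
    ("2019-01-20 00:35:50", "0.41"), ("2019-01-20 00:36:32", "0.26"),
    ("2019-01-20 00:37:20", "0.33"), ("2019-01-20 00:41:30", "0.42"),
    ("2019-01-20 00:42:28", "0.35"), ("2019-01-20 00:43:14", "0.52"),
    ("2019-01-20 00:44:18", "0.25"),
]
ROWS = [(datetime.fromisoformat(dt), Decimal(v)) for dt, v in RAW]


def forward_sums(values):
    """sum(value) OVER (ORDER BY dt ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)."""
    sums, running = [], Decimal(0)
    for v in reversed(values):
        running += v
        sums.append(running)
    return sums[::-1]


def lateral_join(rows, bound=Decimal(1)):
    """Mimics: tmp t1 LEFT JOIN LATERAL (... ORDER BY dt DESC LIMIT 1) t2 ON TRUE."""
    fs = forward_sums([v for _, v in rows])
    result = []
    for i, (dt1, val1) in enumerate(rows):          # subquery runs once per t1 row
        match = (None, None, None)                   # LEFT JOIN: NULLs if no t2 row
        for j in range(len(rows) - 1, -1, -1):       # ORDER BY dt DESC
            dt2, val2 = rows[j]
            exclusive = fs[i] - fs[j]                # sum of rows i .. j-1
            inclusive = val2 + exclusive             # sum of rows i .. j
            if exclusive < bound and inclusive >= bound:
                match = (inclusive, dt2, dt2 - dt1)  # "sum(value)", time, "Duration"
                break                                # LIMIT 1
        result.append((dt1, val1) + match)
    return result


for row in lateral_join(ROWS):
    print(row)
```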
Answer 1 (score: 2)
One efficient way to solve this is a procedural solution with two cursors:
an explicit cursor, and the implicit cursor of the FOR
loop:
CREATE OR REPLACE FUNCTION foo()
  RETURNS TABLE (dt timestamp
               , val real
               , sum_value real
               , time_at_sum timestamp
               , duration interval) AS
$func$
DECLARE
   _bound real := 1.0;                 -- your bound here
   cur CURSOR FOR SELECT * FROM sample s ORDER BY s.dt;  -- in chronological order
   s sample;                           -- cursor row
BEGIN
   OPEN cur;
   FETCH cur INTO s;                   -- fetch first row
   sum_value := s.value;               -- running sum starts with the first row
   FOR dt, val IN                      -- primary pass over table
      SELECT x.dt, x.value FROM sample x ORDER BY x.dt  -- qualify: dt is also an OUT parameter
   LOOP
      WHILE sum_value <= _bound LOOP
         FETCH cur INTO s;
         IF NOT FOUND THEN             -- end of table
            sum_value := NULL; time_at_sum := NULL;
            EXIT;                      -- exits inner loop
         END IF;
         sum_value := sum_value + s.value;
      END LOOP;
      IF sum_value > _bound THEN       -- to catch end-of-table
         time_at_sum := s.dt;          -- row at which the bound was passed
      END IF;
      duration := time_at_sum - dt;
      RETURN NEXT;
      sum_value := sum_value - val;    -- subtract previous row before moving on
   END LOOP;
   CLOSE cur;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM foo();
This should perform well, because it only needs two scans of the table (one per cursor).
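Outside the database, the same two-scan idea is a two-pointer sweep. The Python sketch below is not part of the original answer, and `time_to_reach` and its `strict` flag are hypothetical.
- `lead` plays the role of the explicit cursor, and the outer `for` loop plays the role of the FOR loop.
- Each row is added once and subtracted once, so the work is linear.
- `strict=True` tests `> bound`, like the function. `strict=False` tests `>=`, which reproduces the question's expected output.

The sketch assumes non-negative values. Under that assumption, the crossing row for the next start can never lie before the current one, so the cursor never has to move backwards.

```python
from datetime import datetime
from decimal import Decimal

ROWS = [(datetime.fromisoformat(dt), Decimal(v)) for dt, v in [
    ("2019-01-20 00:29:43", "0.29"), ("2019-01-20 00:35:06", "0.31"),
    ("2019-01-20 00:35:50", "0.41"), ("2019-01-20 00:36:32", "0.26"),
    ("2019-01-20 00:37:20", "0.33"), ("2019-01-20 00:41:30", "0.42"),
    ("2019-01-20 00:42:28", "0.35"), ("2019-01-20 00:43:14", "0.52"),
    ("2019-01-20 00:44:18", "0.25"),
]]


def time_to_reach(rows, bound=Decimal("1.0"), strict=True):
    """Two-pointer sweep over rows sorted by dt (assumes non-negative values)."""
    def reached(total):
        return total > bound if strict else total >= bound

    result = []
    lead = 0               # index of the next row the "cursor" would fetch
    window = Decimal(0)    # sum of value over rows[start:lead]
    for start, (dt, val) in enumerate(rows):
        # Fetch rows until the window covers the current row and passes the bound.
        while lead <= start or (lead < len(rows) and not reached(window)):
            window += rows[lead][1]
            lead += 1
        if reached(window):
            hit_dt = rows[lead - 1][0]       # row at which the bound was passed
            result.append((dt, val, window, hit_dt, hit_dt - dt))
        else:                                # cursor ran off the end of the table
            result.append((dt, val, None, None, None))
        window -= val                        # drop the current row before moving on
    return result


for row in time_to_reach(ROWS, strict=False):
    print(row)
```

With `strict=True`, row 3 (0.41 + 0.26 + 0.33 = 1.00) does not count as passing 1.0. The sweep then continues to the 00:41:30 row, with a sum of 1.42.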
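Note that I implemented > _bound, as your description asks, rather than >= _bound, as your expected result shows. It is easy to change either way.
This assumes the value column is NOT NULL.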