Question

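Using Amazon Redshift, I have a table with the columns streamers, start_time, end_time, and total_streamers. start_time and end_time are in seconds from midnight, streamers refers to music streamers, and total_streamers is the running total of streamers. I want to figure out how many streamers there are at any given start_time. Here is the table I have.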

streamers   start_time  end_time    total_streamers
  2            240         400         2 
  10           300         460         12
  7            360         514         19
  12           420         608         31

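The problem I'm running into is that once start_time has passed one of the previous end_times, I no longer want that row's streamers included in total_streamers. Since the first row's end_time is 400, that row's 2 streamers should be excluded as soon as start_time is greater than 400. Here is the result I want.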

streamers   start_time  end_time    total_streamers
  2            240         400         2 
  10           300         460         12
  7            360         514         19
  12           420         608         29

If I can provide any code or clarify the question, please let me know. Thanks in advance.

Answer 1

One way is to use generate_series:

WITH cte AS (
  SELECT s.n, SUM(tx.streamers) sm
  FROM generate_series(1,1000) s(n)
  LEFT JOIN tx ON s.n BETWEEN tx.start_time and tx.end_time
  GROUP BY s.n
)
SELECT DISTINCT tx.*, cte.sm AS total_streamers
FROM tx
JOIN cte
  ON cte.n =tx.start_time
ORDER BY start_time;

DBFiddle Demo

Output:

╔═══════════╦════════════╦══════════╦═══════╗
║ streamers ║ start_time ║ end_time ║ total ║
╠═══════════╬════════════╬══════════╬═══════╣
║ 2         ║ 240        ║ 400      ║ 2     ║
╠═══════════╬════════════╬══════════╬═══════╣
║ 10        ║ 300        ║ 460      ║ 12    ║
╠═══════════╬════════════╬══════════╬═══════╣
║ 7         ║ 360        ║ 514      ║ 19    ║
╠═══════════╬════════════╬══════════╬═══════╣
║ 12        ║ 420        ║ 608      ║ 29    ║
╚═══════════╩════════════╩══════════╩═══════╝

If needed, you can get the value for every second:

SELECT s.n, SUM(tx.streamers) sm
FROM generate_series(1,1000) s(n)
LEFT JOIN tx ON s.n BETWEEN tx.start_time and tx.end_time
GROUP BY s.n
ORDER BY n;

EDIT

Without generate_series:

WITH cte AS (
  SELECT s.n, SUM(tx.streamers) sm
  FROM (
    SELECT ROW_NUMBER() OVER(ORDER BY 1) AS n
    FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) y(n),
         (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) x(n),
         (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) z(n)
  ) s
  LEFT JOIN tx ON s.n BETWEEN tx.start_time and tx.end_time
  GROUP BY s.n
)
SELECT DISTINCT tx.*, cte.sm
FROM tx
JOIN cte
  ON cte.n = tx.start_time
ORDER BY start_time;

DBFiddle Demo

Answer 2

While it may be possible to do this with window functions, it is probably easier as a derived column with a subquery.
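As a rough sketch of what a derived column with a subquery could look like (this is not code from the original answer): it assumes the table is named t, as in the examples elsewhere in this thread, and has the columns from the question. It is written in standard SQL. Redshift restricts some correlated-subquery patterns, so this form may need to be rewritten as a self-join there.

```sql
-- Assumption: the table is named t and has the question's columns.
-- For each row, sum the streamers of every row whose interval
-- [start_time, end_time] contains this row's start_time.
SELECT
  t1.streamers,
  t1.start_time,
  t1.end_time,
  (SELECT SUM(t2.streamers)
   FROM t t2
   WHERE t1.start_time BETWEEN t2.start_time AND t2.end_time) AS total_streamers
FROM t t1
ORDER BY t1.start_time;
```

With the four sample rows from the question, the totals come out as 2, 12, 19, and 29: at start_time 420 the row ending at 400 no longer matches, so only 10 + 7 + 12 is summed.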

Answer 3

Final query

WITH q AS
(
    SELECT
      t1.start_time,
      t1.end_time,
      t2.streamers
    FROM t t1
      LEFT JOIN t t2 ON t1.start_time BETWEEN t2.start_time AND t2.end_time
) SELECT
    start_time,
    end_time,
    SUM(streamers)
  FROM q
  GROUP BY start_time, end_time
  ORDER BY start_time;

To better understand this query, look at the slightly modified version of subquery q below.

SELECT
  t1.start_time AS original_start_time,
  t2.start_time AS matching_start_time,
  t2.end_time AS matching_end_time,
  t2.streamers AS matching_streamers
FROM t t1
  LEFT JOIN t t2 ON t1.start_time BETWEEN t2.start_time AND t2.end_time
ORDER BY t1.start_time,t2.start_time;

Now, the subquery result.

original_start_time matching_start_time matching_end_time   matching_streamers

240 240 400 2

300 240 400 2
300 300 460 10

360 240 400 2
360 300 460 10
360 360 514 7

420 300 460 10
420 360 514 7
420 420 608 12

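What is in the result above? For each new "stream" (e.g. 300, 460), the subquery locates all "streams" (including itself) whose start and end times contain the new "stream"'s start time. For example, at 300 we have a new stream (300, 460) and one that is still running (240, 400), and so on.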

Given the result above, all we have to do is, for each stream, sum the streamers of all its matching streams.

EDIT

Note that Redshift does not support "VALUES list used as constant tables", which is used in lad2025's answer. See http://docs.aws.amazon.com/redshift/latest/dg/c_unsupported-postgresql-features.html
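If a per-second number series is still wanted without a VALUES list, one possible workaround is to build the digits with UNION ALL and cross-join them. This is only a sketch under assumptions: the CTE names digits and nums are made up for illustration, the table is assumed to be named t, and three digits cover only 0 to 999, which is enough for the sample data but not for a full day of seconds.

```sql
-- Assumption: table t has the question's columns; digits and nums are
-- illustrative names, not part of any answer above.
WITH digits AS (
  SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
  UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
  UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
),
nums AS (
  -- Every combination of three digits yields the integers 0..999.
  SELECT a.d + 10 * b.d + 100 * c.d AS n
  FROM digits a
  CROSS JOIN digits b
  CROSS JOIN digits c
)
SELECT nums.n, SUM(t.streamers) AS sm
FROM nums
LEFT JOIN t ON nums.n BETWEEN t.start_time AND t.end_time
GROUP BY nums.n
ORDER BY nums.n;
```

Seconds with no active stream come back with a NULL sum because of the LEFT JOIN. Adding more digits CTE joins extends the range if larger second values are needed.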

Continually subtracting from a running total
