Continually subtracting from a running total

Time: 2017-11-28 19:47:52

Tags: sql amazon-redshift

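Using Amazon Redshift, I have a table with the columns streamers, start_time, end_time, and total_streamers. start_time and end_time are in seconds since midnight, streamers refers to music streamers, and total_streamers is a running total of streamers. I want to work out how many streamers there are at any given start_time. This is the table I have.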

streamers   start_time  end_time    total_streamers
  2            240         400         2 
  10           300         460         12
  7            360         514         19
  12           420         608         31

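The problem I'm running into is that once a start_time passes one of the earlier end_times, I no longer want those streamers included in my total_streamers. Since the first row's end_time is 400, that row's 2 streamers should be excluded once start_time is greater than 400. This is the result I want.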

streamers   start_time  end_time    total_streamers
  2            240         400         2 
  10           300         460         12
  7            360         514         19
  12           420         608         29

Please let me know if I can provide any code or clarify the question. Thanks in advance.

3 answers:

Answer 0: (score: 2)

One approach is to use:

WITH cte AS (
  SELECT s.n, SUM(tx.streamers) sm
  FROM generate_series(1,1000) s(n)
  LEFT JOIN tx ON s.n BETWEEN tx.start_time and tx.end_time
  GROUP BY s.n
)
SELECT DISTINCT tx.*, cte.sm AS total_streamers
FROM tx
JOIN cte
  ON cte.n =tx.start_time
ORDER BY start_time;

DBFiddle Demo

Output:

╔═══════════╦════════════╦══════════╦═══════╗
║ streamers ║ start_time ║ end_time ║ total ║
╠═══════════╬════════════╬══════════╬═══════╣
║ 2         ║ 240        ║ 400      ║ 2     ║
╠═══════════╬════════════╬══════════╬═══════╣
║ 10        ║ 300        ║ 460      ║ 12    ║
╠═══════════╬════════════╬══════════╬═══════╣
║ 7         ║ 360        ║ 514      ║ 19    ║
╠═══════════╬════════════╬══════════╬═══════╣
║ 12        ║ 420        ║ 608      ║ 29    ║
╚═══════════╩════════════╩══════════╩═══════╝

If needed, you can get the value for every second:

SELECT s.n, SUM(tx.streamers) sm
FROM generate_series(1,1000) s(n)
LEFT JOIN tx ON s.n BETWEEN tx.start_time and tx.end_time
GROUP BY s.n
ORDER BY n;
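As a rough illustration of what the per-second query computes, here is a minimal Python sketch that uses the sample rows from the question. The function name `streamers_per_second` and the 1..1000 range (copied from the `generate_series(1,1000)` call) are assumptions made for this sketch only. Seconds with no active stream map to `None`, mirroring the NULL that `SUM` returns over an unmatched `LEFT JOIN` row.

```python
# Sample rows from the question: (streamers, start_time, end_time).
streams = [(2, 240, 400), (10, 300, 460), (7, 360, 514), (12, 420, 608)]


def streamers_per_second(rows, first=1, last=1000):
    """For each second n, sum the streamers of rows with start <= n <= end."""
    totals = {}
    for n in range(first, last + 1):
        # BETWEEN is inclusive on both ends, hence <= on both sides.
        active = [s for s, start, end in rows if start <= n <= end]
        # None stands in for SQL NULL when no stream is active.
        totals[n] = sum(active) if active else None
    return totals


per_second = streamers_per_second(streams)
print(per_second[400], per_second[401], per_second[420])
```

The first row is still counted at second 400 and drops out at second 401, which is the behavior the question asks for.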

EDIT

Without generate_series:

WITH cte AS (
  SELECT s.n, SUM(tx.streamers) sm
  FROM (SELECT ROW_NUMBER() OVER(ORDER BY 1) AS n
        FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) y(n),
             (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) x(n),
             (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) z(n)) s
  LEFT JOIN tx ON s.n BETWEEN tx.start_time and tx.end_time
  GROUP BY s.n
)
SELECT DISTINCT tx.*, cte.sm
FROM tx
JOIN cte
  ON cte.n =tx.start_time
ORDER BY start_time;

DBFiddle Demo

Answer 1: (score: 1)


While it is possible to do this with window functions, it is probably easier as a derived column with a subquery.
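The answer's own query is not preserved here, so the following is only a sketch of what a derived column with a correlated subquery could look like. It runs against an in-memory SQLite database rather than Redshift. The table name `tx` is borrowed from Answer 0, and the aliases `t` and `o` are assumptions. Whether Redshift accepts this exact correlated pattern should be checked against its documentation.

```python
import sqlite3

# In-memory SQLite stands in for Redshift (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (streamers INT, start_time INT, end_time INT)")
conn.executemany(
    "INSERT INTO tx VALUES (?, ?, ?)",
    [(2, 240, 400), (10, 300, 460), (7, 360, 514), (12, 420, 608)],
)

# For each row t, the subquery sums the streamers of every row o whose
# interval [start_time, end_time] contains t.start_time (t itself included).
QUERY = """
SELECT t.streamers,
       t.start_time,
       t.end_time,
       (SELECT SUM(o.streamers)
          FROM tx o
         WHERE t.start_time BETWEEN o.start_time AND o.end_time) AS total_streamers
FROM tx t
ORDER BY t.start_time
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

On the sample data, the last row comes out as 29 rather than 31, because the first stream (240, 400) has already ended by second 420.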

Answer 2: (score: 0)

Final query

WITH q AS
(
    SELECT
      t1.start_time,
      t1.end_time,
      t2.streamers
    FROM t t1
      LEFT JOIN t t2 ON t1.start_time BETWEEN t2.start_time AND t2.end_time
) SELECT
    start_time,
    end_time,
    SUM(streamers)
  FROM q
  GROUP BY start_time, end_time
  ORDER BY start_time;

To better understand this query, look at the slightly modified version of subquery q below.

SELECT
  t1.start_time AS original_start_time,
  t2.start_time AS matching_start_time,
  t2.end_time AS matching_end_time,
  t2.streamers AS matching_streamers
FROM t t1
  LEFT JOIN t t2 ON t1.start_time BETWEEN t2.start_time AND t2.end_time
ORDER BY t1.start_time,t2.start_time;

Now, the subquery result.

original_start_time matching_start_time matching_end_time   matching_streamers

240 240 400 2

300 240 400 2
300 300 460 10

360 240 400 2
360 300 460 10
360 360 514 7

420 300 460 10
420 360 514 7
420 420 608 12

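What do the results above show? For each "new" stream (e.g. (300, 460)), the subquery finds all streams (including itself) whose start-to-end interval contains the new stream's start time. For example, at 300 we have a new stream (300, 460) and one that is still running (240, 400), and so on.

Given the results above, all we have to do is sum, for each stream, the streamers of all its matching streams.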
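The two steps described above (pair each stream with every stream active at its start time, then sum per stream) can be traced outside SQL. This is a minimal Python sketch on the question's sample rows. The function names `matching_pairs` and `totals_by_stream` are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

# Sample rows from the question: (streamers, start_time, end_time).
streams = [(2, 240, 400), (10, 300, 460), (7, 360, 514), (12, 420, 608)]


def matching_pairs(rows):
    """Mimic the self-join: (original_start, matching_start, matching_end, matching_streamers)."""
    pairs = []
    for _, start1, _ in rows:
        for streamers2, start2, end2 in rows:
            # Same condition as t1.start_time BETWEEN t2.start_time AND t2.end_time.
            if start2 <= start1 <= end2:
                pairs.append((start1, start2, end2, streamers2))
    # Sorting the tuples orders by original start, then matching start.
    return sorted(pairs)


def totals_by_stream(rows):
    """Mimic GROUP BY start_time, end_time with SUM(streamers)."""
    totals = defaultdict(int)
    for _, start1, end1 in rows:
        for streamers2, start2, end2 in rows:
            if start2 <= start1 <= end2:
                totals[(start1, end1)] += streamers2
    return dict(totals)


for pair in matching_pairs(streams):
    print(pair)
print(totals_by_stream(streams))
```

The nested loop compares every row with every other row, so the work grows quadratically with the number of streams, as it does for the self-join it imitates.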

EDIT

Note that Redshift does not support "VALUES list used as constant tables", which is used in lad2025's answer. See http://docs.aws.amazon.com/redshift/latest/dg/c_unsupported-postgresql-features.html