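Using Amazon Redshift, I have a table with the columns streamers, start_time, end_time, and total_streamers. start_time and end_time are in seconds since midnight, streamers is the number of music streamers, and total_streamers is the running total of streamers. I want to find out how many streamers there are at any given start_time. This is the table I get.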
streamers start_time end_time total_streamers
2 240 400 2
10 300 460 12
7 360 514 19
12 420 608 31
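The problem I am running into is that once start_time passes one of the earlier end_time values, I no longer want those streamers included in total_streamers. Since the first row's end_time is 400, that row's 2 streamers should be excluded once start_time is greater than 400. This is the result I want.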
streamers start_time end_time total_streamers
2 240 400 2
10 300 460 12
7 360 514 19
12 420 608 29
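Please let me know if I can provide any code or clarify the question. Thanks in advance.
Answer 0 (score: 2)
One approach is to use generate_series: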
WITH cte AS (
SELECT s.n, SUM(tx.streamers) sm
FROM generate_series(1,1000) s(n)
LEFT JOIN tx ON s.n BETWEEN tx.start_time and tx.end_time
GROUP BY s.n
)
SELECT DISTINCT tx.*, cte.sm AS total_streamers
FROM tx
JOIN cte
ON cte.n =tx.start_time
ORDER BY start_time;
DBFiddle Demo
Output:
╔═══════════╦════════════╦══════════╦═══════╗
║ streamers ║ start_time ║ end_time ║ total ║
╠═══════════╬════════════╬══════════╬═══════╣
║ 2 ║ 240 ║ 400 ║ 2 ║
╠═══════════╬════════════╬══════════╬═══════╣
║ 10 ║ 300 ║ 460 ║ 12 ║
╠═══════════╬════════════╬══════════╬═══════╣
║ 7 ║ 360 ║ 514 ║ 19 ║
╠═══════════╬════════════╬══════════╬═══════╣
║ 12 ║ 420 ║ 608 ║ 29 ║
╚═══════════╩════════════╩══════════╩═══════╝
If needed, you can get the value for every second:
SELECT s.n, SUM(tx.streamers) sm
FROM generate_series(1,1000) s(n)
LEFT JOIN tx ON s.n BETWEEN tx.start_time and tx.end_time
GROUP BY s.n
ORDER BY n;
EDIT
Without generate_series:
WITH cte AS (
SELECT s.n, SUM(tx.streamers) sm
FROM (SELECT ROW_NUMBER() OVER(ORDER BY 1) AS n
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) y(n),
(VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) x(n),
(VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) z(n)) s
LEFT JOIN tx ON s.n BETWEEN tx.start_time and tx.end_time
GROUP BY s.n
)
SELECT DISTINCT tx.*, cte.sm
FROM tx
JOIN cte
ON cte.n =tx.start_time
ORDER BY start_time;
DBFiddle Demo
Answer 1 (score: 1)
Although it is possible to do this with window functions, it is probably easier to do it as a derived column with a subquery.
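The query for this answer did not survive in the text, so the following is only a minimal sketch of what a derived column with a correlated subquery could look like. It assumes the table is named t (the name used in the other answer) with columns streamers, start_time, and end_time, and it has not been verified on Redshift, which restricts some correlated-subquery patterns.

```sql
-- Sketch only: table name t and the column list are assumptions.
SELECT t1.streamers,
       t1.start_time,
       t1.end_time,
       -- Sum the streamers of every stream that is active at t1.start_time,
       -- i.e. it has already started and has not yet ended (t1 itself included).
       (SELECT SUM(t2.streamers)
        FROM t t2
        WHERE t2.start_time <= t1.start_time
          AND t2.end_time   >= t1.start_time) AS total_streamers
FROM t t1
ORDER BY t1.start_time;
```

On the sample data in the question, the row starting at 420 would match the streams (300, 460), (360, 514), and (420, 608), giving 10 + 7 + 12 = 29, while the stream (240, 400) is excluded because it has already ended.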
Answer 2 (score: 0)
Final query
WITH q AS
(
SELECT
t1.start_time,
t1.end_time,
t2.streamers
FROM t t1
LEFT JOIN t t2 ON t1.start_time BETWEEN t2.start_time AND t2.end_time
) SELECT
start_time,
end_time,
SUM(streamers)
FROM q
GROUP BY start_time, end_time
ORDER BY start_time;
To better understand this query, look at the slightly modified version of the subquery q below.
SELECT
t1.start_time AS original_start_time,
t2.start_time AS matching_start_time,
t2.end_time AS matching_end_time,
t2.streamers AS matching_streamers
FROM t t1
LEFT JOIN t t2 ON t1.start_time BETWEEN t2.start_time AND t2.end_time
ORDER BY t1.start_time,t2.start_time;
Now, the result of the subquery.
original_start_time matching_start_time matching_end_time matching_streamers
240 240 400 2
300 240 400 2
300 300 460 10
360 240 400 2
360 300 460 10
360 360 514 7
420 300 460 10
420 360 514 7
420 420 608 12
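What does the result above show? For each new "stream" (e.g. 300, 460), the subquery finds all "streams" (including itself) whose start and end times bracket the new stream's start time. For example, at 300 we have a new stream (300, 460) and one that is still running (240, 400), and so on.
Given the result above, all we have to do is sum the streamers of all matching streams for each stream.
EDIT
Note that Redshift does not support "VALUES list used as constant tables", which is used in lad2025's answer. See http://docs.aws.amazon.com/redshift/latest/dg/c_unsupported-postgresql-features.html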