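Problem: after joining the same table twice, I get a lot of duplicated rows and the query takes a long time.
There are two tables:
EVENTS: event_id, event_time, event_name
TSERIES: data_time, data
event_time is the time of a specific event; TSERIES has one data row for every minute.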
I want to output a table with 6 columns: event_id, event_time, data_time1, data1, data_time2, data2. data_time1/data1 cover the first 2 minutes after the event (+0, +1), and data_time2/data2 cover the next 2 minutes (+2, +3).
My query:
SELECT
*
FROM events
LEFT JOIN tseries ts1
ON ts1.data_time >= (events.event_time) AND ts1.data_time <= (events.event_time + time '00:01:00')
LEFT JOIN tseries ts2
ON ts2.data_time >= (events.event_time + time '00:02:00') AND ts2.data_time <= (events.event_time + time '00:03:00')
ORDER BY events.event_id
;
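This query produces the following table (only the time columns are shown), and it gets even worse with each additional join to the same table.
event_time  data_time1  data_time2
x           x+0         x+2
x           x+0         x+3
x           x+1         x+2
x           x+1         x+3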
whereas I would prefer something like this:
event_time  data_time1  data_time2
x           x+0         x+2
x           x+1         x+3
or
event_time  data_time1  data_time2
x           x+0         null
x           x+1         null
x           null        x+2
x           null        x+3
Any ideas? Thanks for any help/answers :)
Answer 0 (score: 1)
One approach is conditional aggregation, assuming you want only one row per event:
SELECT e.*,
       -- first window: minutes +0 and +1 after the event
       MAX(CASE WHEN ts.data_time >= e.event_time AND ts.data_time <= e.event_time + time '00:01:00' THEN ts.data END) AS data_1,
       -- second window: minutes +2 and +3 after the event
       MAX(CASE WHEN ts.data_time >= e.event_time + time '00:02:00' AND ts.data_time <= e.event_time + time '00:03:00' THEN ts.data END) AS data_2
FROM events e LEFT JOIN
     tseries ts
     ON (ts.data_time >= e.event_time AND ts.data_time <= e.event_time + time '00:01:00') OR
        (ts.data_time >= e.event_time + time '00:02:00' AND ts.data_time <= e.event_time + time '00:03:00')
GROUP BY e.event_id
ORDER BY e.event_id;
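However, this does not work when there are multiple matches within a time period: MAX() keeps only one value per period.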
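To make that limitation concrete, here is a minimal sketch that runs the conditional-aggregation idea against an in-memory SQLite database through Python's `sqlite3` module. Everything in it is an assumption made for illustration: the sample rows are invented, and times are stored as integer minutes so that `+ 1` stands in for `+ time '00:01:00'`; the thread's own queries use PostgreSQL-style time arithmetic.

```python
import sqlite3

# Hypothetical sample data; times are integer minutes, not real timestamps.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events  (event_id INTEGER PRIMARY KEY, event_time INTEGER);
    CREATE TABLE tseries (data_time INTEGER, data INTEGER);
    INSERT INTO events  VALUES (1, 10);
    INSERT INTO tseries VALUES (10, 5), (11, 7), (12, 3), (13, 9);
""")

# One join to tseries, then one MAX(CASE ...) per time window.
rows = conn.execute("""
    SELECT e.event_id,
           MAX(CASE WHEN ts.data_time BETWEEN e.event_time AND e.event_time + 1
                    THEN ts.data END) AS data_1,
           MAX(CASE WHEN ts.data_time BETWEEN e.event_time + 2 AND e.event_time + 3
                    THEN ts.data END) AS data_2
    FROM events e
    LEFT JOIN tseries ts
           ON ts.data_time BETWEEN e.event_time AND e.event_time + 1
           OR ts.data_time BETWEEN e.event_time + 2 AND e.event_time + 3
    GROUP BY e.event_id
    ORDER BY e.event_id
""").fetchall()

print(rows)
```

With this sample data each window holds two readings (5 and 7, then 3 and 9), but the result is the single row `[(1, 7, 9)]`: the smaller reading in each window is discarded by the aggregate.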
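For multiple rows, one approach is to enumerate the values for each event within each time period. You can then match the two periods on that sequence number. The following uses FULL JOIN in case the two lists have different lengths: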
SELECT COALESCE(ts1.event_id, ts2.event_id) AS event_id,
       ts1.data AS data1, ts2.data AS data2
FROM (SELECT e.event_id, ts1.data,
             -- number the rows of the first window (+0, +1) per event
             ROW_NUMBER() OVER (PARTITION BY e.event_id ORDER BY ts1.data_time) AS seqnum
      FROM events e JOIN
           tseries ts1
           ON ts1.data_time >= e.event_time AND
              ts1.data_time <= e.event_time + time '00:01:00'
     ) ts1 FULL JOIN
     (SELECT e.event_id, ts2.data,
             -- number the rows of the second window (+2, +3) per event
             ROW_NUMBER() OVER (PARTITION BY e.event_id ORDER BY ts2.data_time) AS seqnum
      FROM events e JOIN
           tseries ts2
           ON ts2.data_time >= e.event_time + time '00:02:00' AND
              ts2.data_time <= e.event_time + time '00:03:00'
     ) ts2
     ON ts1.event_id = ts2.event_id AND ts1.seqnum = ts2.seqnum
ORDER BY event_id;
Note: if you want the other columns from events, you can use:
SELECT e.*,
       ts1.data AS data1, ts2.data AS data2
FROM events e LEFT JOIN
     (SELECT e.event_id, ts1.data,
             -- number the rows of the first window (+0, +1) per event
             ROW_NUMBER() OVER (PARTITION BY e.event_id ORDER BY ts1.data_time) AS seqnum
      FROM events e JOIN
           tseries ts1
           ON ts1.data_time >= e.event_time AND
              ts1.data_time <= e.event_time + time '00:01:00'
     ) ts1
     ON ts1.event_id = e.event_id LEFT JOIN
     (SELECT e.event_id, ts2.data,
             -- number the rows of the second window (+2, +3) per event
             ROW_NUMBER() OVER (PARTITION BY e.event_id ORDER BY ts2.data_time) AS seqnum
      FROM events e JOIN
           tseries ts2
           ON ts2.data_time >= e.event_time + time '00:02:00' AND
              ts2.data_time <= e.event_time + time '00:03:00'
     ) ts2
     ON e.event_id = ts2.event_id AND ts1.seqnum = ts2.seqnum
ORDER BY e.event_id;
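As a rough check of the sequence-number pairing, the sketch below compares it with the naive double join on a tiny in-memory SQLite database via Python's `sqlite3`. The sample rows, the CTE names `w1` and `w2`, and the use of integer minutes in place of real timestamps are all assumptions made for illustration. It also assumes the bundled SQLite is 3.25 or newer, since that is when window functions were added.

```python
import sqlite3

# Hypothetical sample data; times are integer minutes, not real timestamps.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events  (event_id INTEGER PRIMARY KEY, event_time INTEGER);
    CREATE TABLE tseries (data_time INTEGER, data INTEGER);
    INSERT INTO events  VALUES (1, 10);
    INSERT INTO tseries VALUES (10, 5), (11, 7), (12, 3), (13, 9);
""")

# Naive double join: every row of window 1 pairs with every row of window 2.
naive_count = conn.execute("""
    SELECT COUNT(*)
    FROM events e
    LEFT JOIN tseries ts1
           ON ts1.data_time BETWEEN e.event_time AND e.event_time + 1
    LEFT JOIN tseries ts2
           ON ts2.data_time BETWEEN e.event_time + 2 AND e.event_time + 3
""").fetchone()[0]

# Number the rows of each window per event, then pair rows by that number.
paired = conn.execute("""
    WITH w1 AS (
        SELECT e.event_id, ts.data_time, ts.data,
               ROW_NUMBER() OVER (PARTITION BY e.event_id
                                  ORDER BY ts.data_time) AS seqnum
        FROM events e
        JOIN tseries ts
          ON ts.data_time BETWEEN e.event_time AND e.event_time + 1
    ),
    w2 AS (
        SELECT e.event_id, ts.data_time, ts.data,
               ROW_NUMBER() OVER (PARTITION BY e.event_id
                                  ORDER BY ts.data_time) AS seqnum
        FROM events e
        JOIN tseries ts
          ON ts.data_time BETWEEN e.event_time + 2 AND e.event_time + 3
    )
    SELECT e.event_id, w1.data_time, w1.data, w2.data_time, w2.data
    FROM events e
    LEFT JOIN w1 ON w1.event_id = e.event_id
    LEFT JOIN w2 ON w2.event_id = e.event_id AND w2.seqnum = w1.seqnum
    ORDER BY e.event_id, w1.seqnum
""").fetchall()

print(naive_count)
print(paired)
```

On this data the naive join yields 4 rows (2 x 2), while the paired version yields the 2 rows from the question's first preferred layout: (x+0, x+2) and (x+1, x+3). Because it is built on LEFT JOINs, any extra rows in the second window that have no counterpart in the first would be dropped, which is why the FULL JOIN variant exists.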
Answer 1 (score: 0)
Try:
SELECT distinct
*
FROM events
LEFT JOIN tseries ts1
ON ts1.data_time >= (events.event_time) AND ts1.data_time <= (events.event_time + time '00:01:00')
LEFT JOIN tseries ts2
ON ts2.data_time >= (events.event_time + time '00:02:00') AND ts2.data_time <= (events.event_time + time '00:03:00')
ORDER BY events.event_id;