Joining the same table twice without duplicate rows

Date: 2017-01-13 11:45:00

Tags: sql postgresql

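Problem: after joining the same table twice, I get a lot of duplicate rows, and the query takes a long time.

There are two tables:

EVENTS: event_id, event_time, event_name
TSERIES: data_time, data

event_time marks a specific event; data_time has a data point for every minute.

I want to output a table with 6 columns: event_id, event_time, data_time1, data1, data_time2, data2, where data_time1 / data1 cover the first 2 minutes after the event (+0, +1) and data_time2 / data2 cover the next 2 minutes (+2, +3).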
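As a sketch for experimenting, the two tables could be created and filled as follows. The column types, the sample event, and the sample readings are all assumptions; only the column names are given above.

```sql
-- Hypothetical schema: column types are assumed, not stated in the question.
CREATE TABLE events (
    event_id   integer PRIMARY KEY,
    event_time timestamp NOT NULL,
    event_name text
);

CREATE TABLE tseries (
    data_time timestamp PRIMARY KEY,
    data      numeric
);

-- One made-up event at 10:00.
INSERT INTO events VALUES (1, '2017-01-13 10:00:00', 'example');

-- One made-up reading per minute from 10:00 to 10:05;
-- the reading is simply the minute number (0..5).
INSERT INTO tseries
SELECT t, extract(minute FROM t)
FROM generate_series(timestamp '2017-01-13 10:00:00',
                     timestamp '2017-01-13 10:05:00',
                     interval '1 minute') AS t;
```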

My query:

SELECT 
*
FROM events
LEFT JOIN tseries ts1 
ON ts1.data_time >= (events.event_time) AND ts1.data_time <= (events.event_time + time '00:01:00')
LEFT JOIN tseries ts2 
ON ts2.data_time >= (events.event_time + time '00:02:00') AND ts2.data_time <= (events.event_time + time '00:03:00')
ORDER BY events.event_id
;

This query produces the table below (I only include the time columns), and it gets even worse after joining the same table again.

event_time data_time1 data_time2
     x        x+0        x+2
     x        x+0        x+3
     x        x+1        x+2
     x        x+1        x+3

I would prefer either of the following instead:

event_time data_time1 data_time2
     x        x+0        x+2
     x        x+1        x+3

event_time data_time1 data_time2
     x        x+0        null
     x        x+1        null
     x        null       x+2
     x        null       x+3

Any ideas? Thanks for any help/answers :)

2 Answers:

Answer 0: (score: 1)

One approach is conditional aggregation, assuming you only need one row per event:

-- Selecting e.* while grouping only by e.event_id relies on event_id being the primary key of events.
SELECT e.*,
       MAX(CASE WHEN ts.data_time >= e.event_time AND ts.data_time <= e.event_time + time '00:01:00' THEN ts.data END) AS data_1,
       MAX(CASE WHEN ts.data_time >= e.event_time + time '00:02:00' AND ts.data_time <= e.event_time + time '00:03:00' THEN ts.data END) AS data_2
FROM events e LEFT JOIN
     tseries ts
     ON (ts.data_time >= e.event_time AND ts.data_time <= e.event_time + time '00:01:00') OR
        (ts.data_time >= e.event_time + time '00:02:00' AND ts.data_time <= e.event_time + time '00:03:00')
GROUP BY e.event_id
ORDER BY e.event_id;

However, this does not handle multiple matches within each time window: MAX() keeps only one value per window.
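If one row per event is still acceptable, a possible variation (not part of the original answer) is to collect every match into an array instead of taking MAX(). This sketch assumes PostgreSQL 9.4 or later for the FILTER clause and writes the offsets as interval literals, which is my choice rather than the question's.

```sql
-- Sketch: one row per event, with all readings of each window kept in an array.
SELECT e.event_id,
       e.event_time,
       array_agg(ts.data ORDER BY ts.data_time)
           FILTER (WHERE ts.data_time >= e.event_time
                     AND ts.data_time <= e.event_time + interval '1 minute')  AS data_1,
       array_agg(ts.data ORDER BY ts.data_time)
           FILTER (WHERE ts.data_time >= e.event_time + interval '2 minutes'
                     AND ts.data_time <= e.event_time + interval '3 minutes') AS data_2
FROM events e
LEFT JOIN tseries ts
       ON ts.data_time >= e.event_time
      AND ts.data_time <= e.event_time + interval '3 minutes'
GROUP BY e.event_id, e.event_time
ORDER BY e.event_id;
```

A window with no readings yields NULL rather than an empty array, because array_agg returns NULL when no rows pass the filter.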

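For multiple rows, one approach is to enumerate the values for each event and each time window. You can then use that sequence number to match them up. The following uses FULL JOIN in case the two lists have different lengths: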

SELECT COALESCE(ts1.event_id, ts2.event_id) AS event_id,
       ts1.data AS data1, ts2.data AS data2
FROM (SELECT e.event_id, ts1.data,
             -- number the readings of the first window per event, in time order
             ROW_NUMBER() OVER (PARTITION BY e.event_id ORDER BY ts1.data_time) AS seqnum
      FROM events e JOIN
           tseries ts1
           ON ts1.data_time >= e.event_time AND
              ts1.data_time <= e.event_time + time '00:01:00'
     ) ts1 FULL JOIN
     (SELECT e.event_id, ts2.data,
             -- number the readings of the second window per event, in time order
             ROW_NUMBER() OVER (PARTITION BY e.event_id ORDER BY ts2.data_time) AS seqnum
      FROM events e JOIN
           tseries ts2
           ON ts2.data_time >= e.event_time + time '00:02:00' AND
              ts2.data_time <= e.event_time + time '00:03:00'
     ) ts2
     ON ts1.event_id = ts2.event_id AND ts1.seqnum = ts2.seqnum
ORDER BY event_id;
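To see the pairing in isolation, here is a self-contained sketch of the same ROW_NUMBER() plus FULL JOIN idea that also returns the two data_time columns the question asks for. The sample event and readings are made up, the CTE names w1 and w2 are hypothetical, and the offsets are written as interval literals.

```sql
WITH events (event_id, event_time) AS (
    -- made-up sample event
    VALUES (1, timestamp '2017-01-13 10:00:00')
),
tseries (data_time, data) AS (
    -- made-up readings, one per minute; the value is the minute number
    SELECT t, extract(minute FROM t)
    FROM generate_series(timestamp '2017-01-13 10:00:00',
                         timestamp '2017-01-13 10:05:00',
                         interval '1 minute') AS t
),
w1 AS (  -- first window (+0, +1), numbered per event in time order
    SELECT e.event_id, e.event_time, ts.data_time, ts.data,
           ROW_NUMBER() OVER (PARTITION BY e.event_id ORDER BY ts.data_time) AS seqnum
    FROM events e
    JOIN tseries ts
      ON ts.data_time >= e.event_time
     AND ts.data_time <= e.event_time + interval '1 minute'
),
w2 AS (  -- second window (+2, +3), numbered per event in time order
    SELECT e.event_id, e.event_time, ts.data_time, ts.data,
           ROW_NUMBER() OVER (PARTITION BY e.event_id ORDER BY ts.data_time) AS seqnum
    FROM events e
    JOIN tseries ts
      ON ts.data_time >= e.event_time + interval '2 minutes'
     AND ts.data_time <= e.event_time + interval '3 minutes'
)
SELECT COALESCE(w1.event_id, w2.event_id)     AS event_id,
       COALESCE(w1.event_time, w2.event_time) AS event_time,
       w1.data_time AS data_time1, w1.data AS data1,
       w2.data_time AS data_time2, w2.data AS data2
FROM w1
FULL JOIN w2
       ON w1.event_id = w2.event_id AND w1.seqnum = w2.seqnum
ORDER BY event_id, COALESCE(w1.seqnum, w2.seqnum);
```

With this sample data the query should return two rows for event 1, pairing 10:00 with 10:02 and 10:01 with 10:03, rather than the four rows of the double join.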

Note: if you want the other columns from events, you can use:

-- Here a ts2 row is only paired with an existing ts1 row that has the same seqnum.
SELECT e.*,
       ts1.data AS data1, ts2.data AS data2
FROM events e LEFT JOIN
     (SELECT e.event_id, ts1.data,
             ROW_NUMBER() OVER (PARTITION BY e.event_id ORDER BY ts1.data_time) AS seqnum
      FROM events e JOIN
           tseries ts1
           ON ts1.data_time >= e.event_time AND
              ts1.data_time <= e.event_time + time '00:01:00'
     ) ts1
     ON ts1.event_id = e.event_id LEFT JOIN
     (SELECT e.event_id, ts2.data,
             ROW_NUMBER() OVER (PARTITION BY e.event_id ORDER BY ts2.data_time) AS seqnum
      FROM events e JOIN
           tseries ts2
           ON ts2.data_time >= e.event_time + time '00:02:00' AND
              ts2.data_time <= e.event_time + time '00:03:00'
     ) ts2
     ON e.event_id = ts2.event_id AND ts1.seqnum = ts2.seqnum
ORDER BY e.event_id;

Answer 1: (score: 0)

Try

SELECT distinct
*
FROM events
LEFT JOIN tseries ts1 
ON ts1.data_time >= (events.event_time) AND ts1.data_time <= (events.event_time + time '00:01:00')
LEFT JOIN tseries ts2 
ON ts2.data_time >= (events.event_time + time '00:02:00') AND ts2.data_time <= (events.event_time + time '00:03:00')
ORDER BY events.event_id;