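I am trying to find a way to delete records with overlapping times, but I can't find a simple and elegant way to keep all but one of the records that overlap. This question is similar to this one, but with some differences. Our table looks like this: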
╔════╤═══════════════════════════════════════╤══════════════════════════════════════╤════════╤═════════╗
║ id │ start_time │ end_time │ bar │ baz ║
╠════╪═══════════════════════════════════════╪══════════════════════════════════════╪════════╪═════════╣
║ 0 │ Mon, 18 Dec 2017 16:08:33 UTC +00:00 │ Mon, 18 Dec 2017 17:08:33 UTC +00:00 │ "ham" │ "eggs" ║
╟────┼───────────────────────────────────────┼──────────────────────────────────────┼────────┼─────────╢
║ 1 │ Mon, 18 Dec 2017 16:08:32 UTC +00:00 │ Mon, 18 Dec 2017 17:08:32 UTC +00:00 │ "ham" │ "eggs" ║
╟────┼───────────────────────────────────────┼──────────────────────────────────────┼────────┼─────────╢
║ 2 │ Mon, 18 Dec 2017 16:08:31 UTC +00:00 │ Mon, 18 Dec 2017 17:08:31 UTC +00:00 │ "spam" │ "bacon" ║
╟────┼───────────────────────────────────────┼──────────────────────────────────────┼────────┼─────────╢
║ 3 │ Mon, 18 Dec 2017 16:08:30 UTC +00:00 │ Mon, 18 Dec 2017 17:08:30 UTC +00:00 │ "ham" │ "eggs" ║
╚════╧═══════════════════════════════════════╧══════════════════════════════════════╧════════╧═════════╝
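In the example above, all of the records have overlapping times, where overlapping just means that the time range defined by a record's start_time and end_time (inclusive) covers or extends over part of another record's range. For this question, however, we are only interested in records that have overlapping times and also have matching bar and baz columns (rows 0, 1 and 3 above). Once we have found those records, we want to delete all but the earliest one, leaving the table above with just records 2 and 3: record 2 has no match on bar and baz, and record 3 does and has the earliest start and end times.
This is what I have so far: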
delete from foos where id in (
    select
        foo_one.id
    from
        foos foo_one
    where
        user_id = 42
        and exists (
            select
                1
            from
                foos foo_two
            where
                tsrange(foo_two.start_time::timestamp, foo_two.end_time::timestamp, '[]') &&
                tsrange(foo_one.start_time::timestamp, foo_one.end_time::timestamp, '[]')
                and
                foo_one.bar = foo_two.bar
                and
                foo_one.baz = foo_two.baz
                and
                user_id = 42
                and
                foo_one.id != foo_two.id
        )
);
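Thanks for reading!
Update: I found a solution that works for me. Basically, I can apply the window function row_number() over partitions of the table grouped by the bar and baz fields, and then add a WHERE clause to the DELETE statement that excludes the first entry (the one with the smallest id).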
delete from foos where id in (
    select id from (
        select
            foo_one.id,
            row_number() over(partition by
                                  bar,
                                  baz
                              order by id asc)
        from
            foos foo_one
        where
            user_id = 42
            and exists (
                select
                    *
                from
                    foos foo_two
                where
                    tsrange(foo_two.start_time::timestamp,
                            foo_two.end_time::timestamp,
                            '[]') &&
                    tsrange(foo_one.start_time::timestamp,
                            foo_one.end_time::timestamp,
                            '[]')
                    and
                    foo_one.id != foo_two.id
            )
    ) foos where row_number <> 1
);
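One thing to watch with ordering the window by id: it keeps the lowest id in each partition, which for the sample rows would be record 0 rather than record 3, the one with the earliest start_time. What follows is only a sketch of a variant that keeps the earliest record among overlapping rows with the same bar and baz. It runs against an in-memory SQLite database through Python so that it is self-contained, which means plain comparisons stand in for tsrange and &&, and timestamps are stored as ISO-8601 text; both are assumptions of the sketch, not part of the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table foos (id integer primary key, user_id integer, "
    "start_time text, end_time text, bar text, baz text)"
)
# Sample rows from the table in the question; user_id = 42 is assumed for all.
conn.executemany(
    "insert into foos values (?, ?, ?, ?, ?, ?)",
    [
        (0, 42, "2017-12-18 16:08:33", "2017-12-18 17:08:33", "ham", "eggs"),
        (1, 42, "2017-12-18 16:08:32", "2017-12-18 17:08:32", "ham", "eggs"),
        (2, 42, "2017-12-18 16:08:31", "2017-12-18 17:08:31", "spam", "bacon"),
        (3, 42, "2017-12-18 16:08:30", "2017-12-18 17:08:30", "ham", "eggs"),
    ],
)

# Delete a row when another row with the same bar/baz overlaps it (inclusive
# bounds) and starts earlier; ties on start_time are broken by the lower id.
conn.execute(
    """
    delete from foos
    where user_id = 42
      and exists (
        select 1
        from foos as other
        where other.user_id = foos.user_id
          and other.bar = foos.bar
          and other.baz = foos.baz
          and other.id != foos.id
          and other.start_time <= foos.end_time
          and foos.start_time <= other.end_time
          and (other.start_time < foos.start_time
               or (other.start_time = foos.start_time and other.id < foos.id))
      )
    """
)

remaining_ids = [row[0] for row in conn.execute("select id from foos order by id")]
print(remaining_ids)
```

Note that this removes every row that has an earlier overlapping match, so in a chain where A overlaps B and B overlaps C, both B and C go even if A and C do not overlap each other. Whether that is the desired behaviour depends on the data.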
Answer 0 (score: 1)
First of all, a small note: you really should give some more information. I understand that you probably don't want to show the real columns from your business, but leaving them out makes it much harder to understand what you want.
Still, I am going to give some tips on the subject. I hope they help you and anyone else with a similar problem.
Look at these events:
<--a-->
    <---- b ---->
        <---- c ---->
          <-- d -->
                 <---- e ---->
   <------- f -------->
                     <--- g --->
If you define overlap the way Google's dictionary does, "extend over so as to cover partly", then "b", "d", "e" and "f" partly overlap the "c" event. If you define overlap as one event fully covering another, then "c" overlaps "d", and "f" overlaps "b", "c" and "d".
Deleting groups could be a problem. In the previous case, what should we do? Should we delete "b", "c" and "d" and keep just "f"? Should we sum their values? Take the average, maybe? This is a decision to be made column by column, and the meaning of each column is very important, so I can't help you with "bar" and "baz".
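The two definitions of overlap can be written as two small predicates. This is just a sketch in Python: the function names covers and overlaps and the helper ts are made up for illustration, and the times for "c", "d" and "f" are borrowed from the example values used in this answer.

```python
from datetime import datetime


def ts(text):
    """Parse a 'YYYY-MM-DD HH:MM' string (helper for this sketch only)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M")


def covers(outer, inner):
    """Full overlap: outer starts no later and ends no earlier than inner."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def overlaps(first, second):
    """Any overlap: the two inclusive ranges share at least one instant."""
    return first[0] <= second[1] and second[0] <= first[1]


c = (ts("2017-10-09 07:00"), ts("2017-10-09 19:00"))
d = (ts("2017-10-09 09:00"), ts("2017-10-09 17:00"))
f = (ts("2017-10-09 02:30"), ts("2017-10-09 22:00"))

c_covers_d = covers(c, d)      # "c" fully covers "d"
f_covers_c = covers(f, c)      # "f" fully covers "c"
c_covers_f = covers(c, f)      # but not the other way round
c_overlaps_f = overlaps(c, f)  # they still overlap in the looser sense
print(c_covers_d, f_covers_c, c_covers_f, c_overlaps_f)
```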
So, trying to guess what you really want, I am creating a similar table of events with id, user_id, start_time, end_time and name:
create table events (
    id integer,
    user_id integer,
    start_time timestamp,
    end_time timestamp,
    name varchar(100)
);
I am adding the example values:
insert into events
( id, user_id, start_time, end_time, name ) values
( 1, 1000, timestamp('2017-10-09 01:00:00'),timestamp('2017-10-09 04:00:00'), 'a' );
insert into events
( id, user_id, start_time, end_time, name ) values
( 2, 1000, timestamp('2017-10-09 03:00:00'),timestamp('2017-10-09 15:00:00'), 'b' );
insert into events
( id, user_id, start_time, end_time, name ) values
( 3, 1000, timestamp('2017-10-09 07:00:00'),timestamp('2017-10-09 19:00:00'), 'c' );
insert into events
( id, user_id, start_time, end_time, name ) values
( 4, 1000, timestamp('2017-10-09 09:00:00'),timestamp('2017-10-09 17:00:00'), 'd' );
insert into events
( id, user_id, start_time, end_time, name ) values
( 5, 1000, timestamp('2017-10-09 17:00:00'),timestamp('2017-10-09 23:00:00'), 'e' );
insert into events
( id, user_id, start_time, end_time, name ) values
( 6, 1000, timestamp('2017-10-09 02:30:00'),timestamp('2017-10-09 22:00:00'), 'f' );
insert into events
( id, user_id, start_time, end_time, name ) values
( 7, 1000, timestamp('2017-10-09 17:30:00'),timestamp('2017-10-10 02:00:00'), 'g' );
Now we can play with some nice queries.
List all the events that fully overlap another event:
select
    -- EVENT NAME
    event_1.name as event_name,
    -- LIST OF EVENTS THAT THIS EVENT OVERLAPS
    GROUP_CONCAT(event_2.name) as overlaps_names
from events as event_1
inner join events as event_2
    on
        event_1.user_id = event_2.user_id
    and
        event_1.id != event_2.id
    and
    (
        -- EVENT TWO STARTS AT OR AFTER THE START OF EVENT ONE
        event_2.start_time >= event_1.start_time and
        -- EVENT TWO ENDS AT OR BEFORE THE END OF EVENT ONE
        event_2.end_time <= event_1.end_time
    )
group by
    event_1.name
Result:
+------------+----------------+
| event_name | overlaps_names |
+------------+----------------+
| c | d |
| f | b,d,c |
+------------+----------------+
To detect the partial overlaps, you will need something like this:
select
    -- EVENT NAME
    event_1.name as event_name,
    -- LIST OF EVENTS THAT THIS EVENT OVERLAPS
    GROUP_CONCAT(event_2.name) as overlaps_names
from events as event_1
inner join events as event_2
    on
        event_1.user_id = event_2.user_id
    and
        event_1.id != event_2.id
    and
    (
        (
            -- EVENT TWO STARTS INSIDE EVENT ONE:
            -- AT OR AFTER ITS START
            event_2.start_time >= event_1.start_time and
            -- AND AT OR BEFORE ITS END
            event_2.start_time <= event_1.end_time
        ) or
        (
            -- EVENT TWO ENDS INSIDE EVENT ONE:
            -- AT OR AFTER ITS START
            event_2.end_time >= event_1.start_time and
            -- AND AT OR BEFORE ITS END
            event_2.end_time <= event_1.end_time
        )
    )
group by
    event_1.name
Result:
+------------+----------------+
| event_name | overlaps_names |
+------------+----------------+
| a | b,f |
| b | c,d,a |
| c | b,d,e,g |
| d | b,e |
| e | f,g,d,c |
| f | a,g,b,d,c,e |
| g | c,e,f |
+------------+----------------+
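Note that the condition above only matches an event_2 that starts or ends inside event_1, so an event that completely surrounds event_1 is not listed (for example, "f" does not appear in the row for "c"). If those should count too, one common formulation is a single symmetric test: two inclusive ranges overlap when each one starts no later than the other ends. The sketch below runs that condition through Python's sqlite3 module only to be self-contained; storing the timestamps as ISO-8601 text and grouping the rows in Python are assumptions of the sketch.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table events (id integer, user_id integer, "
    "start_time text, end_time text, name text)"
)
conn.executemany(
    "insert into events values (?, ?, ?, ?, ?)",
    [
        (1, 1000, "2017-10-09 01:00:00", "2017-10-09 04:00:00", "a"),
        (2, 1000, "2017-10-09 03:00:00", "2017-10-09 15:00:00", "b"),
        (3, 1000, "2017-10-09 07:00:00", "2017-10-09 19:00:00", "c"),
        (4, 1000, "2017-10-09 09:00:00", "2017-10-09 17:00:00", "d"),
        (5, 1000, "2017-10-09 17:00:00", "2017-10-09 23:00:00", "e"),
        (6, 1000, "2017-10-09 02:30:00", "2017-10-09 22:00:00", "f"),
        (7, 1000, "2017-10-09 17:30:00", "2017-10-10 02:00:00", "g"),
    ],
)

rows = conn.execute(
    """
    select event_1.name, event_2.name
    from events as event_1
    inner join events as event_2
        on event_1.user_id = event_2.user_id
        and event_1.id != event_2.id
        -- symmetric overlap test with inclusive bounds
        and event_2.start_time <= event_1.end_time
        and event_1.start_time <= event_2.end_time
    order by event_1.name, event_2.name
    """
).fetchall()

# Group the pairs by the first event, in place of GROUP_CONCAT.
overlaps_by_event = defaultdict(list)
for event_name, overlaps_name in rows:
    overlaps_by_event[event_name].append(overlaps_name)

for event_name in sorted(overlaps_by_event):
    print(event_name, ",".join(overlaps_by_event[event_name]))
```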
Of course, I am using "group by" to make the output easier to read. It could also be useful if you want to sum or average the data of the overlapped rows and update the parent row before the delete. GROUP_CONCAT is a MySQL function; PostgreSQL does not have it under that name (string_agg is the closest equivalent). A "standard SQL" version that you could test is:
select
    -- EVENT NAME
    event_1.name as event_name,
    -- AN EVENT THAT THIS EVENT OVERLAPS
    event_2.name as overlaps_name
from events as event_1
inner join events as event_2
    on
        event_1.user_id = event_2.user_id
    and
        event_1.id != event_2.id
    and
    (
        -- EVENT TWO STARTS AT OR AFTER THE START OF EVENT ONE
        event_2.start_time >= event_1.start_time and
        -- EVENT TWO ENDS AT OR BEFORE THE END OF EVENT ONE
        event_2.end_time <= event_1.end_time
    )
Result:
+------------+---------------+
| event_name | overlaps_name |
+------------+---------------+
| f | b |
| f | c |
| c | d |
| f | d |
+------------+---------------+
If you are going to try some math operations, keep in mind the risk of adding the values of "b" and "d" to "c" and then adding them again to "f", which makes the value of "f" wrong.
// should be
new f = old f + b + old c + d
new c = old c + b + d // unnecessary if you are going to delete it
// very common mistake
new c = old c + b + d // unnecessary, but not wrong yet
new f = old f + new c + b + d = old f + ( old c + b + d ) + b + d // wrong!! b and d are counted twice
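The double-counting risk can be shown with a small sketch. The numeric values are hypothetical, and the parent/child mapping is taken from the full-overlap result earlier ("f" covers "b", "c" and "d"; "c" covers "d"), so it differs slightly from the shorthand formulas above. The point is only that the sums should be computed from a snapshot of the old values, not from rows that have already been updated.

```python
# Hypothetical values per event, chosen so each contribution is easy to spot.
old = {"b": 1, "c": 10, "d": 100, "f": 1000}

# Which events each parent fully covers (from the full-overlap query result).
covered = {"c": ["d"], "f": ["b", "c", "d"]}

# Safe: every sum reads from the untouched snapshot `old`.
correct = dict(old)
for parent, children in covered.items():
    correct[parent] = old[parent] + sum(old[child] for child in children)

# Risky: updating in place, so "f" picks up the already-updated "c".
in_place = dict(old)
for parent in ["c", "f"]:
    in_place[parent] += sum(in_place[child] for child in covered[parent])

print(correct["f"], in_place["f"])  # "d" is counted twice in the second value
```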
You can test all these queries, and write your own against the same database, online at http://sqlfiddle.com/#!9/1d2455/19. Keep in mind that it runs MySQL, not PostgreSQL, but it is still very good for testing standard SQL.