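I am trying to drop and merge overlapping date intervals, grouped by ID, in a table in Impala SQL. An overlap is dropped when the overlapping rows (same ID, same time range) have different attributes; if the attributes are the same, the overlapping rows can be merged instead.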
A brief description of the table columns:
Example of the input table:
# Table: MY_TABLE
ID  attributes   StartDate   EndDate
--------------------------------------------
1   cool         2017-01-01  2017-02-01
1   cool         2017-01-03  2017-04-01
1   handsome     2017-05-01  2017-08-31
1   beautiful    2017-08-01  2017-11-10
2   nice         2017-05-30  2017-05-31
2   nicer        2017-05-30  2017-08-31
3   something    2017-10-01  2017-11-01
3   something    2017-11-02  2017-12-25
3   something    2018-10-01  2018-11-01
3   other thing  2018-12-01  2018-12-25
The desired SQL query should produce the following result (with a comment on each case):
# Desired Output:
ID  attributes   StartDate   EndDate
------------------------------------------
1   cool         2017-01-01  2017-04-01  #-> merged 2 rows with same ID & attributes
1   handsome     2017-05-01  2017-07-31  #-> dropped overlap (2017-08-01 to 2017-08-31)
1   beautiful    2017-09-01  2017-11-10  #-> dropped overlap (2017-08-01 to 2017-08-31)
2   nicer        2017-06-01  2017-08-31  #-> dropped overlap (2017-05-30 to 2017-05-31)
3   something    2017-10-01  2017-11-01  #-> no overlap
3   something    2017-11-02  2017-12-25  #-> no overlap
3   something    2018-10-01  2018-11-01  #-> no overlap
3   other thing  2018-12-01  2018-12-25  #-> no overlap
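To make the intended rule precise, here is a minimal reference sketch in Python. It is not the Impala query I am asking for, only an illustration of the semantics, and the function names (`merge_same_attribute`, `conflict_days`, `resolve`) are made up for this example. It assumes day granularity, inclusive end dates, and that rows with the same ID and attributes are merged only when they truly overlap (adjacent intervals stay separate, as for ID 3).

```python
from collections import defaultdict
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

# (ID, attributes, StartDate, EndDate), copied from MY_TABLE
ROWS = [
    (1, "cool", date(2017, 1, 1), date(2017, 2, 1)),
    (1, "cool", date(2017, 1, 3), date(2017, 4, 1)),
    (1, "handsome", date(2017, 5, 1), date(2017, 8, 31)),
    (1, "beautiful", date(2017, 8, 1), date(2017, 11, 10)),
    (2, "nice", date(2017, 5, 30), date(2017, 5, 31)),
    (2, "nicer", date(2017, 5, 30), date(2017, 8, 31)),
    (3, "something", date(2017, 10, 1), date(2017, 11, 1)),
    (3, "something", date(2017, 11, 2), date(2017, 12, 25)),
    (3, "something", date(2018, 10, 1), date(2018, 11, 1)),
    (3, "other thing", date(2018, 12, 1), date(2018, 12, 25)),
]


def days(start, end):
    """Yield every date from start to end, both inclusive."""
    d = start
    while d <= end:
        yield d
        d += ONE_DAY


def merge_same_attribute(rows):
    """Merge overlapping intervals that share both ID and attributes."""
    grouped = defaultdict(list)
    for id_, attr, start, end in rows:
        grouped[(id_, attr)].append((start, end))
    merged = []
    for (id_, attr), spans in grouped.items():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:  # true overlap only; adjacent rows stay separate
                cur_end = max(cur_end, end)
            else:
                merged.append((id_, attr, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((id_, attr, cur_start, cur_end))
    return merged


def conflict_days(rows):
    """Return the (ID, day) pairs covered by more than one distinct attribute."""
    covering = defaultdict(set)
    for id_, attr, start, end in rows:
        for d in days(start, end):
            covering[(id_, d)].add(attr)
    return {key for key, attrs in covering.items() if len(attrs) > 1}


def resolve(rows):
    """Merge same-attribute overlaps, then cut conflicting days out of every interval."""
    conflicts = conflict_days(rows)
    out = []
    for id_, attr, start, end in merge_same_attribute(rows):
        run_start = prev = None
        for d in days(start, end):
            if (id_, d) in conflicts:
                if run_start is not None:  # close the run that ended yesterday
                    out.append((id_, attr, run_start, prev))
                    run_start = None
            else:
                if run_start is None:
                    run_start = d
                prev = d
        if run_start is not None:
            out.append((id_, attr, run_start, prev))
    return sorted(out, key=lambda r: (r[0], r[2]))
```

On the sample data, `resolve(ROWS)` returns the eight rows of the desired output, in the same order.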
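What Impala SQL query would drop time-overlapping records that have the same ID but different attributes, while merging time-overlapping records that have the same ID and the same attributes?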
Some related questions:
Additionally: which SQL query would give me the overlaps that were dropped?
# Dropped Overlaps table
ID  attributes   StartDate   EndDate
------------------------------------------
1   handsome     2017-08-01  2017-08-31
1   beautiful    2017-08-01  2017-08-31
2   nice         2017-05-30  2017-05-31
2   nicer        2017-05-30  2017-05-31
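For this second table, the rule I have in mind can be sketched in Python as follows. Again, this is only an illustration of the semantics and not Impala SQL; `dropped_overlaps` is a made-up name, and the sketch assumes day granularity and inclusive end dates. A day counts as dropped for an ID when rows with at least two different attributes cover it.

```python
from collections import defaultdict
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

# (ID, attributes, StartDate, EndDate), copied from MY_TABLE
ROWS = [
    (1, "cool", date(2017, 1, 1), date(2017, 2, 1)),
    (1, "cool", date(2017, 1, 3), date(2017, 4, 1)),
    (1, "handsome", date(2017, 5, 1), date(2017, 8, 31)),
    (1, "beautiful", date(2017, 8, 1), date(2017, 11, 10)),
    (2, "nice", date(2017, 5, 30), date(2017, 5, 31)),
    (2, "nicer", date(2017, 5, 30), date(2017, 8, 31)),
    (3, "something", date(2017, 10, 1), date(2017, 11, 1)),
    (3, "something", date(2017, 11, 2), date(2017, 12, 25)),
    (3, "something", date(2018, 10, 1), date(2018, 11, 1)),
    (3, "other thing", date(2018, 12, 1), date(2018, 12, 25)),
]


def dropped_overlaps(rows):
    """Return, per input row, the sub-intervals that conflict with another attribute."""
    # (ID, day) -> set of attributes covering that day
    covering = defaultdict(set)
    for id_, attr, start, end in rows:
        d = start
        while d <= end:
            covering[(id_, d)].add(attr)
            d += ONE_DAY
    conflicts = {key for key, attrs in covering.items() if len(attrs) > 1}

    out = []
    # Walk the rows in input order and collect runs of conflicting days.
    # Note: same-attribute rows that overlap inside a conflict would each
    # emit their own piece; the sample data has no such case.
    for id_, attr, start, end in rows:
        run_start = prev = None
        d = start
        while d <= end:
            if (id_, d) in conflicts:
                if run_start is None:
                    run_start = d
                prev = d
            elif run_start is not None:  # the conflicting run ended yesterday
                out.append((id_, attr, run_start, prev))
                run_start = None
            d += ONE_DAY
        if run_start is not None:
            out.append((id_, attr, run_start, prev))
    return out
```

On the sample data, `dropped_overlaps(ROWS)` returns the four rows of the Dropped Overlaps table.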
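I would show the code I have tried so far, but those queries are overly complicated and would probably make the problem described above harder to understand.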