So let's say we're working with an SQLite table that looks (roughly) like this:
   id      date1      date2
+-------+----------+----------+
|  foo  |10/01/2010|01/01/2011|
+-------+----------+----------+
|  bar  |07/01/2010|10/01/2010|
+-------+----------+----------+
   ...       ...        ...
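and so on. I'm trying to merge rows that have the same id and whose date1 and date2 values together describe a range that would be contiguous if it weren't spread across multiple rows. In other words, this: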
   id      date1      date2
+-------+----------+----------+
|  foo  |07/01/2010|10/01/2010|
+-------+----------+----------+
|  foo  |10/01/2010|01/01/2011|
+-------+----------+----------+
would become:
   id      date1      date2
+-------+----------+----------+
|  foo  |07/01/2010|01/01/2011|
+-------+----------+----------+
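and likewise when three (or more) rows for the same id each map to a different range and those ranges are perfectly contiguous. What would such a query look like? So far I haven't found any reasonable solution, though admittedly I'm no SQLista myself.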
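Answer 0 (score: 0)
I realize that SQLite doesn't support analytic functions (window functions only arrived in SQLite 3.25.0), but... here is a potential SQL solution that uses them. I ran it in PostgreSQL.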
CREATE TABLE test(id VARCHAR(16), date1 DATE, date2 DATE);
INSERT INTO test VALUES('foo', '2011-01-01', '2011-01-15');
INSERT INTO test VALUES('bar', '2011-01-02', '2011-01-04');
INSERT INTO test VALUES('bar', '2011-01-05', '2011-01-10'); -- not contiguous
INSERT INTO test VALUES('foo', '2011-01-25', '2011-01-30');
INSERT INTO test VALUES('foo', '2011-01-15', '2011-01-18'); -- contiguous
INSERT INTO test VALUES('foo', '2011-01-28', '2011-01-31'); -- overlap
INSERT INTO test VALUES('bar', '2011-01-07', '2011-01-08'); -- subset chopped
postgres=# SELECT * FROM test ORDER BY id, date1;
 id  |   date1    |   date2
-----+------------+------------
 bar | 2011-01-02 | 2011-01-04
 bar | 2011-01-05 | 2011-01-10
 bar | 2011-01-07 | 2011-01-08
 foo | 2011-01-01 | 2011-01-15
 foo | 2011-01-15 | 2011-01-18
 foo | 2011-01-25 | 2011-01-30
 foo | 2011-01-28 | 2011-01-31
(7 rows)
SELECT id
      ,MIN(date1) AS date1
      ,MAX(date2) AS date2
  FROM ( -- group_id: running count of range starts within each id
         SELECT id, date1, date2, previous_date1, previous_date2
               ,SUM( CASE WHEN date1 > previous_date2 THEN 1 ELSE 0 END ) OVER(PARTITION BY id ORDER BY id, date1) AS group_id
           FROM ( -- previous row's dates; the first row of each id falls back to its own
                  SELECT id, date1, date2
                        ,COALESCE( LAG(date1) OVER (PARTITION BY id ORDER BY id, date1), date1 ) previous_date1
                        ,COALESCE( LAG(date2) OVER (PARTITION BY id ORDER BY id, date1), date2 ) previous_date2
                    FROM test
                   ORDER BY id, date1, date2
                ) AS x
       ) AS y
 GROUP BY id, group_id
 ORDER BY 1,2;
 id  |   date1    |   date2
-----+------------+------------
 bar | 2011-01-02 | 2011-01-04
 bar | 2011-01-05 | 2011-01-10
 foo | 2011-01-01 | 2011-01-18
 foo | 2011-01-25 | 2011-01-31
(4 rows)
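Explanation
Working from the inside out: first the rows are ordered by id and date1, and two extra columns are added to each row holding the previous row's date1 and date2 values. For the first row of each id there is no previous row, so COALESCE falls back to the row's own values.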
 id  |   date1    |   date2    | previous_date1 | previous_date2
-----+------------+------------+----------------+----------------
 bar | 2011-01-02 | 2011-01-04 | 2011-01-02     | 2011-01-04
 bar | 2011-01-05 | 2011-01-10 | 2011-01-02     | 2011-01-04
 bar | 2011-01-07 | 2011-01-08 | 2011-01-05     | 2011-01-10
 foo | 2011-01-01 | 2011-01-15 | 2011-01-01     | 2011-01-15
 foo | 2011-01-15 | 2011-01-18 | 2011-01-01     | 2011-01-15
 foo | 2011-01-25 | 2011-01-30 | 2011-01-15     | 2011-01-18
 foo | 2011-01-28 | 2011-01-31 | 2011-01-25     | 2011-01-30
(7 rows)
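Next, each row that starts a new range is flagged, i.e. each row whose date1 is later than previous_date2, so that it neither overlaps nor touches the previous row. A running sum of these flags within each id gives us a sub-grouping (group_id) of the id.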
 id  |   date1    |   date2    | previous_date1 | previous_date2 | flag | group_id
-----+------------+------------+----------------+----------------+------+----------
 bar | 2011-01-02 | 2011-01-04 | 2011-01-02     | 2011-01-04     |    0 |        0
 bar | 2011-01-05 | 2011-01-10 | 2011-01-02     | 2011-01-04     |    1 |        1
 bar | 2011-01-07 | 2011-01-08 | 2011-01-05     | 2011-01-10     |    0 |        1
 foo | 2011-01-01 | 2011-01-15 | 2011-01-01     | 2011-01-15     |    0 |        0
 foo | 2011-01-15 | 2011-01-18 | 2011-01-01     | 2011-01-15     |    0 |        0
 foo | 2011-01-25 | 2011-01-30 | 2011-01-15     | 2011-01-18     |    1 |        1
 foo | 2011-01-28 | 2011-01-31 | 2011-01-25     | 2011-01-30     |    0 |        1
(7 rows)
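To make the flag and running-sum step concrete, here is a minimal Python sketch that recomputes the flag and group_id columns for the 'foo' rows only. It is an illustration rather than part of the original query, and it assumes the dates are ISO-formatted strings, which compare chronologically as plain text.

```python
from itertools import accumulate

# The 'foo' rows from the test table, already sorted by date1.
rows = [
    ("foo", "2011-01-01", "2011-01-15"),
    ("foo", "2011-01-15", "2011-01-18"),
    ("foo", "2011-01-25", "2011-01-30"),
    ("foo", "2011-01-28", "2011-01-31"),
]

# previous_date2: the first row falls back to its own date2 (the COALESCE).
previous_date2 = [rows[0][2]] + [r[2] for r in rows[:-1]]

# flag = 1 where a row starts after the previous row ended (a new range begins).
flags = [1 if r[1] > prev else 0 for r, prev in zip(rows, previous_date2)]

# The running sum of the flags is the group_id.
group_ids = list(accumulate(flags))

print(flags)      # [0, 0, 1, 0]
print(group_ids)  # [0, 0, 1, 1]
```

Rows that share a group_id belong to the same merged range, which is why the final GROUP BY can simply take MIN(date1) and MAX(date2) per group.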
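Now we can group by id and the generated group_id.
A bit insane, perhaps. I'm not sure I'd really want to use this kind of solution, since it could be hard to test, verify, document, and especially maintain in years to come. Still, I think it's neat what can be done with SQL.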
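Since the question is about SQLite: releases from 3.25.0 onward do provide window functions, so a similar query can be run there directly. The sketch below is a variant, not the query above verbatim. It assumes dates stored as ISO text and an SQLite build of 3.25.0 or newer behind Python's sqlite3 module, and it compares date1 against the latest date2 seen so far (the hypothetical column latest_end) rather than only the previous row's date2, so that a short range nested inside a long earlier one does not split the group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test(id TEXT, date1 TEXT, date2 TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        ("foo", "2011-01-01", "2011-01-15"),
        ("bar", "2011-01-02", "2011-01-04"),
        ("bar", "2011-01-05", "2011-01-10"),
        ("foo", "2011-01-25", "2011-01-30"),
        ("foo", "2011-01-15", "2011-01-18"),
        ("foo", "2011-01-28", "2011-01-31"),
        ("bar", "2011-01-07", "2011-01-08"),
    ],
)

QUERY = """
SELECT id, MIN(date1) AS date1, MAX(date2) AS date2
FROM (
    -- group_id: running count of rows that start after every earlier row ended
    SELECT id, date1, date2,
           SUM(CASE WHEN date1 > latest_end THEN 1 ELSE 0 END)
               OVER (PARTITION BY id ORDER BY date1, date2) AS group_id
    FROM (
        -- latest_end: largest date2 among all earlier rows (NULL for the first row)
        SELECT id, date1, date2,
               MAX(date2) OVER (
                   PARTITION BY id ORDER BY date1, date2
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
               ) AS latest_end
        FROM test
    )
)
GROUP BY id, group_id
ORDER BY 1, 2
"""

merged = conn.execute(QUERY).fetchall()
for row in merged:
    print(row)
```

On this test data the result is the same four rows as the PostgreSQL output above. Dates stored in a format such as 10/01/2010 would have to be converted first, because those strings do not sort chronologically.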
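Answer 1 (score: 0)
Do you need to do this with a (single) SQL query? If not, my suggestion would be to pick a language of your choice and write a one-off script to perform the data transformation.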
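As a rough sketch of what such a one-off script might look like, here is a Python version using the standard sqlite3 module. The table name ranges, the helper merge_ranges, and the sample rows are made up for illustration, and the dates are assumed to be ISO strings (the question's date format would need parsing first, for example with datetime.strptime).

```python
import sqlite3
from itertools import groupby


def merge_ranges(rows):
    """Merge touching or overlapping (id, date1, date2) rows per id."""
    merged = []
    # Sorting the tuples orders them by id, then date1, then date2.
    for key, group in groupby(sorted(rows), key=lambda r: r[0]):
        start = end = None
        for _, date1, date2 in group:
            if start is None:
                start, end = date1, date2
            elif date1 <= end:          # touches or overlaps the current range
                end = max(end, date2)
            else:                       # gap: close the current range, open a new one
                merged.append((key, start, end))
                start, end = date1, date2
        merged.append((key, start, end))
    return merged


# Hypothetical sample data; an in-memory database stands in for the real file.
rows = [
    ("foo", "2010-07-01", "2010-10-01"),
    ("foo", "2010-10-01", "2011-01-01"),
    ("bar", "2010-01-01", "2010-12-31"),
    ("bar", "2010-02-01", "2010-03-01"),
    ("bar", "2010-06-01", "2010-07-01"),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ranges(id TEXT, date1 TEXT, date2 TEXT)")
conn.executemany("INSERT INTO ranges VALUES (?, ?, ?)", rows)

merged = merge_ranges(conn.execute("SELECT id, date1, date2 FROM ranges").fetchall())
with conn:  # one transaction: replace the old rows with the merged ones
    conn.execute("DELETE FROM ranges")
    conn.executemany("INSERT INTO ranges VALUES (?, ?, ?)", merged)

print(merged)
```

Keeping the merge logic in a plain function makes it easy to test separately from the database, which speaks to the maintainability concern raised about the single-query approach.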