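I have some huge tables of values and dates that I'd like to compress with run-length encoding. The most obvious approach to me is to select every distinct combination of values along with the minimum and maximum date. The problem is that this misses any case where a mapping stops and then starts again, as in the following data: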
Id | Value1 | Value2 | Value3 | DataDate
------------------------------------------
01 | 1 | 2 | 3 | 2000-01-01
01 | 1 | 2 | 3 | 2000-01-02
01 | 1 | 2 | 3 | 2000-01-03
01 | 1 | 2 | 3 | 2000-01-04
01 | A | B | C | 2000-01-05
01 | A | B | C | 2000-01-06
01 | 1 | 2 | 3 | 2000-01-07
Encoded that way, this data becomes:
Id | Value1 | Value2 | Value3 | FromDate | ToDate
-----------------------------------------------------
01 | 1 | 2 | 3 | 2000-01-01 | 2000-01-07
01 | A | B | C | 2000-01-05 | 2000-01-06
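This is clearly wrong.
What I want is a query that returns every run of consecutive dates for each set of values.
Alternatively, if I'm going about this the wrong way round, any other suggestions would be appreciated.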
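What is being asked for is ordinary run-length encoding over date-ordered rows. A minimal Python sketch of that behaviour, assuming each Id's rows are already sorted by date and that a missing day should also end a run; the function name run_length_encode and the tuple layout are illustrative only:

```python
from datetime import date, timedelta

# Sample rows from the question: (Id, (Value1, Value2, Value3), DataDate).
SAMPLE_ROWS = [
    (1, ("1", "2", "3"), date(2000, 1, 1)),
    (1, ("1", "2", "3"), date(2000, 1, 2)),
    (1, ("1", "2", "3"), date(2000, 1, 3)),
    (1, ("1", "2", "3"), date(2000, 1, 4)),
    (1, ("A", "B", "C"), date(2000, 1, 5)),
    (1, ("A", "B", "C"), date(2000, 1, 6)),
    (1, ("1", "2", "3"), date(2000, 1, 7)),
]


def run_length_encode(rows):
    """Collapse rows into (Id, values, FromDate, ToDate) runs.

    Assumes rows are sorted by (Id, DataDate). A new run starts whenever
    the Id or the value tuple changes, or the date is not exactly one day
    after the end of the current run.
    """
    runs = []
    for id_, values, day in rows:
        if runs:
            last_id, last_values, start, end = runs[-1]
            if (last_id == id_ and last_values == values
                    and day == end + timedelta(days=1)):
                # Same Id and values on the next day: extend the current run.
                runs[-1] = (last_id, last_values, start, day)
                continue
        # Otherwise open a new run covering just this day.
        runs.append((id_, values, day, day))
    return runs


if __name__ == "__main__":
    for run in run_length_encode(SAMPLE_ROWS):
        print(run)
```

Unlike the MIN/MAX grouping, the 1/2/3 combination comes out as two runs: 2000-01-01 to 2000-01-04 and 2000-01-07 to 2000-01-07.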
Answer 0 (score: 1)
Try this:
-- Sample data from the question
DECLARE @MyTable TABLE (
    Id INT,
    Value1 VARCHAR(10),
    Value2 VARCHAR(10),
    Value3 VARCHAR(10),
    DataDate DATE
);
INSERT @MyTable
SELECT 01, '1', '2', '3', '2000-01-01' UNION ALL
SELECT 01, '1', '2', '3', '2000-01-02' UNION ALL
SELECT 01, '1', '2', '3', '2000-01-03' UNION ALL
SELECT 01, '1', '2', '3', '2000-01-04' UNION ALL
SELECT 01, 'A', 'B', 'C', '2000-01-05' UNION ALL
SELECT 01, 'A', 'B', 'C', '2000-01-06' UNION ALL
SELECT 01, '1', '2', '3', '2000-01-07';
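SELECT Id, Value1, Value2, Value3,
       MIN(DataDate) AS FromDate, MAX(DataDate) AS ToDate
FROM (
    SELECT x.Id, x.Value1, x.Value2, x.Value3,
           x.DataDate,
           -- Day number (days since 1900-01-01) minus the row's position within
           -- its value combination. Consecutive days of one combination share a
           -- GroupNum; any gap in that combination's dates changes it.
           GroupNum =
               DATEDIFF(DAY, 0, x.DataDate) -
               ROW_NUMBER() OVER (PARTITION BY x.Id, x.Value1, x.Value2, x.Value3 ORDER BY x.DataDate)
    FROM @MyTable x
) y
GROUP BY Id, Value1, Value2, Value3, GroupNum
ORDER BY Id, Value1, Value2, Value3, FromDate;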
Result:
Id Value1 Value2 Value3 FromDate ToDate
-- ------ ------ ------ ---------- ----------
1 1 2 3 2000-01-01 2000-01-04
1 1 2 3 2000-01-07 2000-01-07
1 A B C 2000-01-05 2000-01-06
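The trick is that, within one value combination, both the day number and ROW_NUMBER() step by one across consecutive days, so their difference stays constant for the whole run. A sketch that checks this with Python's built-in sqlite3 module, assuming SQLite 3.25 or later (needed for window functions). SQLite has no DATEDIFF, so CAST(julianday(...) AS INTEGER) stands in for the day number; the function name islands is hypothetical:

```python
import sqlite3

# Sample rows from the question: (Id, Value1, Value2, Value3, DataDate).
ISLAND_ROWS = [
    (1, "1", "2", "3", "2000-01-01"),
    (1, "1", "2", "3", "2000-01-02"),
    (1, "1", "2", "3", "2000-01-03"),
    (1, "1", "2", "3", "2000-01-04"),
    (1, "A", "B", "C", "2000-01-05"),
    (1, "A", "B", "C", "2000-01-06"),
    (1, "1", "2", "3", "2000-01-07"),
]

ISLANDS_QUERY = """
SELECT Id, Value1, Value2, Value3,
       MIN(DataDate) AS FromDate, MAX(DataDate) AS ToDate
FROM (
    SELECT Id, Value1, Value2, Value3, DataDate,
           -- julianday() replaces DATEDIFF(DAY, 0, ...) as an integer day number
           CAST(julianday(DataDate) AS INTEGER)
             - ROW_NUMBER() OVER (PARTITION BY Id, Value1, Value2, Value3
                                  ORDER BY DataDate) AS GroupNum
    FROM MyTable
) AS y
GROUP BY Id, Value1, Value2, Value3, GroupNum
ORDER BY Id, FromDate
"""


def islands(rows):
    """Load rows into an in-memory SQLite table and return one row per run."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE MyTable (Id INTEGER, Value1 TEXT, Value2 TEXT,"
            " Value3 TEXT, DataDate TEXT)"
        )
        con.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)", rows)
        return con.execute(ISLANDS_QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for run in islands(ISLAND_ROWS):
        print(run)
```

Because the difference only stays constant across consecutive days, a missing day for a combination also starts a new run.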
Answer 1 (score: 0)
You probably want to use window functions. Try something like this:
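select
    id, value1, value2, value3,
    from_date = update_date,
    -- start of the next run for this id (exclusive end); NULL for the last run
    to_date = lead(update_date) over (partition by id order by update_date)
from (
    select
        t.*,
        -- 1 when any value differs from the previous row of the same id,
        -- counting a change to or from NULL as a difference
        is_changed =
            case when
                value1 <> lag(value1) over (partition by id order by update_date) or
                (lag(value1) over (partition by id order by update_date) is null and value1 is not null) or
                (lag(value1) over (partition by id order by update_date) is not null and value1 is null) or
                value2 <> lag(value2) over (partition by id order by update_date) or
                (lag(value2) over (partition by id order by update_date) is null and value2 is not null) or
                (lag(value2) over (partition by id order by update_date) is not null and value2 is null) or
                value3 <> lag(value3) over (partition by id order by update_date) or
                (lag(value3) over (partition by id order by update_date) is null and value3 is not null) or
                (lag(value3) over (partition by id order by update_date) is not null and value3 is null)
            then 1 else 0 end
    from test t
) t2
where is_changed = 1
order by id, update_date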
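Note that this query relies on the LAG() and LEAD() functions (available from SQL Server 2012), and that the runs it returns are half-open: to_date is the date the next run starts, so test a date with >= from_date and < to_date to keep the runs mutually exclusive. A NULL to_date marks the last, still-open run for an id.
Note also that I used the following sample data in my test: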
create table test(id int, value1 varchar(3), value2 varchar(3), value3 varchar(3), update_date datetime);
-- 'YYYYMMDD' literals are read the same way regardless of DATEFORMAT/language settings
insert into test values
(1, 'A', 'B', 'C', '20140101'),
(1, 'A', 'B', 'C', '20140201'),
(1, 'X', 'Y', 'Z', '20140301'),
(1, 'A', 'B', 'C', '20140401'),
(2, 'D', 'E', 'F', '20140101'),
(2, 'D', 'E', 'F', '20140601');
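The same change-detection idea can be tried with Python's sqlite3 module against this sample data, assuming ISO date strings and SQLite 3.25 or later for LAG/LEAD. This sketch swaps the paired NULL checks for SQLite's NULL-safe IS NOT operator, and the hypothetical helper values_on applies the >= from_date and < to_date test to look up the values in effect on a given day:

```python
import sqlite3

# The sample data above, with ISO date strings: (id, value1, value2, value3, update_date).
CHANGE_ROWS = [
    (1, "A", "B", "C", "2014-01-01"),
    (1, "A", "B", "C", "2014-02-01"),
    (1, "X", "Y", "Z", "2014-03-01"),
    (1, "A", "B", "C", "2014-04-01"),
    (2, "D", "E", "F", "2014-01-01"),
    (2, "D", "E", "F", "2014-06-01"),
]

RUNS_QUERY = """
SELECT id, value1, value2, value3,
       update_date AS from_date,
       -- evaluated after WHERE, so this is the next change row for the id
       LEAD(update_date) OVER (PARTITION BY id ORDER BY update_date) AS to_date
FROM (
    SELECT t.*,
           -- IS NOT is SQLite's NULL-safe inequality; a first row with non-NULL
           -- values compares against NULL and therefore counts as a change
           CASE WHEN value1 IS NOT LAG(value1) OVER (PARTITION BY id ORDER BY update_date)
                  OR value2 IS NOT LAG(value2) OVER (PARTITION BY id ORDER BY update_date)
                  OR value3 IS NOT LAG(value3) OVER (PARTITION BY id ORDER BY update_date)
                THEN 1 ELSE 0 END AS is_changed
    FROM test AS t
) AS t2
WHERE is_changed = 1
ORDER BY id, from_date
"""


def encode_runs(rows):
    """Return (id, value1, value2, value3, from_date, to_date) runs."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE test (id INTEGER, value1 TEXT, value2 TEXT,"
            " value3 TEXT, update_date TEXT)"
        )
        con.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?)", rows)
        return con.execute(RUNS_QUERY).fetchall()
    finally:
        con.close()


def values_on(runs, id_, day):
    """Return the values in effect for id_ on day, or None if no run covers it.

    Uses from_date <= day < to_date; a to_date of None means the run is open.
    ISO date strings compare correctly as plain strings.
    """
    for run_id, v1, v2, v3, from_date, to_date in runs:
        if run_id == id_ and from_date <= day and (to_date is None or day < to_date):
            return (v1, v2, v3)
    return None


if __name__ == "__main__":
    for run in encode_runs(CHANGE_ROWS):
        print(run)
```

With the half-open test, a boundary date such as 2014-03-01 belongs only to the run that starts on it.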
Good luck!
Answer 2 (score: 0)
Try this:
SELECT Id, Value1, Value2, Value3, MIN(DataDate) AS FromDate, MAX(DataDate) AS ToDate
FROM YourTable
GROUP BY Id, Value1, Value2, Value3
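This is the plain MIN/MAX grouping described at the top of the thread. A sketch that runs it with Python's sqlite3 module against the question's sample rows (table name YourTable kept from the query, in-memory database assumed; an ORDER BY is added only to make the output order deterministic):

```python
import sqlite3

# Sample rows from the question: (Id, Value1, Value2, Value3, DataDate).
QUESTION_ROWS = [
    (1, "1", "2", "3", "2000-01-01"),
    (1, "1", "2", "3", "2000-01-02"),
    (1, "1", "2", "3", "2000-01-03"),
    (1, "1", "2", "3", "2000-01-04"),
    (1, "A", "B", "C", "2000-01-05"),
    (1, "A", "B", "C", "2000-01-06"),
    (1, "1", "2", "3", "2000-01-07"),
]


def naive_min_max(rows):
    """Group by the value combination only, ignoring breaks between runs."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE YourTable (Id INTEGER, Value1 TEXT, Value2 TEXT,"
            " Value3 TEXT, DataDate TEXT)"
        )
        con.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?, ?)", rows)
        return con.execute(
            "SELECT Id, Value1, Value2, Value3,"
            " MIN(DataDate) AS FromDate, MAX(DataDate) AS ToDate"
            " FROM YourTable"
            " GROUP BY Id, Value1, Value2, Value3"
            " ORDER BY Id, Value1, Value2, Value3"
        ).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in naive_min_max(QUESTION_ROWS):
        print(row)
```

The output matches the encoding shown in the question: the 1/2/3 rows from 2000-01-01 to 2000-01-04 and on 2000-01-07 collapse into a single 2000-01-01 to 2000-01-07 row.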