I am trying to clean up some chronologically ordered data by removing rows that merely repeat the previous state.
Sample table:
+--------+------------+----------------+
| emp_id | department | effective_date |
+--------+------------+----------------+
| 1 | 50 | 2015-04-01 |
| 1 | 50 | 2015-05-22 |
| 1 | null | 2015-07-04 |
| 1 | null | 2015-07-24 |
| 1 | null | 2015-07-30 |
| 1 | 50 | 2015-09-07 |
| 1 | 50 | 2016-01-16 |
| 1 | null | 2016-04-23 |
| 2 | 60 | 2015-01-20 |
| 2 | 60 | 2015-11-22 |
| 2 | 60 | 2016-07-20 |
| 3 | 50 | 2015-04-02 |
| 3 | 50 | 2015-07-15 |
| 3 | 60 | 2016-01-25 |
+--------+------------+----------------+
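As you can see, the same person can appear with the same department on several consecutive rows, each with a different effective_date. I want a query that cleans this up so that only the first date of each department change remains. However, I do not want to remove the cases where someone moves from department 50 to null and then back to 50, because those are actual changes of position.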
Sample output:
+--------+------------+----------------+
| emp_id | department | effective_date |
+--------+------------+----------------+
| 1 | 50 | 2015-04-01 |
| 1 | null | 2015-07-04 |
| 1 | 50 | 2015-09-07 |
| 1 | null | 2016-04-23 |
| 2 | 60 | 2015-01-20 |
| 3 | 50 | 2015-04-02 |
| 3 | 60 | 2016-01-25 |
+--------+------------+----------------+
How can I achieve this?
Answer 0 (score: 1)
My solution is:
DECLARE @myTable TABLE (emp_id INT, department INT, effective_date DATE);
INSERT INTO @myTable VALUES
(1, 50 , '2015-04-01'),
(1, 50 , '2015-05-22'),
(1, null, '2015-07-04'),
(1, null, '2015-07-24'),
(1, null, '2015-07-30'),
(1, 50 , '2015-09-07'),
(1, 50 , '2016-01-16'),
(1, null, '2016-04-23'),
(2, 60 , '2015-01-20'),
(2, 60 , '2015-11-22'),
(2, 60 , '2016-07-20'),
(3, 50 , '2015-04-02'),
(3, 50 , '2015-07-15'),
(3, 60 , '2016-01-25')
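;WITH T AS (
    -- Number each employee's rows in chronological order
    SELECT *,
           RN = ROW_NUMBER() OVER(PARTITION BY emp_id ORDER BY effective_date)
    FROM @myTable
)
SELECT T1.emp_id, T1.department, T1.effective_date
FROM
    T T1
    -- T2 is the previous row of the same employee (no match for the first row)
    LEFT JOIN T T2 ON T1.emp_id = T2.emp_id AND T1.RN - 1 = T2.RN
-- Always keep the first row, then every row whose department differs from the previous one.
-- ISNULL(department,'') turns NULL into 0 for an INT column, so this assumes 0 is not a real department.
WHERE T2.emp_id IS NULL
   OR (CASE WHEN ISNULL(T1.department,'') = ISNULL(T2.department,'') THEN 1 ELSE 0 END) = 0
ORDER BY T1.emp_id, T1.RN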
Result:
emp_id department effective_date
----------- ----------- --------------
1 50 2015-04-01
1 NULL 2015-07-04
1 50 2015-09-07
1 NULL 2016-04-23
2 60 2015-01-20
3 50 2015-04-02
3 60 2016-01-25
(7 row(s) affected)
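The self-join on RN - 1 only exists to look at the previous row. As a sketch of the same idea, and not part of the original answer, the previous department can be read with LAG instead. This assumes SQL Server 2012 or later, where LAG is available. The table variable @sample and the column alias prev_department are hypothetical names used only for this illustration.

```sql
-- Hypothetical table variable holding the question's sample data.
DECLARE @sample TABLE (emp_id INT, department INT, effective_date DATE);
INSERT INTO @sample VALUES
    (1, 50,   '2015-04-01'),
    (1, 50,   '2015-05-22'),
    (1, NULL, '2015-07-04'),
    (1, NULL, '2015-07-24'),
    (1, NULL, '2015-07-30'),
    (1, 50,   '2015-09-07'),
    (1, 50,   '2016-01-16'),
    (1, NULL, '2016-04-23'),
    (2, 60,   '2015-01-20'),
    (2, 60,   '2015-11-22'),
    (2, 60,   '2016-07-20'),
    (3, 50,   '2015-04-02'),
    (3, 50,   '2015-07-15'),
    (3, 60,   '2016-01-25');

;WITH T AS (
    SELECT emp_id, department, effective_date,
           -- Department on the previous row of the same employee (NULL on the first row)
           prev_department = LAG(department) OVER (PARTITION BY emp_id ORDER BY effective_date),
           RN = ROW_NUMBER() OVER (PARTITION BY emp_id ORDER BY effective_date)
    FROM @sample
)
SELECT emp_id, department, effective_date
FROM T
WHERE RN = 1                                                  -- always keep the first row
   OR department <> prev_department                           -- both non-NULL and different
   OR (department IS NULL AND prev_department IS NOT NULL)    -- value -> NULL
   OR (department IS NOT NULL AND prev_department IS NULL)    -- NULL -> value
ORDER BY emp_id, effective_date;
```

On the sample data this should return the same seven rows as the self-join version. The explicit RN = 1 test keeps an employee's first row even when its department is NULL.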
To delete the duplicate rows:
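;WITH T AS (
    SELECT *,
           RN = ROW_NUMBER() OVER(PARTITION BY emp_id ORDER BY effective_date)
    FROM @myTable
)
DELETE T1
FROM
    T T1
    LEFT JOIN T T2 ON T1.emp_id = T2.emp_id AND T1.RN - 1 = T2.RN
-- Delete only rows that have a previous row with the same department;
-- the first row of each employee (no T2 match) is never deleted.
WHERE T2.emp_id IS NOT NULL
  AND ( CASE
            WHEN ISNULL(T1.department,'') <> ISNULL(T2.department,'') THEN 1
            ELSE 0 END ) = 0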
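Alternative for the WHERE clause of the DELETE, comparing NULLs explicitly instead of using ISNULL:
-- The first row of each employee has no T2 match and is never deleted
WHERE T2.emp_id IS NOT NULL
  AND ( CASE WHEN T1.department <> T2.department
               OR (T1.department IS NULL AND T2.department IS NOT NULL)
               OR (T2.department IS NULL AND T1.department IS NOT NULL)
             THEN 1 ELSE 0 END ) = 0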
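The ISNULL(department,'') comparison and the explicit IS NULL checks above are not quite equivalent: for an INT column, the empty string is converted to 0, so NULL and a real department 0 compare as equal. The sketch below contrasts that with a NULL-safe comparison built on INTERSECT, which treats two NULLs as equal. The table variable @pairs and its column names are hypothetical and exist only for this illustration.

```sql
-- Hypothetical pairs of (current, previous) department values.
DECLARE @pairs TABLE (cur_dept INT, prev_dept INT);
INSERT INTO @pairs VALUES
    (50,   50),
    (50,   NULL),
    (NULL, 50),
    (NULL, NULL),
    (0,    NULL);

SELECT p.cur_dept, p.prev_dept,
       -- ISNULL(x, '') yields 0 for a NULL INT, so (0, NULL) is reported as the same
       same_by_isnull    = CASE WHEN ISNULL(p.cur_dept, '') = ISNULL(p.prev_dept, '')
                                THEN 1 ELSE 0 END,
       -- INTERSECT returns a row only when both values are equal or both are NULL
       same_by_intersect = CASE WHEN EXISTS (SELECT p.cur_dept INTERSECT SELECT p.prev_dept)
                                THEN 1 ELSE 0 END
FROM @pairs AS p;
```

The two columns should agree on every pair except (0, NULL). If 0 can never be a department id, the ISNULL form is fine; otherwise the explicit NULL checks or the INTERSECT form are the safer choice.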
Answer 1 (score: 0)
This was harder than expected:
declare @temp as table (emp_id int, department int,effective_date date)
insert into @temp
values
(1,50,'2015-04-01')
, (1,50,'2015-05-22')
, (1, null ,'2015-07-04')
, (1, null ,'2015-07-24')
, (1, null ,'2015-07-30')
, (1,50,'2015-09-07')
, (1,50,'2016-01-16')
, (1, null ,'2016-04-23')
, (2,60,'2015-01-20')
, (2,60,'2015-11-22')
, (2,60,'2016-07-20')
, (3,50,'2015-04-02')
, (3,50,'2015-07-15')
, (3,60,'2016-01-25')
;with cte as
(
    --Please note: I am changing null to -1 for comparison
    select emp_id, isnull(department,-1) department, effective_date
        ,row_number() over (partition by emp_id order by effective_date) rn
    from @temp
)
,cte2 as
(
    --Compare to next record
    select cte.*
        ,ctelast.emp_id cte2Emp
        ,ctelast.department cte2dept
        ,ctelast.effective_date cte2ED
        ,isSame = case when cte.department=ctelast.department then 1 else 0 end
    from cte
    join cte ctelast
        on cte.emp_id=ctelast.emp_id and cte.rn = ctelast.rn-1
)
/*
Result of above:
emp_id department effective_date rn cte2Emp cte2dept cte2ED isSame
1 50 2015-04-01 1 1 50 2015-05-22 1
1 50 2015-05-22 2 1 -1 2015-07-04 0
1 -1 2015-07-04 3 1 -1 2015-07-24 1
1 -1 2015-07-24 4 1 -1 2015-07-30 1
1 -1 2015-07-30 5 1 50 2015-09-07 0
1 50 2015-09-07 6 1 50 2016-01-16 1
1 50 2016-01-16 7 1 -1 2016-04-23 0
2 60 2015-01-20 1 2 60 2015-11-22 1
2 60 2015-11-22 2 2 60 2016-07-20 1
3 50 2015-04-02 1 3 50 2015-07-15 1
3 50 2015-07-15 2 3 60 2016-01-25 0
*/
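--Now you want both the first record and then any changes
--(the first record is read from cte, because the inner join in cte2 drops employees with only one row)
select emp_id,department,effective_date from cte where rn=1
union all
select cte2emp,cte2dept,cte2.cte2ED from cte2 where isSame=0
order by 1,3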
/*
result:
emp_id department effective_date
1 50 2015-04-01
1 -1 2015-07-04
1 50 2015-09-07
1 -1 2016-04-23
2 60 2015-01-20
3 50 2015-04-02
3 60 2016-01-25
*/
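The result above shows -1 where the question expects null, because the first CTE replaces NULL with a sentinel. A minimal sketch of mapping the sentinel back with NULLIF follows, assuming -1 never occurs as a real department id. The table variable @d and the column aliases are hypothetical names for this illustration.

```sql
-- Hypothetical one-column table with a NULL department.
DECLARE @d TABLE (department INT);
INSERT INTO @d VALUES (50), (NULL), (60);

SELECT department,
       -- NULL becomes the sentinel -1 so that it can be compared with "="
       sentinel = ISNULL(department, -1),
       -- NULLIF turns the sentinel back into NULL for the final output
       restored = NULLIF(ISNULL(department, -1), -1)
FROM @d;
```

Applied to the answer's query, the same idea would mean wrapping the department columns of the final two SELECT statements in NULLIF(..., -1).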