I have some dirty resource usage records in the t_resourcetable table that look like this:
resNo  subres  startdate                enddate
1      2       2012-01-02 22:03:00.000  2012-01-03 00:00:00.000
1      2       2012-01-03 00:00:00.000  2012-01-04 00:00:00.000
1      2       2012-01-04 00:00:00.000  2012-01-04 16:23:00.000
1      3       2012-01-06 16:23:00.000  2012-01-06 22:23:00.000
2      2       2012-01-04 05:23:00.000  2012-01-06 16:23:00.000
I need to merge those dirty rows like this:
resNo  subres  startdate                enddate
1      2       2012-01-02 22:03:00.000  2012-01-04 16:23:00.000
1      3       2012-01-06 16:23:00.000  2012-01-06 22:23:00.000
2      2       2012-01-04 05:23:00.000  2012-01-06 16:23:00.000
The result should be written back to the same table. I have more than 40k rows, so I cannot use a cursor. Please help me clean this up with a more optimized SQL statement.
The solution provided does not handle a scenario like this:
resNo  subres  startdate                enddate
1      2       2012-01-02 22:03:00.000  2012-01-03 00:00:00.000
1      2       2012-01-03 00:00:00.000  2012-01-04 00:00:00.000
1      2       2012-01-04 00:00:00.000  2012-01-04 16:23:00.000
1      2       2012-01-14 10:09:00.000  2012-01-15 00:00:00.000
1      2       2012-01-15 00:00:00.000  2012-01-16 00:00:00.000
1      2       2012-01-16 00:00:00.000  2012-01-16 03:00:00.000
1      3       2012-01-06 16:23:00.000  2012-01-06 22:23:00.000
2      2       2012-01-04 05:23:00.000  2012-01-06 16:23:00.000
I need to merge those dirty rows like this:
resNo  subres  startdate                enddate
1      2       2012-01-02 22:03:00.000  2012-01-04 16:23:00.000
1      2       2012-01-14 10:09:00.000  2012-01-16 03:00:00.000
1      3       2012-01-06 16:23:00.000  2012-01-06 22:23:00.000
2      2       2012-01-04 05:23:00.000  2012-01-06 16:23:00.000
Please help me solve this dirty data problem.
Answer 0 (score: 6)
MERGE INTO t_resourcetable AS TARGET
USING (
    SELECT
        resNo, subres,
        MIN(startdate) AS startdate,
        MAX(enddate) AS enddate
    FROM t_resourcetable
    GROUP BY resNo, subres
) AS SOURCE
ON TARGET.resNo = SOURCE.resNo
    AND TARGET.subres = SOURCE.subres
    AND TARGET.startdate = SOURCE.startdate
-- Set enddate on the first record in the group
WHEN MATCHED THEN
    UPDATE SET TARGET.enddate = SOURCE.enddate
-- Delete the remaining items
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
Edit: To respect gaps between the intervals:
MERGE INTO t_resourcetable AS TARGET
USING (
    -- Find the first item in each interval group
    SELECT
        resNo, subres, startdate,
        ROW_NUMBER() OVER (PARTITION BY resNo, subres ORDER BY startdate) AS rn
    FROM t_resourcetable t1
    WHERE NOT EXISTS (
        -- No other intervals that intersect this from behind
        SELECT NULL
        FROM t_resourcetable t2
        WHERE t2.resNo = t1.resNo
            AND t2.subres = t1.subres
            AND t2.startdate < t1.startdate
            AND t2.enddate >= t1.startdate
    )
) AS SOURCE_start
INNER JOIN (
    -- Find the last item in each interval group
    SELECT
        resNo, subres, enddate,
        ROW_NUMBER() OVER (PARTITION BY resNo, subres ORDER BY startdate) AS rn
    FROM t_resourcetable t1
    WHERE NOT EXISTS (
        -- No other intervals that intersect this from ahead
        SELECT NULL
        FROM t_resourcetable t2
        WHERE t2.resNo = t1.resNo
            AND t2.subres = t1.subres
            AND t2.startdate <= t1.enddate
            AND t2.enddate > t1.enddate
    )
) AS SOURCE_end
    ON SOURCE_start.resNo = SOURCE_end.resNo
    AND SOURCE_start.subres = SOURCE_end.subres
    AND SOURCE_start.rn = SOURCE_end.rn -- Join by row number
ON TARGET.resNo = SOURCE_start.resNo
    AND TARGET.subres = SOURCE_start.subres
    AND TARGET.startdate = SOURCE_start.startdate
-- Set enddate on the first record in the group
WHEN MATCHED THEN
    UPDATE SET TARGET.enddate = SOURCE_end.enddate
-- Delete the remaining items
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
Result:
resNo subres startdate enddate
1 2 2012-01-02 22:03 2012-01-04 16:23
1 2 2012-01-14 10:09 2012-01-16 03:00
1 3 2012-01-06 16:23 2012-01-06 22:23
2 2 2012-01-04 05:23 2012-01-06 16:23
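Edit: If there is a risk of concurrent edits on the target table, you may want to add the HOLDLOCK hint. This prevents primary key violation errors and is slightly more resource-efficient (thanks, Joey):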
MERGE INTO t_resourcetable WITH (HOLDLOCK) AS TARGET
...
Answer 1 (score: 0)
For SQL Server 2005, you can do the following:
create table #temp
(
    resNo int,
    subres int,
    enddate datetime,
    primary key (resNo, subres)
)

-- Store the values you need for enddate in a temp table
insert into #temp
select resNo,
       subres,
       max(enddate) as enddate
from t_resourcetable
group by resNo, subres

-- Delete duplicates keeping the row with min startdate
delete T
from (
    select row_number() over(partition by resNo, subres order by startdate) as rn
    from t_resourcetable
) as T
where rn > 1

-- Set enddate where needed
update T set enddate = tmp.enddate
from t_resourcetable as T
    inner join #temp as tmp
        on T.resNo = tmp.resNo and
           T.subres = tmp.subres
where T.enddate <> tmp.enddate

drop table #temp
Answer 2 (score: 0)
You can first store the result in a temporary table, like this:
DECLARE @tmp TABLE
(
    resNo INT,
    subres INT,
    startdate DATETIME,
    enddate DATETIME
)

INSERT @tmp
SELECT resNo, subres, MIN(startdate), MAX(enddate)
FROM t_resourcetable
GROUP BY resNo, subres
To update the t_resourcetable table, you can do the following:
DELETE t_resourcetable

INSERT t_resourcetable
SELECT *
FROM @tmp
Run all of this in a transaction.
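As a rough sketch of what running these steps in one transaction might look like, the statements above can be wrapped in TRY...CATCH so that a failure rolls back both the DELETE and the INSERT. This assumes SQL Server 2005 or later and that t_resourcetable has only the four columns shown; the variable @msg is a hypothetical name used for illustration. Like the statements it wraps, it groups by (resNo, subres) only, so it does not preserve gaps between intervals.

```sql
BEGIN TRY
    BEGIN TRANSACTION;

    -- Table variable holding one merged row per (resNo, subres)
    DECLARE @tmp TABLE
    (
        resNo INT,
        subres INT,
        startdate DATETIME,
        enddate DATETIME
    );

    INSERT INTO @tmp (resNo, subres, startdate, enddate)
    SELECT resNo, subres, MIN(startdate), MAX(enddate)
    FROM t_resourcetable
    GROUP BY resNo, subres;

    -- Replace the table contents with the merged rows
    DELETE FROM t_resourcetable;

    INSERT INTO t_resourcetable (resNo, subres, startdate, enddate)
    SELECT resNo, subres, startdate, enddate
    FROM @tmp;

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Undo the DELETE/INSERT if anything failed part-way through
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;

    -- Re-raise the original error message to the caller
    DECLARE @msg NVARCHAR(2048);
    SET @msg = ERROR_MESSAGE();
    RAISERROR('%s', 16, 1, @msg);
END CATCH;
```

Listing the columns explicitly in the final INSERT, instead of SELECT *, keeps the statement valid if the column order of the table ever differs from that of @tmp.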
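Answer 3 (score: 0)
I would create a temporary table and fill it with the new, cleaned data. I think you need to group by the combined key of resNo and subres and select the minimum startdate and the maximum enddate.
Finally, delete all the data in the old table and refill it from the temporary table.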
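The temporary-table approach described above could be sketched as follows. This is an illustration under stated assumptions, not a tested solution: the temp table names #starts and #clean are hypothetical, t_resourcetable is assumed to have only the four columns shown, and CROSS APPLY requires SQL Server 2005 or later. It also goes one step beyond the plain grouping by (resNo, subres): rows are additionally grouped by the start of the contiguous block they belong to, so that gaps such as the one in the question's second example are kept.

```sql
SET XACT_ABORT ON;  -- roll back the whole transaction on a run-time error
BEGIN TRANSACTION;

-- Rows that begin a contiguous block: no earlier row with the same key reaches them
SELECT t1.resNo, t1.subres, t1.startdate
INTO #starts
FROM t_resourcetable AS t1
WHERE NOT EXISTS (
    SELECT 1
    FROM t_resourcetable AS t2
    WHERE t2.resNo = t1.resNo
      AND t2.subres = t1.subres
      AND t2.startdate < t1.startdate
      AND t2.enddate >= t1.startdate
);

-- Assign each row to the latest block start at or before it, then collapse each block
SELECT t.resNo, t.subres, blk.startdate, MAX(t.enddate) AS enddate
INTO #clean
FROM t_resourcetable AS t
CROSS APPLY (
    SELECT MAX(st.startdate) AS startdate
    FROM #starts AS st
    WHERE st.resNo = t.resNo
      AND st.subres = t.subres
      AND st.startdate <= t.startdate
) AS blk
GROUP BY t.resNo, t.subres, blk.startdate;

-- Replace the old data with the cleaned rows
DELETE FROM t_resourcetable;

INSERT INTO t_resourcetable (resNo, subres, startdate, enddate)
SELECT resNo, subres, startdate, enddate
FROM #clean;

COMMIT TRANSACTION;

DROP TABLE #starts;
DROP TABLE #clean;
```

For the second example in the question, #starts would hold two block starts for resNo 1 / subres 2 (2012-01-02 22:03 and 2012-01-14 10:09), so the two blocks stay separate instead of being merged into one.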