I have 40 tables like the one below, each containing 30 million records.
Table RawData: PK(CategoryID, Time)
CategoryID  Time                     IsSampled  Value
-----------------------------------------------------------
1           2012-07-01 00:00:00.000  0 -> 1     65.36347
1           2012-07-01 00:00:11.000  0          80.16729
1           2012-07-01 00:00:14.000  0          29.19716
1           2012-07-01 00:00:25.000  0 -> 1     7.05847
1           2012-07-01 00:00:36.000  0 -> 1     98.08257
1           2012-07-01 00:00:57.000  0          75.35524
1           2012-07-01 00:00:59.000  0          35.35524
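So far, the IsSampled column is 0 for all records.
I need to update the records so that, for each CategoryID and each one-minute range, the records with Max(Value) and Min(Value) and the first record in the range have IsSampled set to 1.
Below is the procedural query I created, but it takes too long to run (about 2 hours 30 minutes per table).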
DECLARE @startRange datetime
DECLARE @endRange datetime
DECLARE @endTime datetime
SET @startRange = '2012-07-01 00:00:00.000'
SET @endTime = '2012-08-01 00:00:00.000'
WHILE (@startRange < @endTime)
BEGIN
    SET @endRange = DATEADD(MI, 1, @startRange)
    UPDATE r1
    SET IsSampled = 1
    FROM RawData AS r1
    JOIN
    (
        SELECT r2.CategoryID,
               MAX(Value) AS MaxValue,
               MIN(Value) AS MinValue,
               MIN([Time]) AS FirstTime
        FROM RawData AS r2
        WHERE @startRange <= [Time] AND [Time] < @endRange
        GROUP BY CategoryID
    ) AS samples
    ON r1.CategoryID = samples.CategoryID
       AND (r1.Value = samples.MaxValue
            OR r1.Value = samples.MinValue
            OR r1.[Time] = samples.FirstTime)
       AND @startRange <= r1.[Time] AND r1.[Time] < @endRange
    SET @startRange = DATEADD(MI, 1, @startRange)
END
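Is there a way to update these tables faster (perhaps in a non-procedural way)? Thanks!
Answer 0 (score: 1)
I'm not sure how this will perform, but it is a more set-based approach than the current one: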
declare @T table (CategoryID int not null,Time datetime2 not null,IsSampled bit not null,Value decimal(10,5) not null)
insert into @T (CategoryID,Time,IsSampled,Value) values
(1,'2012-07-01T00:00:00.000',0,65.36347),
(1,'2012-07-01T00:00:11.000',0,80.16729),
(1,'2012-07-01T00:00:14.000',0,29.19716),
(1,'2012-07-01T00:00:25.000',0,7.05847),
(1,'2012-07-01T00:00:36.000',0,98.08257),
(1,'2012-07-01T00:00:57.000',0,75.35524),
(1,'2012-07-01T00:00:59.000',0,35.35524)
;with BinnedValues as (
select CategoryID,Time,IsSampled,Value,DATEADD(minute,DATEDIFF(minute,0,Time),0) as TimeBin
from @T
), MinMax as (
select CategoryID,Time,IsSampled,Value,TimeBin,
ROW_NUMBER() OVER (PARTITION BY CategoryID, TimeBin ORDER BY Value) as MinPos,
ROW_NUMBER() OVER (PARTITION BY CategoryID, TimeBin ORDER BY Value desc) as MaxPos,
ROW_NUMBER() OVER (PARTITION BY CategoryID, TimeBin ORDER BY Time) as Earliest
from
BinnedValues
)
update MinMax set IsSampled = 1 where MinPos=1 or MaxPos=1 or Earliest=1
select * from @T
Results:
CategoryID  Time                    IsSampled Value
----------- ----------------------- --------- ---------------------------------------
1           2012-07-01 00:00:00.00  1         65.36347
1           2012-07-01 00:00:11.00  0         80.16729
1           2012-07-01 00:00:14.00  0         29.19716
1           2012-07-01 00:00:25.00  1         7.05847
1           2012-07-01 00:00:36.00  1         98.08257
1           2012-07-01 00:00:57.00  0         75.35524
1           2012-07-01 00:00:59.00  0         35.35524
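To run the same idea against the real table rather than the @T sample, one option is to process a bounded time range per statement so that each transaction stays small. The following is only a sketch under assumptions: the table and column names are taken from the question, the one-day batch size is an arbitrary choice, and the batch boundaries must fall on whole minutes so that no minute bin is split across two batches.

```sql
-- Sketch only: one day per batch is an assumed, untuned batch size.
DECLARE @from datetime = '2012-07-01T00:00:00';
DECLARE @end  datetime = '2012-08-01T00:00:00';
DECLARE @to   datetime;

WHILE @from < @end
BEGIN
    -- Day boundaries are minute-aligned, so every minute bin lies inside one batch.
    SET @to = DATEADD(day, 1, @from);

    WITH Ranked AS (
        SELECT IsSampled,
               ROW_NUMBER() OVER (PARTITION BY CategoryID, DATEDIFF(minute, 0, [Time])
                                  ORDER BY Value)      AS MinPos,
               ROW_NUMBER() OVER (PARTITION BY CategoryID, DATEDIFF(minute, 0, [Time])
                                  ORDER BY Value DESC) AS MaxPos,
               ROW_NUMBER() OVER (PARTITION BY CategoryID, DATEDIFF(minute, 0, [Time])
                                  ORDER BY [Time])     AS Earliest
        FROM RawData
        WHERE [Time] >= @from AND [Time] < @to
    )
    UPDATE Ranked
    SET IsSampled = 1
    WHERE MinPos = 1 OR MaxPos = 1 OR Earliest = 1;

    SET @from = @to;
END
```

This still loops, but over about 31 day-sized batches instead of one iteration per minute; whether it is actually faster would need to be measured on the real data.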
If the TimeBin column can be added to the table as a computed column and included in an appropriate index, that may speed things up.
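As a rough sketch of that suggestion, the computed column and an index could look like the following. The index name is hypothetical, the INCLUDE list is a guess at what the update needs, and I believe the expression is deterministic and can therefore be persisted, but that should be verified on the target server.

```sql
-- Assumption: RawData is the table from the question, with a datetime [Time] column.
ALTER TABLE RawData
    ADD TimeBin AS DATEADD(minute, DATEDIFF(minute, 0, [Time]), 0) PERSISTED;

-- Hypothetical index name; keyed to match the PARTITION BY used in the query above.
CREATE NONCLUSTERED INDEX IX_RawData_CategoryID_TimeBin
    ON RawData (CategoryID, TimeBin)
    INCLUDE (Value, IsSampled);
```

With this in place, the BinnedValues CTE could select TimeBin directly instead of recomputing it for every row.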
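It should also be noted that this marks at most 3 rows per bin as sampled. If the earliest row is also the minimum or maximum, it is only marked once (obviously), and the next closest minimum or maximum is not marked in its place. Also, if several rows share the same Value and that value is the minimum or maximum, one of them is chosen arbitrarily.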
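If every row tied for the minimum or maximum should be marked instead (which is what the equality join in the question does), one possible variation is to swap ROW_NUMBER() for RANK() on the two Value orderings. This is a sketch using made-up sample rows with a deliberate tie, not part of the original answer:

```sql
DECLARE @T TABLE (CategoryID int NOT NULL, [Time] datetime2 NOT NULL,
                  IsSampled bit NOT NULL, Value decimal(10,5) NOT NULL);

INSERT INTO @T (CategoryID, [Time], IsSampled, Value) VALUES
(1, '2012-07-01T00:00:00.000', 0, 65.36347),  -- earliest row in the minute
(1, '2012-07-01T00:00:11.000', 0, 98.08257),  -- tied for the maximum
(1, '2012-07-01T00:00:14.000', 0, 29.19716),  -- neither min, max, nor earliest
(1, '2012-07-01T00:00:25.000', 0, 7.05847),   -- minimum
(1, '2012-07-01T00:00:36.000', 0, 98.08257);  -- tied for the maximum

WITH Ranked AS (
    SELECT IsSampled,
           -- RANK() gives every tied row the same rank, so all ties get rank 1.
           RANK() OVER (PARTITION BY CategoryID, DATEDIFF(minute, 0, [Time])
                        ORDER BY Value)      AS MinRank,
           RANK() OVER (PARTITION BY CategoryID, DATEDIFF(minute, 0, [Time])
                        ORDER BY Value DESC) AS MaxRank,
           ROW_NUMBER() OVER (PARTITION BY CategoryID, DATEDIFF(minute, 0, [Time])
                              ORDER BY [Time]) AS Earliest
    FROM @T
)
UPDATE Ranked
SET IsSampled = 1
WHERE MinRank = 1 OR MaxRank = 1 OR Earliest = 1;

SELECT * FROM @T ORDER BY [Time];
```

Here both 98.08257 rows are marked along with the earliest and the minimum row, while the 29.19716 row stays at 0.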
Answer 1 (score: 1)
You can rewrite the update inside the loop as:
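UPDATE r1
SET IsSampled = 1
FROM RawData r1
WHERE r1.[Time] >= @startRange AND r1.[Time] < @endRange
  AND (
        -- first record: no earlier row in the same category and range
        NOT EXISTS (SELECT * FROM RawData r2
                    WHERE r2.CategoryID = r1.CategoryID
                      AND r2.[Time] >= @startRange AND r2.[Time] < @endRange
                      AND r2.[Time] < r1.[Time])
        -- minimum: no row with a smaller Value
     OR NOT EXISTS (SELECT * FROM RawData r2
                    WHERE r2.CategoryID = r1.CategoryID
                      AND r2.[Time] >= @startRange AND r2.[Time] < @endRange
                      AND r2.Value < r1.Value)
        -- maximum: no row with a larger Value
     OR NOT EXISTS (SELECT * FROM RawData r2
                    WHERE r2.CategoryID = r1.CategoryID
                      AND r2.[Time] >= @startRange AND r2.[Time] < @endRange
                      AND r2.Value > r1.Value)
      )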
To get an actual performance gain, you need an index on the Time column.
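A minimal sketch of such an index follows. The index name is hypothetical, and including Value is an assumption meant to let the range filter and the Value comparisons be answered from the index alone:

```sql
-- Hypothetical index name; assumes RawData as described in the question.
CREATE NONCLUSTERED INDEX IX_RawData_Time
    ON RawData ([Time])
    INCLUDE (Value);
```

If the primary key on (CategoryID, Time) is the clustered index, CategoryID is carried in this index as well, so the correlated subquery should not need to go back to the base table.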
Answer 2 (score: 0)
Hi, try this query.
declare @T table (CategoryID int not null,Time datetime2 not null,IsSampled bit not null,Value decimal(10,5) not null)
insert into @T (CategoryID,Time,IsSampled,Value) values
(1,'2012-07-01T00:00:00.000',0,65.36347),
(1,'2012-07-01T00:00:11.000',0,80.16729),
(1,'2012-07-01T00:00:14.000',0,29.19716),
(1,'2012-07-01T00:00:25.000',0,7.05847),
(1,'2012-07-01T00:00:36.000',0,98.08257),
(1,'2012-07-01T00:00:57.000',0,75.35524),
(1,'2012-07-01T00:00:59.000',0,35.35524)
;WITH CTE as (SELECT CategoryID,CAST([Time] as Time) as time,IsSampled,Value FROM @T)
,CTE2 as (SELECT CategoryID,Max(time) mx,MIN(time) mn,'00:00:00.0000000' as start FROM CTE where time <> '00:00:00.0000000' Group by CategoryID)
update @T SET IsSampled=1
FROM CTE2 c inner join @T t on c.CategoryID = t.CategoryID and (CAST(t.[Time] as Time)=c.mx or CAST(t.[Time] as Time)=c.mn or CAST(t.[Time] as Time)=c.start)
select * from @T
Answer 3 (score: 0)
Hi, here is the latest updated query. Please check its performance:
declare @T table (CategoryID int not null,Time datetime2 not null,IsSampled bit not null,Value decimal(10,5) not null)
insert into @T (CategoryID,Time,IsSampled,Value) values
(1,'2012-07-01T00:00:00.000',0,65.36347),
(1,'2012-07-01T00:00:11.000',0,80.16729),
(1,'2012-07-01T00:00:14.000',0,29.19716),
(1,'2012-07-01T00:00:25.000',0,7.05847),
(1,'2012-07-01T00:00:36.000',0,98.08257),
(1,'2012-07-01T00:00:57.000',0,75.35524),
(1,'2012-07-01T00:00:59.000',0,35.35524)
;WITH CTE as (SELECT CategoryID,Time,CAST([Time] as Time) as timepart,IsSampled,Value FROM @T)
--SELECT * FROM CTE
,CTE2 as (SELECT CategoryID,Max(value) mx,MIN(value) mn FROM CTE
where timepart <> '00:00:00.0000000' and Time <=DATEADD(MM,1,Time)
Group by CategoryID)
,CTE3 as (SELECT CategoryID,Max(value) mx,MIN(value) mn FROM CTE
where timepart = '00:00:00.0000000' and Time <=DATEADD(MM,1,Time)
Group by CategoryID)
update @T SET IsSampled=1
FROM @T t left join CTE2 c1
on (t.CategoryID = c1.CategoryID and (t.Value = c1.mn or t.Value =c1.mx))
left join CTE3 c3 on(t.CategoryID = c3.CategoryID and t.Value = c3.mx)
where (c1.CategoryID is not null or c3.CategoryID is not null)
select * from @T