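I am trying to improve the performance of our SQL Azure database by moving away from CURSORs, which (as everyone tells me) are something to avoid.
Our table holds GPS information: rows with a clustered index on id, secondary indexing on device and timestamp, and a geography index on location.
I am trying to compute some statistics for a specific device, such as minimum speed (both Doppler and computed), total distance, average speed, and so on.
I have no choice about which statistics to compute, and I cannot change the table or the output because this is in production.
I have a clear performance problem when running this inline table-valued function on my SQL Azure DB: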
    ALTER FUNCTION [dbo].[fn_logMetrics_3]
    (
        @p_device smallint,
        @p_from dateTime,
        @p_to dateTime,
        @p_moveThresold int = 1
    )
    RETURNS TABLE
    AS
    RETURN
    (
        -- Number the rows for the requested device and time range
        WITH CTE AS
        (
            SELECT
                ROW_NUMBER() OVER(ORDER BY timestamp) AS RowNum,
                Timestamp,
                Location,
                Alt,
                Speed
            FROM
                LogEvents
            WHERE
                Device = @p_device
                AND Timestamp >= @p_from
                AND Timestamp <= @p_to
        ),
        -- Pair each row (t1) with the row immediately before it (t2)
        CTE1 AS
        (
            SELECT
                t1.Speed as Speed,
                t1.Alt as Alt,
                t2.Alt - t1.Alt as DeltaElevation,
                t1.Timestamp as Time0,
                t2.Timestamp as Time1,
                DATEDIFF(second, t2.Timestamp, t1.Timestamp) as Duration,
                t1.Location.STDistance(t2.Location) as Distance
            FROM
                CTE t1
            INNER JOIN
                CTE t2 ON t1.RowNum = t2.RowNum + 1
        ),
        -- Derive the per-pair values that feed the aggregates
        CTE2 AS
        (
            SELECT
                Speed, Alt,
                DeltaElevation,
                Time0, Time1,
                Duration,
                Distance,
                -- Computed speed; 3.6 converts m/s to km/h (assuming Distance is in meters)
                CASE
                    WHEN Duration <> 0
                    THEN (Distance / Duration) * 3.6
                    ELSE NULL
                END AS CSpeed,
                CASE
                    WHEN DeltaElevation > 0
                    THEN DeltaElevation
                    ELSE NULL
                END As PositiveAscent,
                CASE
                    WHEN DeltaElevation < 0
                    THEN DeltaElevation
                    ELSE NULL
                END As NegativeAscent,
                CASE
                    WHEN Distance < @p_moveThresold
                    THEN Duration
                    ELSE NULL
                END As StopTime,
                CASE
                    WHEN Distance > @p_moveThresold
                    THEN Duration
                    ELSE NULL
                END As MoveTime
            FROM
                CTE1 t1
        )
        -- Aggregate over all pairs
        SELECT
            COUNT(*) as Count,
            MIN(Speed) as HSpeedMin, MAX(Speed) as HSpeedMax,
            AVG(Speed) as HSpeedAverage,
            MIN(CSpeed) as CHSpeedMin, MAX(CSpeed) as CHSpeedMax,
            AVG(CSpeed) as CHSpeedAverage,
            SUM(Distance) as CumulativeDistance,
            MIN(Alt) as AltMin, MAX(Alt) as AltMax,
            SUM(PositiveAscent) as PositiveAscent,
            SUM(NegativeAscent) as NegativeAscent,
            SUM(StopTime) as StopTime,
            SUM(MoveTime) as MoveTime
        FROM
            CTE2 t1
    )
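The broad idea is to number the rows for the device and time range, pair each row with the previous one, derive the per-pair values, and then aggregate them.
Everything runs well until the final SELECT, where the aggregate functions (only a few sums and averages) ruin the performance.
This query selects 1500 rows out of a table with 4M rows and takes 1500 ms.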
When I replace the final select with
    SELECT COUNT(*) as count FROM CTE2 t1
it takes only a few milliseconds (down to 2 ms according to the SQL Studio statistics).
With
    SELECT
        COUNT(*) as Count,
        SUM(MoveTime) as MoveTime
it takes about 125 ms.
With
    SELECT
        COUNT(*) as Count,
        SUM(StopTime) as StopTime,
        SUM(MoveTime) as MoveTime
it takes about 250 ms.
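It is as if each aggregate ran as its own sequential loop over all the rows, within the same thread and without any parallelization.
For information, the CURSOR version of this function (which I wrote a couple of years ago) actually runs at least twice as fast...
What is wrong with these aggregates, and how can I optimize them?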
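The timings above could be reproduced along these lines. This is only a minimal sketch: the device id and date range are hypothetical values, and it assumes the figures are read from the Messages output in Management Studio.

```sql
-- Report CPU/elapsed time and I/O for each statement in this session.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Hypothetical arguments: device 1, one day of data, move threshold 1.
SELECT *
FROM dbo.fn_logMetrics_3(1, '2015-01-01T00:00:00', '2015-01-02T00:00:00', 1);

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Swapping the final SELECT of the function for each variant and re-running this batch gives one timing per variant.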
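Update:
The query plan for SELECT COUNT(*) as Count
The query plan for the full SELECT with aggregates
Following Joe C's answer, I introduced a #tmp table into the plan and performed the aggregates on it. The result is about twice as fast, which is an interesting fact.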
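For reference, a minimal sketch of what the #tmp approach could look like. It is an illustration under stated assumptions, not the exact code that was run: the parameter values are hypothetical, only a subset of the aggregates is shown, and because a user-defined function cannot create temporary tables, it assumes the logic runs as a batch or stored procedure instead of the inline function.

```sql
-- Hypothetical parameter values; in a stored procedure these would be parameters.
DECLARE @p_device smallint = 1;
DECLARE @p_from datetime = '2015-01-01T00:00:00';
DECLARE @p_to datetime = '2015-01-02T00:00:00';
DECLARE @p_moveThresold int = 1;

-- Step 1: compute the per-pair values once and materialize them in #tmp.
WITH CTE AS
(
    SELECT
        ROW_NUMBER() OVER (ORDER BY Timestamp) AS RowNum,
        Timestamp, Location, Speed
    FROM LogEvents
    WHERE Device = @p_device
      AND Timestamp >= @p_from
      AND Timestamp <= @p_to
)
SELECT
    t1.Speed,
    DATEDIFF(second, t2.Timestamp, t1.Timestamp) AS Duration,
    t1.Location.STDistance(t2.Location) AS Distance
INTO #tmp
FROM CTE t1
INNER JOIN CTE t2 ON t1.RowNum = t2.RowNum + 1;

-- Step 2: aggregate over the materialized rows (subset of the original columns).
SELECT
    COUNT(*) AS Count,
    MIN(Speed) AS HSpeedMin,
    MAX(Speed) AS HSpeedMax,
    AVG(Speed) AS HSpeedAverage,
    SUM(Distance) AS CumulativeDistance,
    SUM(CASE WHEN Distance < @p_moveThresold THEN Duration END) AS StopTime,
    SUM(CASE WHEN Distance > @p_moveThresold THEN Duration END) AS MoveTime
FROM #tmp;

DROP TABLE #tmp;
```

The point of the split is that the STDistance calls and the self-join are evaluated once when #tmp is filled, so the aggregates in step 2 only read plain stored values.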