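I am working on a SQL query that involves forecasts. Normally there should be a forecast for every day in a given period. Sometimes, however, the forecast for a particular day is missing. When that happens, I want to estimate it from the forecasts that do exist for that day for plants in the same area.
I have put together a query, but it is really slow and uses a lot of memory. Can anyone point me in the right direction?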
declare @startDt date = :startDate
declare @endDt date = :endDate;
with AllDates as
(
    select @startDt as dt
    union all
    select dateadd(day, 1, dt)
    from AllDates
    where dateadd(day, 1, dt) <= @endDt
)
select
    dt,
    m.date,
    p.lp,
    p.electricityArea,
    maxCapacity,
    sum(hour00_01) b,
    SUM(maxCapacity) as c,
    ISNULL( ISNULL(hour00_01, maxCapacity * ( SELECT sum(hour00_01)/sum(maxCapacity)
                                              FROM tbl_p p2,
                                                   tbl_m m2
                                              WHERE netArea = p.netArea
                                                AND plantType = '2'
                                                and date = dt
                                                and m2.lp = p2.lp
                                                AND (inputType = :forecastType) )),
            maxCapacity * ( SELECT sum(hour00_01)/sum(maxCapacity)
                            FROM tbl_p p3,
                                 tbl_m m3
                            WHERE electricityArea = p.electricityArea
                              AND plantType = '2'
                              and date = dt
                              and m3.lp = p3.lp
                              AND (inputType = :forecastType))) hour00_01,
    ISNULL( ISNULL( hour01_02, maxCapacity * ( SELECT sum(hour01_02)/sum(maxCapacity)
                                               FROM tbl_p p2,
                                                    tbl_m m2
                                               WHERE netArea = p.netArea
                                                 AND plantType = '2'
                                                 and date = dt
                                                 and m2.lp = p2.lp
                                                 AND (inputType = :forecastType))),
            maxCapacity * ( SELECT sum(hour01_02) / sum(maxCapacity)
                            FROM tbl_p p3,
                                 tbl_m m3
                            WHERE electricityArea = p.electricityArea
                              AND plantType = '2'
                              and date = dt
                              and m3.lp = p3.lp
                              AND (inputType = :forecastType))) hour01_02,
    -- ...[all 24 hours]...
from
    AllDates ad
    cross join tbl_p p
    left join tbl_m m
        on p.lp = m.lp
        and m.date = ad.dt
        and m.inputType = :forecastType
where
    p.plantType = '2'
    AND agreementStart <= :startDate1
    AND agreementEnd >= :endDate1
GROUP BY
    dt,
    m.date,
    p.lp,
    p.electricityArea,
    maxCapacity,
    p.netArea,
    p.electricityArea,
    hour00_01, hour01_02, hour02_03, hour03_04, hour04_05, hour05_06,
    hour06_07, hour07_08, hour08_09, hour09_10, hour10_11, hour11_12,
    hour12_13, hour13_14, hour14_15, hour15_16, hour16_17, hour17_18,
    hour18_19, hour19_20, hour20_21, hour21_22, hour22_23, hour23_24
ORDER BY
    p.lp,
    dt
option (maxrecursion 0)
Any idea how to optimize it?
Edit to the original question: table structures pasted in from the comments.
tbl_p
COLUMN_NAME      DATA_TYPE  CHARACTER_MAXIMUM_LENGTH  IS_NULLABLE
plantId          int        NULL                      NO
lp               nchar      45                        YES
unitId           nchar      45                        YES
plantType        int        NULL                      YES
electricityArea  nchar      45                        YES
netArea          nchar      45                        YES
maxCapacity      int        NULL                      YES
yearlyCapacity   int        NULL                      YES
numberOfPlants   int        NULL                      YES
manufacturer     nchar      45                        YES
groundLevel      nchar      45                        YES
altitudeLevel    nchar      45                        YES
updatedFromIp    nchar      45                        YES
xCoordinates     nchar      45                        YES
yCoordinates     nchar      45                        YES
plantStatus      nchar      10                        YES
agreementStart   datetime   NULL                      YES
agreementEnd     datetime   NULL                      YES
tbl_m is (with some removed columns to fit it here):
COLUMN_NAME        DATA_TYPE  CHARACTER_MAXIMUM_LENGTH  IS_NULLABLE
id                 int        NULL                      NO
lp                 nchar      45                        YES
timeStampReturned  datetime   NULL                      YES
date               date       NULL                      YES
hour00_01          decimal    NULL                      YES
hour01_02          decimal    NULL                      YES
hour02_03          decimal    NULL                      YES
...
hour21_22          decimal    NULL                      YES
hour22_23          decimal    NULL                      YES
inputType          nchar      45                        YES
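Answer 0 (score: 1)
According to your execution plan, there is a simple reason for the poor performance. Looking at the plan for each column (hour00_01, hour01_02, and so on), you perform two table scans (0.8% + 0.1%), then a hash match (3.1%), plus another 0.1% for an Index Spool. That cost, 4.1% of the total execution, is repeated 24 times because it is incurred for every column (24 x 4.1% is roughly 98% of the plan cost). Instead, you should refactor your code to produce a CTE, temp table, or table variable that computes the sums for all the columns you need at once. For example, instead of the individual subqueries, your code would look something like this.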
-- Ratios per net area; p.netArea and dt refer to the outer query
SELECT SUM(hour00_01) / SUM(maxCapacity) AS hour00_01
      ,SUM(hour01_02) / SUM(maxCapacity) AS hour01_02
      -- Plus other 22 hours --
FROM tbl_p p2
JOIN tbl_m m2
  ON m2.lp = p2.lp
 AND netArea = p.netArea
WHERE plantType = '2'
  AND date = dt
  AND ( inputType = 'Type' )
-- Ratios per electricity area; p.electricityArea and dt refer to the outer query
SELECT SUM(hour00_01) / SUM(maxCapacity) AS hour00_01
      ,SUM(hour01_02) / SUM(maxCapacity) AS hour01_02
      -- Plus other 22 hours --
FROM tbl_p p2
JOIN tbl_m m2
  ON m2.lp = p2.lp
 AND electricityArea = p.electricityArea
WHERE plantType = '2'
  AND date = dt
  AND ( inputType = 'Type' )
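If you do it this way, you get all the sums without hitting the tables multiple times. When optimizing a query, reducing the number of trips to the table is always important. If you compute all the sums in one pass, the table scans cost the same as they do for a single column, and there is only one hash join for all columns instead of one per column.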
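One way to read the suggestion above is to compute the per-area ratios once in CTEs and join them back to the plants. The following is only a minimal sketch under stated assumptions: the table variables `@p` and `@m` are hypothetical cut-down stand-ins for tbl_p and tbl_m, only one hour column is shown, and the CTE names `NetAreaRatio` and `ElectricityAreaRatio` are made up for illustration.

```sql
-- Hypothetical, cut-down stand-ins for tbl_p and tbl_m (one hour column only).
DECLARE @p TABLE (lp varchar(45), plantType int, electricityArea varchar(45),
                  netArea varchar(45), maxCapacity int);
DECLARE @m TABLE (lp varchar(45), [date] date, inputType varchar(45),
                  hour00_01 decimal(18, 2));

INSERT INTO @p VALUES ('A', 2, 'E1', 'N1', 100),
                      ('B', 2, 'E1', 'N1', 200),
                      ('C', 2, 'E1', 'N2', 50);
-- Plant B has no forecast for 2014-12-19.
INSERT INTO @m VALUES ('A', '2014-12-19', 'Type', 40),
                      ('C', '2014-12-19', 'Type', 10);

WITH NetAreaRatio AS (
    -- One pass over the tables yields the ratio for every net area and date.
    SELECT p.netArea, m.[date],
           SUM(m.hour00_01) / NULLIF(SUM(p.maxCapacity), 0) AS r00_01
    FROM @p AS p
    JOIN @m AS m ON m.lp = p.lp
    WHERE p.plantType = 2 AND m.inputType = 'Type'
    GROUP BY p.netArea, m.[date]
),
ElectricityAreaRatio AS (
    -- Same idea for the coarser fallback level.
    SELECT p.electricityArea, m.[date],
           SUM(m.hour00_01) / NULLIF(SUM(p.maxCapacity), 0) AS r00_01
    FROM @p AS p
    JOIN @m AS m ON m.lp = p.lp
    WHERE p.plantType = 2 AND m.inputType = 'Type'
    GROUP BY p.electricityArea, m.[date]
)
SELECT d.dt, p.lp,
       -- Actual forecast first, then net-area estimate, then electricity-area estimate.
       COALESCE(m.hour00_01,
                p.maxCapacity * n.r00_01,
                p.maxCapacity * e.r00_01) AS hour00_01
FROM (SELECT CAST('2014-12-19' AS date) AS dt) AS d
CROSS JOIN @p AS p
LEFT JOIN @m AS m
       ON m.lp = p.lp AND m.[date] = d.dt AND m.inputType = 'Type'
LEFT JOIN NetAreaRatio AS n
       ON n.netArea = p.netArea AND n.[date] = d.dt
LEFT JOIN ElectricityAreaRatio AS e
       ON e.electricityArea = p.electricityArea AND e.[date] = d.dt
WHERE p.plantType = 2
ORDER BY p.lp;
-- Plant B is estimated from net area N1: 200 * (40 / 100) = 80.
```

With all 24 hour columns added to each CTE, the tables are still read once per CTE instead of once per column. `NULLIF(..., 0)` is an extra guard against division by zero that is not in the original query.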
You should also consider the SUM() OVER (PARTITION BY ...) clause, which lets you do in-line calculations without going back to the table.
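As a rough illustration of the windowed approach, the sketch below computes the net-area ratio in the same pass that reads each plant row. It assumes hypothetical table variables `@p` and `@m` in place of tbl_p and tbl_m, a single hour column, and a single date; with several dates, the date would need to be added to the PARTITION BY list.

```sql
-- Hypothetical, cut-down stand-ins for tbl_p and tbl_m (one hour column only).
DECLARE @p TABLE (lp varchar(45), plantType int, netArea varchar(45), maxCapacity int);
DECLARE @m TABLE (lp varchar(45), [date] date, inputType varchar(45),
                  hour00_01 decimal(18, 2));

INSERT INTO @p VALUES ('A', 2, 'N1', 100), ('B', 2, 'N1', 200), ('C', 2, 'N2', 50);
-- Plant B has no forecast for 2014-12-19.
INSERT INTO @m VALUES ('A', '2014-12-19', 'Type', 40),
                      ('C', '2014-12-19', 'Type', 10);

SELECT p.lp,
       COALESCE(m.hour00_01,
                p.maxCapacity
                  -- Sum of the forecasts that exist in this net area.
                  * SUM(m.hour00_01) OVER (PARTITION BY p.netArea)
                  -- Capacity of only those plants that have a forecast,
                  -- matching the inner join in the original subquery.
                  / NULLIF(SUM(CASE WHEN m.lp IS NOT NULL THEN p.maxCapacity END)
                             OVER (PARTITION BY p.netArea), 0)) AS hour00_01
FROM @p AS p
LEFT JOIN @m AS m
       ON m.lp = p.lp AND m.[date] = '2014-12-19' AND m.inputType = 'Type'
WHERE p.plantType = 2
ORDER BY p.lp;
-- Plant B is estimated as 200 * 40 / 100 = 80.
```

The window aggregates run over rows the query has already fetched, so no correlated subquery is needed for the net-area level. An electricity-area fallback could be added the same way with a second pair of window sums.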
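Answer 1 (score: 0)
OK, I have now started optimizing this query. However, the problem I am facing now is that the missing forecasts are no longer calculated. The query identifies rows with a missing forecast by returning them as NULL, but the updated query no longer computes a value for them. Any idea why?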
declare @startDt date = '2014-12-19'
declare @endDt date = '2014-12-19';
with AllDates as
(
    select @startDt as dt
    union all
    select dateadd(day, 1, dt) from AllDates where dateadd(day, 1, dt) <= @endDt
)
select dt, m.date, p.lp, p.electricityArea, maxCapacity, SUM(maxCapacity) as c, p.netArea,
    CASE hour00_01 WHEN null THEN maxCapacity*(sum(hour00_01)/sum(maxCapacity)) ELSE hour00_01 END hour00_01,
    CASE hour01_02 WHEN null THEN maxCapacity*(sum(hour01_02)/sum(maxCapacity)) ELSE hour01_02 END hour01_02,
    CASE hour02_03 WHEN null THEN maxCapacity*(sum(hour02_03)/sum(maxCapacity)) ELSE hour02_03 END hour02_03
from AllDates ad
cross join tbl_p p
LEFT JOIN tbl_m m
    on p.lp = m.lp and m.date = ad.dt and m.inputType = 'TYPE'
where p.plantType = '2'
    AND (agreementStart <= '2014-12-19' AND agreementEnd >= '2014-12-19')
GROUP BY
ad.dt,
m.date,
p.lp,
p.electricityArea,
maxCapacity,
p.netArea,
p.electricityArea,
hour00_01, hour01_02, hour02_03
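One likely cause is the form of the CASE expression. A simple CASE (`CASE x WHEN NULL`) compares with `=`, and `x = NULL` is never true, so the WHEN branch is never taken. The sketch below contrasts it with the searched form; the table variable `@v` and the placeholder value -1 are hypothetical and only serve to make the difference visible.

```sql
-- Hypothetical sample: row 2 has a missing forecast.
DECLARE @v TABLE (id int, hour00_01 decimal(18, 2));
INSERT INTO @v VALUES (1, 12.5), (2, NULL);

SELECT id,
       -- Simple CASE tests hour00_01 = NULL, which is never true:
       -- row 2 falls through to ELSE and stays NULL.
       CASE hour00_01 WHEN NULL THEN -1 ELSE hour00_01 END AS simple_case,
       -- Searched CASE with IS NULL detects the missing value: row 2 becomes -1.
       CASE WHEN hour00_01 IS NULL THEN -1 ELSE hour00_01 END AS searched_case,
       -- ISNULL (or COALESCE) expresses the same fallback more compactly.
       ISNULL(hour00_01, -1) AS with_isnull
FROM @v
ORDER BY id;
```

Note that fixing the CASE alone is probably not enough for the query above. Because it groups by p.lp, dt, and the hour columns, each group holds a single plant and day, so `sum(hour00_01)` is also NULL whenever that row's forecast is missing. The area-level sums have to come from a separate aggregate (a CTE or a windowed SUM) rather than from the row's own group.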