USE SIMDB
GO
SELECT Count([Xa])*1.0/
(SELECT Count(*) FROM [dbo].[Simulations]
WHERE [dbo].[Simulations].[ExperimentID] IN (
SELECT [dbo].[Parameters].[ExperimentID] FROM [dbo].[Parameters]
WHERE [dbo].[Parameters].[SensorError] = 0 AND [dbo].[Parameters].[ExogDEXCurve] <> 4
AND [dbo].[Parameters].[ControlRange] = 0)) AS XA_PIR0
FROM [dbo].[Simulations]
WHERE [dbo].[Simulations].[ExperimentID] IN (
SELECT [dbo].[Parameters].[ExperimentID] FROM [dbo].[Parameters]
WHERE [dbo].[Parameters].[SensorError] = 0 AND [dbo].[Parameters].[ExogDEXCurve] <> 4
AND [dbo].[Parameters].[ControlRange] = 0)
AND [dbo].[Simulations].[Xa] <= 40
GO
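This is a database containing the results of 90k simulations. The primary key of the Parameters table is [ExperimentID], and the primary key of the Simulations table is the combination of [ExperimentID] and [CycleCount], which uniquely identifies a row within a given simulation. The Simulations table has about 75 million rows in total. It is also 21 columns wide, which admittedly it should not be. All indexes are clustered.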
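This query takes about 23 minutes to run on the laptop that hosts the database. Looking at the execution plan, there are two clustered index seeks, each accounting for 42% of the cost. Is this expected? How can I speed it up?
The sqlplan is here: https://www.dropbox.com/s/gq1bl1wgmesh0bl/sqlplan.sqlplan?dl=0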
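Answer 0 (score: 1)
It seems to me that you can omit the second select and do the following, which returns the total row count and the count of rows with Xa <= 40 in a single pass: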
select sum(1), sum(case when [Simulations].[Xa] <= 40 then 1 else 0 end)
FROM [dbo].[Simulations]
WHERE [dbo].[Simulations].[ExperimentID] IN (
SELECT [dbo].[Parameters].[ExperimentID]
FROM [dbo].[Parameters]
WHERE [dbo].[Parameters].[SensorError] = 0
AND [dbo].[Parameters].[ExogDEXCurve] <> 4
AND [dbo].[Parameters].[ControlRange] = 0)
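It may also help to create an index on Simulations keyed on ExperimentID with Xa as an included column. That would reduce the amount of data to be scanned, since the table appears to be very wide.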
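A minimal sketch of such an index, assuming SQL Server syntax. The index name is hypothetical and not from the question; only the table and column names are taken from it.

```sql
-- Hypothetical index name; ExperimentID is the key column, Xa is carried
-- in the leaf level only, so the query can be answered from this narrow
-- index without touching the 21-column clustered index.
CREATE NONCLUSTERED INDEX IX_Simulations_ExperimentID_incl_Xa
    ON dbo.Simulations (ExperimentID)
    INCLUDE (Xa);
```

On a table of roughly 75 million rows, building the index will itself take some time and disk space, so it is worth checking the execution plan afterwards to confirm the optimizer actually uses it.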
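Answer 1 (score: 0)
Without knowing the shape of the data, optimization can be difficult. However, this query may be easier to read and faster, since it takes advantage of MS SQL's set-based engine. Replace @simulationsTable and @parametersTable with your own tables; I used table variables to make things easier.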
DECLARE @simulationsTable TABLE (
ExperimentID INT
,Xa INT
);
DECLARE @parametersTable TABLE (
ExperimentID INT
,SensorError INT
,ExogDEXCurve INT
,ControlRange INT
);
;WITH validParameters
AS (
SELECT [ExperimentID]
FROM @parametersTable
WHERE [SensorError] = 0
AND [ExogDEXCurve] <> 4
AND [ControlRange] = 0
)
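-- countOfSims is the same on every row; MAX() is needed because a bare column cannot sit next to COUNT() without GROUP BY
SELECT COUNT([Xa]) * 1.0 / MAX(sims.countOfSims) AS XA_PIR0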
FROM @simulationsTable sim2
INNER JOIN validParameters p ON p.ExperimentID = sim2.ExperimentID
CROSS APPLY (
SELECT COUNT(1) AS countOfSims
FROM @simulationsTable sim1
INNER JOIN validParameters p ON p.ExperimentID = sim1.ExperimentID
) sims
WHERE [Xa] <= 40
GO
Also, if the number of sims is the same for every row, you can compute it ahead of time and store the result.
DECLARE @countOfSims INT = (
SELECT COUNT(1) AS countOfSims
FROM @simulationsTable sim1
CROSS APPLY (
SELECT [ExperimentID]
FROM @parametersTable
WHERE [SensorError] = 0 AND [ExogDEXCurve] <> 4 AND [ControlRange] = 0
) params
WHERE params.ExperimentID = sim1.ExperimentID
);
--And then use the result in your query.
;WITH validParameters
AS (
SELECT [ExperimentID]
FROM @parametersTable
WHERE [SensorError] = 0 AND [ExogDEXCurve] <> 4 AND [ControlRange] = 0
)
SELECT COUNT([Xa])*1.0/ ( @countOfSims ) AS XA_PIR0
FROM @simulationsTable sim2
INNER JOIN validParameters p on p.ExperimentID = sim2.ExperimentID
WHERE [Xa] <= 40
GO
Answer 2 (score: 0)
Using EXISTS, together with an index on the dbo.Parameters table, may reduce the amount of data that needs to be read.
SELECT SUM(CASE WHEN S.Xa <= 40 THEN 1 ELSE 0 END) * 1.0 / SUM(1)
FROM dbo.Simulations AS S
WHERE EXISTS (
SELECT 1
FROM dbo.Parameters AS P
WHERE P.ExperimentID = S.ExperimentID
AND P.SensorError = 0
AND P.ExogDEXCurve <> 4
AND P.ControlRange = 0
);
Credit to JamesZ for the smart approach of replacing COUNT with SUM.
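To illustrate the idea on a small scale, here is a sketch using a made-up table variable (@demo and its four values are assumptions for illustration only). The conditional SUM counts the matching rows and SUM(1) counts all rows, so the ratio comes out of a single pass over the data.

```sql
-- Hypothetical sample data: two of the four values satisfy Xa <= 40.
DECLARE @demo TABLE (Xa INT);
INSERT INTO @demo (Xa) VALUES (10), (40), (55), (90);

-- Conditional SUM counts matching rows; SUM(1) counts every row.
-- Multiplying by 1.0 forces decimal rather than integer division.
SELECT SUM(CASE WHEN Xa <= 40 THEN 1 ELSE 0 END) * 1.0 / SUM(1) AS fraction_le_40
FROM @demo;
```

With this sample data the result is 0.5, since 2 of the 4 rows match.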
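In addition, I would create a couple of indexes on the tables using the statements below. I believe they would greatly improve the query execution time: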
CREATE NONCLUSTERED INDEX idx_Parameters_SensorError_ExogDEXCurve_ControlRange_ExperimentID
ON dbo.Parameters (SensorError, ExogDEXCurve, ControlRange, ExperimentID);
CREATE NONCLUSTERED INDEX idx_Simulations_ExperimentID_Xa
ON dbo.Simulations (ExperimentID, Xa);
This is precisely the part that needs to be optimized; my guess is that it accounts for most of the execution time.
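One way to check that guess rather than rely on it is to compare I/O and timing statistics before and after creating the indexes. The sketch below assumes it is run in SQL Server Management Studio against the tables from the question; the figures appear in the Messages tab.

```sql
-- Report logical reads and CPU/elapsed time for each statement in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT SUM(CASE WHEN S.Xa <= 40 THEN 1 ELSE 0 END) * 1.0 / SUM(1) AS XA_PIR0
FROM dbo.Simulations AS S
WHERE EXISTS (
    SELECT 1
    FROM dbo.Parameters AS P
    WHERE P.ExperimentID = S.ExperimentID
      AND P.SensorError = 0
      AND P.ExogDEXCurve <> 4
      AND P.ControlRange = 0
);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

If the indexes help, the logical reads reported for Simulations should drop noticeably between the two runs.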