How can I optimize this query? It should return the percentage of values less than 40 within the given constraints

Posted: 2015-10-20 17:16:27

Tags: sql-server

USE SIMDB
GO

SELECT Count([Xa])*1.0/
    (SELECT Count(*) FROM [dbo].[Simulations]
        WHERE [dbo].[Simulations].[ExperimentID] IN (
            SELECT [dbo].[Parameters].[ExperimentID] FROM [dbo].[Parameters] 
                WHERE [dbo].[Parameters].[SensorError] = 0 AND [dbo].[Parameters].[ExogDEXCurve] <> 4
                AND [dbo].[Parameters].[ControlRange] = 0)) AS XA_PIR0
    FROM [dbo].[Simulations]
    WHERE [dbo].[Simulations].[ExperimentID] IN (
            SELECT [dbo].[Parameters].[ExperimentID] FROM [dbo].[Parameters] 
                WHERE [dbo].[Parameters].[SensorError] = 0 AND [dbo].[Parameters].[ExogDEXCurve] <> 4
                AND [dbo].[Parameters].[ControlRange] = 0)
    AND [dbo].[Simulations].[Xa] <= 40

GO

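This is a database holding the results of 90k simulations. The primary key of the Parameters table is [ExperimentID]. The primary key of the Simulations table is the combination of [ExperimentID] and [CycleCount], which uniquely identifies a row within a given simulation. The Simulations table has about 75 million rows in total. It also has 21 columns, which, admittedly, it shouldn't. All indexes are clustered.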

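This query takes about 23 minutes to run on the laptop that hosts the database. Looking at the execution plan, there are two Clustered Index Seek operators, each accounting for 42% of the cost. Is this expected? How can I speed it up?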

The .sqlplan file is here: https://www.dropbox.com/s/gq1bl1wgmesh0bl/sqlplan.sqlplan?dl=0

3 answers:

Answer 0 (score: 1)

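It seems to me that you can drop the second SELECT and instead count both the matching rows and the rows with Xa <= 40 in a single pass, then divide the second sum by the first: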

select sum(1), sum(case when [Simulations].[Xa] <= 40 then 1 else 0 end)
FROM [dbo].[Simulations]
WHERE [dbo].[Simulations].[ExperimentID] IN (
    SELECT [dbo].[Parameters].[ExperimentID] 
    FROM [dbo].[Parameters] 
    WHERE [dbo].[Parameters].[SensorError] = 0
    AND [dbo].[Parameters].[ExogDEXCurve] <> 4
    AND [dbo].[Parameters].[ControlRange] = 0)
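To check that the single-pass rewrite returns the same ratio as the original two-query form, here is a small sketch using Python's built-in sqlite3 module. It runs on SQLite rather than SQL Server, keeps only the columns the query touches, and the rows are made up for illustration.

```python
import sqlite3

# Toy schema mirroring the question's tables (only the columns the query uses).
SCHEMA = """
CREATE TABLE Parameters (
    ExperimentID INTEGER PRIMARY KEY,
    SensorError  INTEGER,
    ExogDEXCurve INTEGER,
    ControlRange INTEGER
);
CREATE TABLE Simulations (
    ExperimentID INTEGER,
    CycleCount   INTEGER,
    Xa           REAL,
    PRIMARY KEY (ExperimentID, CycleCount)
);
"""

# Hypothetical data: experiments 1 and 4 satisfy the filter, 2 and 3 do not.
PARAMS = [(1, 0, 1, 0), (2, 0, 4, 0), (3, 1, 1, 0), (4, 0, 2, 0)]
SIMS = [(1, 1, 35.0), (1, 2, 40.0), (1, 3, 55.0),
        (2, 1, 10.0), (3, 1, 20.0),
        (4, 1, 80.0), (4, 2, 12.0)]

FILTER = """ExperimentID IN (
    SELECT ExperimentID FROM Parameters
    WHERE SensorError = 0 AND ExogDEXCurve <> 4 AND ControlRange = 0)"""

# Original shape: the filtered Simulations rows are read twice.
ORIGINAL = f"""
SELECT COUNT(Xa) * 1.0 /
       (SELECT COUNT(*) FROM Simulations WHERE {FILTER})
FROM Simulations
WHERE {FILTER} AND Xa <= 40
"""

# Single pass: total rows and rows with Xa <= 40 counted together.
SINGLE_PASS = f"""
SELECT SUM(1), SUM(CASE WHEN Xa <= 40 THEN 1 ELSE 0 END)
FROM Simulations
WHERE {FILTER}
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO Parameters VALUES (?, ?, ?, ?)", PARAMS)
conn.executemany("INSERT INTO Simulations VALUES (?, ?, ?)", SIMS)

original_ratio = conn.execute(ORIGINAL).fetchone()[0]
total, at_or_below_40 = conn.execute(SINGLE_PASS).fetchone()
single_pass_ratio = at_or_below_40 * 1.0 / total
print(original_ratio, single_pass_ratio)
```

Both forms give 0.6 on this data (3 of 5 qualifying rows); the practical difference on the real table is that the single-pass version reads the filtered Simulations rows once instead of twice.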

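It may also help to create an index on Simulations keyed on ExperimentID with Xa as an included column. Because the table appears to be very wide, this would reduce the amount of data that has to be scanned.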

Answer 1 (score: 0)

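Optimizing is difficult without knowing the shape of the data. However, this query may be easier to read and faster, because it takes advantage of MS SQL Server's set-based engine. Replace @simulationsTable and @parametersTable with your own tables; I used table variables only to make things easier.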

DECLARE @simulationsTable TABLE (
    ExperimentID INT
    ,Xa INT
    );
DECLARE @parametersTable TABLE (
    ExperimentID INT
    ,SensorError INT
    ,ExogDEXCurve INT
    ,ControlRange INT
    );

;WITH validParameters
AS (
    SELECT [ExperimentID]
    FROM @parametersTable
    WHERE [SensorError] = 0
        AND [ExogDEXCurve] <> 4
        AND [ControlRange] = 0
    )

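-- countOfSims is a single constant; wrapping it in MAX() keeps it valid next to the COUNT() aggregate
SELECT COUNT([Xa]) * 1.0 / MAX(sims.countOfSims) AS XA_PIR0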
FROM @simulationsTable sim2
INNER JOIN validParameters p ON p.ExperimentID = sim2.ExperimentID
CROSS APPLY (
    SELECT COUNT(1) AS countOfSims
    FROM @simulationsTable sim1
    INNER JOIN validParameters p ON p.ExperimentID = sim1.ExperimentID
    ) sims
WHERE [Xa] <= 40
GO
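The CTE version replaces IN with an INNER JOIN. That gives the same count only because ExperimentID is the primary key of Parameters, as the question states; with duplicate keys the join would multiply rows. A small sketch with Python's sqlite3 module (SQLite rather than SQL Server, hypothetical data, Parameters deliberately created without a key) shows the difference:

```python
import sqlite3

def build(conn, params, sims):
    # Parameters has no PRIMARY KEY here so duplicate ExperimentIDs can be inserted.
    conn.executescript("""
        CREATE TABLE Parameters (ExperimentID INTEGER, SensorError INTEGER,
                                 ExogDEXCurve INTEGER, ControlRange INTEGER);
        CREATE TABLE Simulations (ExperimentID INTEGER, Xa REAL);
    """)
    conn.executemany("INSERT INTO Parameters VALUES (?, ?, ?, ?)", params)
    conn.executemany("INSERT INTO Simulations VALUES (?, ?)", sims)

VALID = "SensorError = 0 AND ExogDEXCurve <> 4 AND ControlRange = 0"

# Semi-join: each Simulations row is counted at most once.
IN_COUNT = f"""SELECT COUNT(*) FROM Simulations
WHERE ExperimentID IN (SELECT ExperimentID FROM Parameters WHERE {VALID})"""

# Inner join: each Simulations row is counted once per matching Parameters row.
JOIN_COUNT = f"""WITH validParameters AS (
    SELECT ExperimentID FROM Parameters WHERE {VALID})
SELECT COUNT(*) FROM Simulations s
INNER JOIN validParameters p ON p.ExperimentID = s.ExperimentID"""

def counts(params, sims):
    conn = sqlite3.connect(":memory:")
    build(conn, params, sims)
    result = (conn.execute(IN_COUNT).fetchone()[0],
              conn.execute(JOIN_COUNT).fetchone()[0])
    conn.close()
    return result

sims = [(1, 35.0), (1, 55.0), (2, 12.0)]
unique_params = [(1, 0, 1, 0), (2, 0, 4, 0)]
dup_params = unique_params + [(1, 0, 2, 0)]  # a second valid row for ExperimentID 1

print(counts(unique_params, sims))
print(counts(dup_params, sims))
```

With unique keys both counts are 2; after adding a second qualifying row for ExperimentID 1, IN still returns 2 while the join returns 4.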

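Also, since the number of sims (the denominator) is the same for every row, you can compute it ahead of time and store the result in a variable: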

DECLARE @countOfSims INT = (
    SELECT COUNT(1) AS countOfSims
    FROM @simulationsTable sim1
    CROSS APPLY (
        SELECT [ExperimentID]
        FROM @parametersTable
        WHERE [SensorError] = 0 AND [ExogDEXCurve] <> 4 AND [ControlRange] = 0
    ) params
    WHERE params.ExperimentID = sim1.ExperimentID
);

-- And then use the result in your query.

;WITH validParameters
AS (
    SELECT [ExperimentID]
    FROM @parametersTable
    WHERE [SensorError] = 0 AND [ExogDEXCurve] <> 4 AND [ControlRange] = 0
)

SELECT COUNT([Xa]) * 1.0 / @countOfSims AS XA_PIR0
FROM @simulationsTable sim2
INNER JOIN validParameters p ON p.ExperimentID = sim2.ExperimentID
WHERE [Xa] <= 40

GO

Answer 2 (score: 0)

Using EXISTS together with an index on the dbo.Parameters table may reduce the amount of data that has to be read.

SELECT SUM(CASE WHEN S.Xa <= 40 THEN 1 ELSE 0 END) * 1.0 / SUM(1)
FROM dbo.Simulations AS S
WHERE EXISTS (
        SELECT 1
        FROM dbo.Parameters AS P
        WHERE P.ExperimentID = S.ExperimentID
            AND P.SensorError = 0
            AND P.ExogDEXCurve <> 4
            AND P.ControlRange = 0
        );

Credit to JamesZ for the sensible idea of replacing COUNT with SUM.
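EXISTS is a semi-join like IN: each Simulations row is kept at most once, however many Parameters rows match. The sketch below uses Python's sqlite3 module (SQLite rather than SQL Server) with hypothetical data in which ExperimentID 1 deliberately has two qualifying Parameters rows, and compares the EXISTS query with the question's original query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parameters (ExperimentID INTEGER, SensorError INTEGER,
                             ExogDEXCurve INTEGER, ControlRange INTEGER);
    CREATE TABLE Simulations (ExperimentID INTEGER, CycleCount INTEGER, Xa REAL);
""")
# Hypothetical rows; ExperimentID 1 has two qualifying Parameters rows on purpose.
conn.executemany("INSERT INTO Parameters VALUES (?, ?, ?, ?)",
                 [(1, 0, 1, 0), (1, 0, 2, 0), (2, 0, 4, 0), (3, 0, 0, 0)])
conn.executemany("INSERT INTO Simulations VALUES (?, ?, ?)",
                 [(1, 1, 35.0), (1, 2, 55.0), (2, 1, 12.0), (3, 1, 41.0), (3, 2, 40.0)])

# The EXISTS form from this answer.
EXISTS_RATIO = """
SELECT SUM(CASE WHEN S.Xa <= 40 THEN 1 ELSE 0 END) * 1.0 / SUM(1)
FROM Simulations AS S
WHERE EXISTS (
    SELECT 1 FROM Parameters AS P
    WHERE P.ExperimentID = S.ExperimentID
      AND P.SensorError = 0 AND P.ExogDEXCurve <> 4 AND P.ControlRange = 0)
"""

# The question's original IN-based form.
ORIGINAL_RATIO = """
SELECT COUNT(Xa) * 1.0 / (
    SELECT COUNT(*) FROM Simulations WHERE ExperimentID IN (
        SELECT ExperimentID FROM Parameters
        WHERE SensorError = 0 AND ExogDEXCurve <> 4 AND ControlRange = 0))
FROM Simulations
WHERE ExperimentID IN (
    SELECT ExperimentID FROM Parameters
    WHERE SensorError = 0 AND ExogDEXCurve <> 4 AND ControlRange = 0)
  AND Xa <= 40
"""

exists_ratio = conn.execute(EXISTS_RATIO).fetchone()[0]
original_ratio = conn.execute(ORIGINAL_RATIO).fetchone()[0]
print(exists_ratio, original_ratio)
```

Both queries return 0.5 here (2 of the 4 qualifying rows have Xa <= 40), and the duplicated Parameters row does not inflate either count.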

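Additionally, I would create a couple of indexes on the tables with the following statements. I believe they would greatly improve query execution time: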

CREATE NONCLUSTERED INDEX idx_Parameters_SensorError_ExogDEXCurve_ControlRange_ExperimentID
    ON dbo.Parameters (SensorError, ExogDEXCurve, ControlRange, ExperimentID);

CREATE NONCLUSTERED INDEX idx_Simulations_ExperimentID_Xa
    ON dbo.Simulations (ExperimentID, Xa);
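SQL Server reports index use in its execution plan. As a rough analogue, the sketch below creates the same two indexes in SQLite through Python's sqlite3 module and prints EXPLAIN QUERY PLAN for the IN-based query. The Xb, Xc, and Notes columns are hypothetical stand-ins for the table's other columns, and SQLite's planner is not SQL Server's, so this only illustrates the idea that a narrow (ExperimentID, Xa) index can answer the query without reading the wide base rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Parameters (
        ExperimentID INTEGER PRIMARY KEY,
        SensorError  INTEGER,
        ExogDEXCurve INTEGER,
        ControlRange INTEGER
    );
    -- Extra columns stand in for the other, unused columns of the wide table.
    CREATE TABLE Simulations (
        ExperimentID INTEGER,
        CycleCount   INTEGER,
        Xa           REAL,
        Xb           REAL,
        Xc           REAL,
        Notes        TEXT
    );
    CREATE INDEX idx_Parameters_SensorError_ExogDEXCurve_ControlRange_ExperimentID
        ON Parameters (SensorError, ExogDEXCurve, ControlRange, ExperimentID);
    CREATE INDEX idx_Simulations_ExperimentID_Xa
        ON Simulations (ExperimentID, Xa);
""")

QUERY = """
SELECT SUM(CASE WHEN Xa <= 40 THEN 1 ELSE 0 END) * 1.0 / SUM(1)
FROM Simulations
WHERE ExperimentID IN (
    SELECT ExperimentID FROM Parameters
    WHERE SensorError = 0 AND ExogDEXCurve <> 4 AND ControlRange = 0)
"""

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
for line in details:
    print(line)
```

In the printed plan, the Simulations access should be reported as using COVERING INDEX idx_Simulations_ExperimentID_Xa, meaning only the two indexed columns are read. SQLite has no INCLUDE clause, so Xa is part of the index key here; in SQL Server the INCLUDE form suggested in the first answer achieves the same coverage.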

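To be precise, this is the part that needs optimizing; my guess is that it accounts for most of the execution time.

[Screenshot: execution plan hint]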