Inefficient SQL query with a DATETIME calculation. How to optimize?

Time: 2012-07-27 17:03:00

Tags: sql sql-server

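The problem comes from a real environment where the production_plan table captures the order identification and other details in each row. Each row is updated when production of the product starts and again when it ends, capturing the UTC time of each event.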

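There is a separate table, temperatures, that collects several temperatures on the production line. They are sampled at regular intervals, independently of anything else, and stored together with the UTC time.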

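The goal is to extract the sequence of temperatures measured during the production of each product. (The temperatures should then be processed, a chart of the values created, and the chart attached to the product item documentation for audit purposes.)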

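Updated after the comment by marc_s. The original question did not consider any indexes; the updated text below takes them into account. The original measurements are mentioned in the comments.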

The tables and indexes were created as follows:

CREATE TABLE production_plan (
        order_id nvarchar(50) NOT NULL,
        production_line uniqueidentifier NULL,
        prod_start DATETIME NULL,
        prod_end DATETIME NULL
);

-- About 31 000 rows inserted, ordered by order_id.
...

-- Clustered index on order_id.
CREATE CLUSTERED INDEX ind_order_id
ON production_plan (order_id ASC);

-- Non-clustered indices on the other columns.
CREATE INDEX ind_times
ON production_plan (production_line ASC, prod_start ASC, prod_end ASC);

------------------------------------------------------

-- There are actually more temperatures for one time (i.e. more
-- sensors). The UTC is the real time of the row insertion, hence
-- the primary key.
CREATE TABLE temperatures (
        UTC datetime PRIMARY KEY NOT NULL,
        production_line uniqueidentifier NULL,
        temperature_1 float NULL  
);

-- About 91 000 rows inserted ordered by UTC.
...

-- Clustered index on UTC is created automatically
-- because of the PRIMARY KEY. Indexes on temperature(s)
-- do not make sense.

-- Non-clustered index for production_line
CREATE INDEX ind_pl
ON temperatures (production_line ASC);

-- Creating the tables, inserting the records, and creating the
-- indexes took less than 1 second (for the sample on my computer).

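The idea is to join the tables first on the production_line identification and, second, on the temperature UTC time falling between the UTC times of the start and end of the item's production: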

-- About 45 000 rows in about 24 seconds when no indices were used.
-- The same took less than one second with the indices (for my data
-- and my computer).
SELECT pp.order_id,      -- not related to the problem 
       pp.prod_start,    -- UTC of the start of production
       pp.prod_end,      -- UTC of the end of production
       t.UTC,            -- UTC of the temperature measurement
       t.temperature_1   -- the measured temperature
  INTO result_table02
  FROM production_plan AS pp
       JOIN temperatures AS t
         ON pp.production_line = t.production_line
            AND t.UTC BETWEEN pp.prod_start
                          AND pp.prod_end
  ORDER BY t.UTC;

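About 24 seconds is not acceptable. Clearly, the indexes were necessary. With them, the same operation took less than 1 second (the time shown in the yellow bar below the results tab in Microsoft SQL Server Management Studio).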
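The timing can also be measured from T-SQL instead of being read off the SSMS status bar. The following is a minimal sketch, assuming the two tables above exist; the target name result_table02_timed is hypothetical and is chosen only to avoid clashing with an existing result table.

```sql
-- Report CPU/elapsed time and logical reads on the Messages tab.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT pp.order_id,
       pp.prod_start,
       pp.prod_end,
       t.UTC,
       t.temperature_1
  INTO result_table02_timed   -- hypothetical name, not from the question
  FROM production_plan AS pp
       JOIN temperatures AS t
         ON pp.production_line = t.production_line
            AND t.UTC BETWEEN pp.prod_start
                          AND pp.prod_end
  ORDER BY t.UTC;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The logical reads per table are often more useful than wall-clock time when comparing runs with and without an index, because they are less sensitive to caching and machine load.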

... however

The second problem remains

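Because the temperatures are not measured very frequently, and because the measurement location is slightly displaced from the start of production, a time correction is needed. In other words, two offsets must be added to the boundaries of the time range. I ended up with a query like this: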

-- About 46 000 rows in about 9 minutes without indices.
-- It took about the same also with indices 
-- (8:50 instead of 9:00 or so).
DECLARE @offset_start INT;
SET @offset_start = -60  -- one minute = one sample before

DECLARE @offset_end INT;
SET @offset_end = +60    -- one minute = one sample after

SELECT pp.order_id,      -- not related to the problem 
       pp.prod_start,    -- UTC of the start of production
       pp.prod_end,      -- UTC of the end of production
       t.UTC,            -- UTC of the temperature measurement
       t.temperature_1   -- the measured temperature
  INTO result_table03
  FROM production_plan AS pp
       JOIN temperatures AS t
         ON pp.production_line = t.production_line
            AND t.UTC BETWEEN DATEADD(second, @offset_start, pp.prod_start)
                          AND DATEADD(second, @offset_end, pp.prod_end)
  ORDER BY t.UTC;

With the DATEADD() calculation, it takes about 9 minutes, almost regardless of whether the indexes are created or not.

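Thinking more about how to solve the problem, it seems to me that the corrected time boundaries (UTC with the offsets added) need their own index to be processed efficiently. Creating a temporary table came to mind. An index could then be created on its corrected columns, and one more JOIN against that table should help. The table could be dropped afterwards.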

Is the basic idea of the temporary table correct? Is there any other technique to do this?

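Thanks for your suggestions. I will update the timing results after introducing the indexes you suggest. Please explain the reason for the expected improvement. I am a beginner when it comes to hands-on experience with writing SQL solutions.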

5 answers:

Answer 0: (score: 2)

You can typically optimize your queries by:

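  • picking a good clustering key for your tables. A good key is narrow, unique, static, and ever-increasing. INT IDENTITY is a classic good key; a GUID is a very bad example (because it leads to excessive index fragmentation; read Kim Tripp's GUIDs as Primary and/or clustering key for more details)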

  • making sure all foreign key columns in child tables are indexed, so that JOINs and lookups run faster

  • selecting only the few columns you really need (you seem to be doing this well)

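  • trying to cover your queries, e.g. creating indexes on the involved tables that contain all the necessary columns, either directly as index key columns or as included columns (SQL Server 2008 and later)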

  • possibly adding extra indexes to speed up range queries and/or to help with sorting
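As an illustration of the covering-index and range-query points, the sketch below builds one index on temperatures whose key supports the equality on production_line followed by the range on UTC, and which carries temperature_1 as an included column. The index name ind_line_utc_cover is hypothetical, and whether the optimizer actually picks the index depends on the data and the plan.

```sql
-- Key columns: equality column first, range column second.
-- INCLUDE stores temperature_1 at the leaf level only, so the query
-- from the question can be answered without touching the base table.
CREATE NONCLUSTERED INDEX ind_line_utc_cover   -- hypothetical name
    ON temperatures (production_line ASC, UTC ASC)
    INCLUDE (temperature_1);
```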

Looking at your queries and your table definitions:

  • I don't seem to see any primary keys; add them!

  • You must make sure there is a foreign key index on pp.production_line (assuming t.production_line is the primary key of the other table)

  • You should see whether you can find a good index to handle the range query on t.UTC

  • You should check whether it makes sense to create an index on production_plan2 that contains all the columns used (order_id, pp.prod_start, pp.prod_end)

  • You should check whether it makes sense to create an index on temperatures2 that contains all the columns used (UTC, temperature_1)
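For the primary key point, a minimal sketch against the production_plan table from the question follows. It assumes that order_id is unique, which the question does not state, so the first statement checks for duplicates. The constraint name PK_production_plan is hypothetical.

```sql
-- Assumption check: any rows returned here mean order_id cannot be
-- used as a primary key as-is.
SELECT order_id, COUNT(*) AS cnt
  FROM production_plan
 GROUP BY order_id
HAVING COUNT(*) > 1;

-- The table already has the clustered index ind_order_id,
-- so the key is declared NONCLUSTERED explicitly.
ALTER TABLE production_plan
  ADD CONSTRAINT PK_production_plan   -- hypothetical name
      PRIMARY KEY NONCLUSTERED (order_id);
```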

Update: you can capture the actual execution plan by enabling that option from the SSMS toolbar, or from the menu under Query > Include Actual Execution Plan.
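The actual plan can also be requested from T-SQL, which is handy when the query is run from a script. This is a sketch using SET STATISTICS XML; the COUNT(*) form of the query is only an illustration, chosen so that no result table is created.

```sql
-- Returns the actual execution plan as an extra XML result set
-- for every statement executed while the option is ON.
SET STATISTICS XML ON;

SELECT COUNT(*) AS matched_rows
  FROM production_plan AS pp
       JOIN temperatures AS t
         ON pp.production_line = t.production_line
            AND t.UTC BETWEEN pp.prod_start
                          AND pp.prod_end;

SET STATISTICS XML OFF;
```

In SSMS, clicking the returned XML opens the graphical plan, where it is visible whether the join uses an index seek or a scan.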

Answer 1: (score: 1)

Things to try:

CREATE INDEX ind_pl
    ON temperatures (production_line ASC, UTC);

This would provide a covering index for the join.
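Note that the question already created an index named ind_pl on temperatures, so the statement above would fail with a duplicate-name error if run as is. One way around this, sketched here, is to rebuild the existing index with the new key columns in a single step:

```sql
-- Replaces the existing single-column ind_pl with the two-column
-- definition; DROP_EXISTING requires that ind_pl already exists.
CREATE INDEX ind_pl
    ON temperatures (production_line ASC, UTC ASC)
    WITH (DROP_EXISTING = ON);
```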

With non-equijoins, using APPLY (SQL Server 2005+) might be faster:

-- Assumes @offset_start and @offset_end are declared and set as in the question.
SELECT pp.order_id,      -- not related to the problem 
       pp.prod_start,    -- UTC of the start of production
       pp.prod_end,      -- UTC of the end of production
       t.UTC,            -- UTC of the temperature measurement
       t.temperature_1   -- the measured temperature
  INTO result_table02
  FROM production_plan AS pp
 CROSS APPLY
 (
   SELECT t1.utc, t1.temperature_1
     FROM temperatures AS t1
    WHERE t1.production_line = pp.production_line
      AND t1.UTC BETWEEN DATEADD(second, @offset_start, pp.prod_start)
                     AND DATEADD(second, @offset_end, pp.prod_end)
 ) t
 ORDER BY t.UTC;

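If this does not work, the next option is to write a stored procedure that makes sure each table is read only once. It would declare two cursors, one for pp and one for t, and advance one or the other while inserting the matches into a temporary table. This technique can get quite complicated because of the n:m relationship. Still, if the above does not work for you, I will be glad to help.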

Answer 2: (score: 1)

I tried the following solution using a temporary table:

-- UTC range expanded by the offsets -- temporary table used.
-- (Much better -- less than one second.)

DECLARE @offset_start INT;
SET @offset_start = -60  -- one minute = one sample before

DECLARE @offset_end INT;
SET @offset_end = +60    -- one minute = one sample after

-- Temporary table with the production_plan UTC range expanded.
SELECT production_line,
       order_id,
       prod_start,
       prod_end,
       DATEADD(second, @offset_start, prod_start) AS start,
       DATEADD(second, @offset_end, prod_end) AS bend
  INTO #pp     
  FROM production_plan;

CREATE INDEX ind_UTC
  ON #pp (production_line ASC, start ASC, bend ASC);

SELECT order_id,
       prod_start,
       prod_end,
       UTC,
       temperature_1
  INTO result_table06
  FROM #pp JOIN temperatures AS t
             ON #pp.production_line = t.production_line
                AND UTC BETWEEN #pp.start AND #pp.bend
  ORDER BY UTC;

DROP TABLE #pp;

CREATE CLUSTERED INDEX ind_UTC
  ON result_table06 (UTC ASC);

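The result is ready in less than one second (compared with 9 minutes). But I would like to hear your criticism. One open question is how efficient this will be if the temperatures table grows large.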
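Before trusting the faster variant, it may be worth checking that it returns the same rows as the slow DATEADD query. The sketch below assumes that both result_table03 (from the question) and result_table06 (from this answer) still exist and were produced with the same offsets.

```sql
-- Row counts should match.
SELECT (SELECT COUNT(*) FROM result_table03) AS rows_03,
       (SELECT COUNT(*) FROM result_table06) AS rows_06;

-- Both queries should return no rows. EXCEPT compares distinct rows,
-- so this is a set comparison and does not detect differing
-- duplicate counts; the row counts above cover that case.
SELECT order_id, prod_start, prod_end, UTC, temperature_1
  FROM result_table03
EXCEPT
SELECT order_id, prod_start, prod_end, UTC, temperature_1
  FROM result_table06;

SELECT order_id, prod_start, prod_end, UTC, temperature_1
  FROM result_table06
EXCEPT
SELECT order_id, prod_start, prod_end, UTC, temperature_1
  FROM result_table03;
```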

Answer 3: (score: 1)

Computed columns may help you: http://msdn.microsoft.com/en-us/library/ms189292%28v=sql.105%29.aspx

ALTER TABLE production_plan ADD 
        offset_start int NOT NULL CONSTRAINT DF__production_plan__offset_start DEFAULT 0,
        offset_end int NOT NULL CONSTRAINT DF__production_plan__offset_end DEFAULT 0,
        prod_start_UTC as CAST(DATEADD(second,offset_start,prod_start) as DATETIME) PERSISTED  NOT NULL ,
        prod_end_UTC as CAST(DATEADD(second,offset_end,prod_end) as DATETIME) PERSISTED  NOT NULL

-- or just
--ALTER TABLE production_plan ADD 
--        prod_start_UTC as CAST(DATEADD(second,-60,prod_start) as DATETIME) PERSISTED  NOT NULL ,
--        prod_end_UTC as CAST(DATEADD(second,60,prod_end) as DATETIME) PERSISTED  NOT NULL

IF  EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[temperatures]') AND name = N'ind_pl')
    DROP INDEX [ind_pl] ON [dbo].[temperatures] WITH ( ONLINE = OFF )

CREATE INDEX ind_times_UTC
ON production_plan (production_line ASC, prod_start_UTC ASC, prod_end_UTC ASC);

SELECT pp.order_id,      -- not related to the problem 
       pp.prod_start,    -- UTC of the start of production
       pp.prod_end,      -- UTC of the end of production
       t.UTC,            -- UTC of the temperature measurement
       t.temperature_1   -- the measured temperature
  INTO result_table05
  FROM production_plan AS pp
       JOIN temperatures AS t
         ON pp.production_line = t.production_line
            AND t.UTC BETWEEN pp.prod_start_UTC
                          AND pp.prod_end_UTC
ORDER BY t.UTC;

This is in addition to the suggestions made by marc_s.
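After adding the columns, the catalog view sys.computed_columns can be used to confirm that they exist and are persisted. This is a small sketch, assuming the table lives in the dbo schema as in the DROP INDEX statement above.

```sql
-- Lists the computed columns of production_plan with their
-- defining expressions and whether the values are stored on disk.
SELECT name,
       definition,
       is_persisted,
       is_nullable
  FROM sys.computed_columns
 WHERE object_id = OBJECT_ID(N'dbo.production_plan');
```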

Answer 4: (score: 0)

This is for your second problem.

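I have not checked the performance of this, but you could try to skip the DATEADD function by replacing it with the addition or subtraction of a constant float.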

If you want to add one minute (60 seconds), you can use:

select getdate()+1.000/(24.00*60.00)

Or use the constant:

select getdate()+0.000694444

As you can see, adding 1 (one) adds exactly 1 day. So the constant will not be exactly 60 seconds, but maybe that does not matter in this case?
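How far the arithmetic form drifts from DATEADD can be checked directly. The sketch below uses an arbitrary sample timestamp (an assumption, not a value from the question's data) and reports the difference in milliseconds. Note that adding a number to a date works for DATETIME, as used in the question's tables, but not for DATETIME2.

```sql
DECLARE @t DATETIME;
SET @t = '2012-07-27T17:03:00';   -- arbitrary sample value

SELECT DATEADD(second, 60, @t)  AS via_dateadd,
       @t + 60.0 / 86400.0      AS via_arithmetic,   -- 86400 seconds per day
       DATEDIFF(millisecond,
                DATEADD(second, 60, @t),
                @t + 60.0 / 86400.0) AS difference_ms;
```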