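The problem comes from a real environment in which the production_plan table captures the order identification and other details in each row. Each row is updated when production of the product starts and again when it ends, capturing the UTC time of each event.
There is a separate table, temperatures, which collects several temperatures on the production line at regular intervals, independently of anything else, and stores them together with the UTC time.
The goal is to extract the sequence of temperatures measured during the production of each product. (The temperatures should then be processed: a chart of the values is created and attached to the product item documentation for audit purposes.)
Updated after marc_s's comment. The original question did not consider any indexes; the updated text below does. The original measurements are mentioned in the comments.
The tables and indexes were created as follows: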
CREATE TABLE production_plan (
order_id nvarchar(50) NOT NULL,
production_line uniqueidentifier NULL,
prod_start DATETIME NULL,
prod_end DATETIME NULL
);
-- About 31 000 rows inserted, ordered by order_id.
...
-- Clustered index on order_id.
CREATE CLUSTERED INDEX ind_order_id
ON production_plan (order_id ASC);
-- Non-clustered indices on the other columns.
CREATE INDEX ind_times
ON production_plan (production_line ASC, prod_start ASC, prod_end ASC);
------------------------------------------------------
-- There are actually more temperatures for one time (i.e. more
-- sensors). The UTC is the real time of the row insertion, hence
-- the primary key.
CREATE TABLE temperatures (
UTC datetime PRIMARY KEY NOT NULL,
production_line uniqueidentifier NULL,
temperature_1 float NULL
);
-- About 91 000 rows inserted ordered by UTC.
...
-- Clustered index on UTC is created automatically
-- because of the PRIMARY KEY. Indices on temperature(s)
-- do not make sense.
-- Non-clustered index for production_line
CREATE INDEX ind_pl
ON temperatures (production_line ASC);
-- The tables were created, the records inserted, and the indices
-- created in less than 1 second (for the sample on my computer).
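The idea is to join the tables first on the production_line identifier and, second, on the temperature UTC time falling between the UTC times of the start and end of the item's production: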
-- About 45 000 rows in about 24 seconds when no indices were used.
-- The same took less than one second with the indices (for my data
-- and my computer).
SELECT pp.order_id, -- not related to the problem
pp.prod_start, -- UTC of the start of production
pp.prod_end, -- UTC of the end of production
t.UTC, -- UTC of the temperature measurement
t.temperature_1 -- the measured temperature
INTO result_table02
FROM production_plan AS pp
JOIN temperatures AS t
ON pp.production_line = t.production_line
AND t.UTC BETWEEN pp.prod_start
AND pp.prod_end
ORDER BY t.UTC;
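About 24 seconds is not acceptable. Clearly, the indexes were necessary. With them, the same operation took less than 1 second (the time shown in the yellow line below the results tab in Microsoft SQL Server Management Studio).
...however
The second problem remains
Because the temperatures are not measured very frequently, and because the place of measurement is slightly shifted relative to the start of production, a time correction is needed. In other words, two offsets must be added to the boundaries of the time range. I ended up with a query like this: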
-- About 46 000 rows in about 9 minutes without indices.
-- It took about the same also with indices
-- (8:50 instead of 9:00 or so).
DECLARE @offset_start INT;
SET @offset_start = -60 -- one minute = one sample before
DECLARE @offset_end INT;
SET @offset_end = +60 -- one minute = one sample after
SELECT pp.order_id, -- not related to the problem
pp.prod_start, -- UTC of the start of production
pp.prod_end, -- UTC of the end of production
t.UTC, -- UTC of the temperature measurement
t.temperature_1 -- the measured temperature
INTO result_table03
FROM production_plan AS pp
JOIN temperatures AS t
ON pp.production_line = t.production_line
AND t.UTC BETWEEN DATEADD(second, @offset_start, pp.prod_start)
AND DATEADD(second, @offset_end, pp.prod_end)
ORDER BY t.UTC;
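With the DATEADD() calculation it takes about 9 minutes, almost regardless of whether the indexes are created or not.
Thinking more about how to solve the problem, it seems to me that the corrected time boundaries (UTC with the offsets added) need their own indexes to be processed efficiently. I thought of creating a temporary table. An index could then be created on its corrected columns, and one more JOIN afterwards should help. The table could then be dropped.
Is the basic idea of the temporary table correct? Is there some other technique to do this?
Thanks for your suggestions. I will update the timing results after introducing the indexes you suggest. Please explain the reason for the expected improvement. I am a beginner when it comes to hands-on experience in writing SQL solutions.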
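Answer 0 (score: 2)
You can typically optimize your queries by:
picking a good clustering key for your tables: a good one is narrow, unique, static, ever-increasing. INT IDENTITY is a classic good key; GUIDs are a very bad example (because they lead to excessive index fragmentation; read Kim Tripp's GUIDs as Primary and/or clustering key for more details)
making sure all foreign key columns in child tables are indexed, so that JOINs and lookups are faster
selecting only the columns you really need (you seem to be doing fine here)
trying to cover the query, e.g. creating an index on the tables involved that has all the necessary columns, either directly as index columns or as included columns (SQL Server 2008 and later)
possibly adding extra indexes to speed up range queries and/or help with ordering/sorting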
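Looking at your query and table definitions:
I don't seem to see any primary keys; add those!
You must make sure there is a foreign key index on pp.production_line (assuming t.production_line is the primary key of the other table)
You should see whether you can find a good index to handle the range query on t.UTC
You should check whether it makes sense to create an index on production_plan2 containing all the columns (order_id, pp.prod_start, pp.prod_end)
You should check whether it makes sense to create an index on temperatures2 containing all the columns (UTC, temperature_1)
Update: you can capture the actual execution plan by enabling that option from the SSMS toolbar:
or from Query > Include Actual Execution Plan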
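The same information can also be requested from a script instead of the menu. The following is only a minimal sketch, assuming the tables from the question: SET STATISTICS XML returns the actual execution plan as an additional XML result, while SET STATISTICS TIME and SET STATISTICS IO print CPU time, elapsed time, and page reads to the Messages tab. The SELECT is the question's join without the INTO clause.

```sql
-- Ask the session to report timings, page reads, and the actual plan.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
SET STATISTICS XML ON;

-- The join from the question (no INTO, so nothing is created).
SELECT pp.order_id,
       pp.prod_start,
       pp.prod_end,
       t.UTC,
       t.temperature_1
FROM production_plan AS pp
JOIN temperatures AS t
  ON pp.production_line = t.production_line
 AND t.UTC BETWEEN pp.prod_start AND pp.prod_end
ORDER BY t.UTC;

-- Switch the reporting off again for the rest of the session.
SET STATISTICS XML OFF;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the plans with and without the offsets should show whether the optimizer switches from index seeks to scans.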
Answer 1 (score: 1)
Things to try:
CREATE INDEX ind_pl
ON temperatures (production_line ASC, UTC);
would provide a covering index for the join.
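If the ind_pl index from the question already exists on temperatures, a second CREATE INDEX with the same name will be rejected. One possible variant, as a sketch only: DROP_EXISTING rebuilds the index under the same name, and the INCLUDE column (an addition of this example, not part of the answer) lets the index supply temperature_1 as well, so the query does not need a lookup into the clustered index.

```sql
-- Replaces the existing ind_pl in one step.
-- Note: DROP_EXISTING = ON raises an error if ind_pl does not exist yet;
-- in that case leave out the WITH clause.
CREATE INDEX ind_pl
    ON temperatures (production_line ASC, UTC ASC)
    INCLUDE (temperature_1)
    WITH (DROP_EXISTING = ON);
```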
For non-equijoins, using APPLY (SQL Server 2005+) may be faster:
-- @offset_start and @offset_end are declared as in the question.
SELECT pp.order_id, -- not related to the problem
pp.prod_start, -- UTC of the start of production
pp.prod_end, -- UTC of the end of production
t.UTC, -- UTC of the temperature measurement
t.temperature_1 -- the measured temperature
INTO result_table02
FROM production_plan AS pp
CROSS APPLY
(
SELECT t1.utc, t1.temperature_1
FROM temperatures AS t1
WHERE t1.production_line = pp.production_line
AND t1.UTC BETWEEN DATEADD(second, @offset_start, pp.prod_start)
AND DATEADD(second, @offset_end, pp.prod_end)
) t
ORDER BY t.UTC;
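If this does not help, the next option is to write a stored procedure that makes sure each table is read only once: declare two cursors, one for pp and one for t, advance them one at a time in a single direction, and insert the matches into a temporary table. This technique can get quite complicated because of the n:m relationship. However, if the above does not work for you, I will be happy to help.
Answer 2 (score: 1)
I tried the following solution using a temporary table: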
-- UTC range expanded by the offsets -- temporary table used.
-- (Much better -- less than one second.)
DECLARE @offset_start INT;
SET @offset_start = -60 -- one minute = one sample before
DECLARE @offset_end INT;
SET @offset_end = +60 -- one minute = one sample after
-- Temporary table with the production_plan UTC range expanded.
SELECT production_line,
order_id,
prod_start,
prod_end,
DATEADD(second, @offset_start, prod_start) AS start,
DATEADD(second, @offset_end, prod_end) AS bend
INTO #pp
FROM production_plan;
CREATE INDEX ind_UTC
ON #pp (production_line ASC, start ASC, bend ASC);
SELECT order_id,
prod_start,
prod_end,
UTC,
temperature_1
INTO result_table06
FROM #pp JOIN temperatures AS t
ON #pp.production_line = t.production_line
AND UTC BETWEEN #pp.start AND #pp.bend
ORDER BY UTC;
DROP TABLE #pp;
CREATE CLUSTERED INDEX ind_UTC
ON result_table06 (UTC ASC);
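The result is ready in less than one second (compared with 9 minutes). But I would like to hear your criticism. One open question is how efficient this will be once the temperatures table grows large.
Answer 3 (score: 1)
Computed columns may help you: http://msdn.microsoft.com/en-us/library/ms189292%28v=sql.105%29.aspx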
ALTER TABLE production_plan ADD
offset_start int NOT NULL CONSTRAINT DF__production_plan__offset_start DEFAULT 0,
offset_end int NOT NULL CONSTRAINT DF__production_plan__offset_end DEFAULT 0,
prod_start_UTC as CAST(DATEADD(second,offset_start,prod_start) as DATETIME) PERSISTED NOT NULL ,
prod_end_UTC as CAST(DATEADD(second,offset_end,prod_end) as DATETIME) PERSISTED NOT NULL
-- or just
--ALTER TABLE production_plan ADD
-- prod_start_UTC as CAST(DATEADD(second,-60,prod_start) as DATETIME) PERSISTED NOT NULL ,
-- prod_end_UTC as CAST(DATEADD(second,60,prod_end) as DATETIME) PERSISTED NOT NULL
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[temperatures]') AND name = N'ind_pl')
DROP INDEX [ind_pl] ON [dbo].[temperatures] WITH ( ONLINE = OFF )
CREATE INDEX ind_times_UTC
ON production_plan (production_line ASC, prod_start_UTC ASC, prod_end_UTC ASC);
SELECT pp.order_id, -- not related to the problem
pp.prod_start, -- UTC of the start of production
pp.prod_end, -- UTC of the end of production
t.UTC, -- UTC of the temperature measurement
t.temperature_1 -- the measured temperature
INTO result_table05
FROM production_plan AS pp
JOIN temperatures AS t
ON pp.production_line = t.production_line
AND t.UTC BETWEEN pp.prod_start_UTC
AND pp.prod_end_UTC
ORDER BY t.UTC;
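as well as the suggestions made by marc_s
Answer 4 (score: 0)
This is for your second question.
I have not checked the performance of this, but you could try skipping the DATEADD function by replacing it with the addition and subtraction of constant floats.
If you want to add one minute, you can use: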
select getdate()+1.000/(24.00*60.00)
Or use a constant:
select getdate()+0.000694444
As you can see, adding 1 (one) adds exactly 1 day. So this will not be exactly 60 seconds, but perhaps that does not matter in this case?
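To see how close the arithmetic shortcut comes to DATEADD, the two can be compared side by side. This is a small sketch with an arbitrary sample timestamp (an assumption for illustration, not data from the question). Since datetime values are stored with a resolution of a few milliseconds, the two results may differ by a rounding step.

```sql
-- Arbitrary sample value, only for the comparison.
DECLARE @t datetime;
SET @t = '2012-01-01T12:00:00';

SELECT DATEADD(second, 60, @t)        AS with_dateadd,
       -- 60 seconds expressed as a fraction of a day (86400 seconds)
       @t + CAST(60 AS float) / 86400 AS with_arithmetic,
       DATEDIFF(millisecond,
                DATEADD(second, 60, @t),
                @t + CAST(60 AS float) / 86400) AS difference_ms;
```

Whether the arithmetic form is any faster than DATEADD in the join would still have to be measured on the real data.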