Query with two (almost identical) joins is slow, but the query with either join alone is fast

Time: 2018-11-22 03:21:09

Tags: sql-server sql-execution-plan sql-optimization

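I have a query that now runs very slowly. It starts from a composite query over our stock positions (I'll call it POSITION_QUERY; it returns one row per stock code per exchange for a given date). That is joined to the stock price table to get prices (I'll call this the FIRST JOIN), with a join condition on three columns: stock code, exchange, and trade date. I then need a SECOND JOIN, because each stock belongs to a composite index (each row in POSITION_QUERY has columns for the index code and the exchange the index trades on).

So my query looks like this: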

SELECT * FROM 
POSITION_QUERY t1 
 JOIN DAILY_PRICE t2
    on t1.STOCK_CODE = t2.STOCK_CODE
       and t1.STOCK_EXCHANGE = t2.EXCHANGE
       and t2.TRADE_DATE = 20181121
 JOIN DAILY_PRICE t3
    on t1.INDEX_CODE = t3.STOCK_CODE
       and t1.INDEX_EXCHANGE = t3.EXCHANGE
       and t3.TRADE_DATE = 20181121

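The query is now really slow: it takes about 3 minutes to return 50 rows. As I mentioned, POSITION_QUERY is actually a query, not an existing table. However, running SELECT * FROM POSITION_QUERY on its own is fast (inside POSITION_QUERY I only fetch positions for 20181121, so it already returns just the 50 rows mentioned above). DAILY_PRICE is a view, but it maps almost directly onto an existing table, and that table has indexes on all the join columns.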

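What is strange to me is that if I run only POSITION_QUERY, or POSITION_QUERY with the FIRST JOIN (that is, joined to DAILY_PRICE on the first set of conditions), or POSITION_QUERY with the SECOND JOIN (joined to DAILY_PRICE on the second set of conditions), all three queries run very fast (under one second).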

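I checked the actual execution plans. The two-join plan and the one-join plans are very similar, but the two-join plan contains a table spool (lazy spool) with a cost of 49%. The output list of the table spool operator is the POSITION_QUERY columns, so I guess it is storing the POSITION_QUERY result (but why doesn't it simply perform the joins one after the other?). I have a hard time interpreting execution plans, so I don't know whether this is the problem or how to fix it.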
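For reference, one way to compare the one-join and two-join variants beyond the plan percentages is to turn on I/O and timing statistics before running each of them. This is only a minimal sketch using the simplified names from above; POSITION_QUERY stands in for the real subquery.

```sql
-- Report logical reads and CPU/elapsed time for each statement in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Two-join variant. Run the one-join variants the same way and compare
-- the per-table logical reads shown in the Messages output.
SELECT *
FROM POSITION_QUERY t1
JOIN DAILY_PRICE t2
    ON t1.STOCK_CODE = t2.STOCK_CODE
   AND t1.STOCK_EXCHANGE = t2.EXCHANGE
   AND t2.TRADE_DATE = 20181121
JOIN DAILY_PRICE t3
    ON t1.INDEX_CODE = t3.STOCK_CODE
   AND t1.INDEX_EXCHANGE = t3.EXCHANGE
   AND t3.TRADE_DATE = 20181121;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Spool activity usually appears in the STATISTICS IO output as reads against a "Worktable" entry, so an unusually large read count there would support the suspicion about the lazy spool.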

Update: I have pasted the execution plan, together with the real table structures and query. The link is: Execution plan

3 answers:

Answer 0: (score: 1)

Try this:

WITH DAILY_PRICE_TODAY (STOCK_CODE, EXCHANGE)
AS
-- Define the CTE query.
(
   SELECT STOCK_CODE, EXCHANGE
   FROM DAILY_PRICE
   WHERE TRADE_DATE = 20181121
)

SELECT * FROM 
POSITION_QUERY t1 
 JOIN DAILY_PRICE_TODAY t2
    on t1.STOCK_CODE = t2.STOCK_CODE
       and t1.STOCK_EXCHANGE = t2.EXCHANGE

 JOIN DAILY_PRICE_TODAY t3
    on t1.INDEX_CODE = t3.STOCK_CODE
       and t1.INDEX_EXCHANGE = t3.EXCHANGE

Answer 1: (score: 1)

What are the data types? After generating 520,000 rows of sample data with assumed data types, the query takes only 3 seconds to run:

CREATE TABLE POSITION_QUERY (STOCK_CODE INT, STOCK_EXCHANGE INT, INDEX_CODE INT, INDEX_EXCHANGE INT, TRADE_DATE INT)
CREATE TABLE DAILY_PRICE (STOCK_CODE INT, EXCHANGE INT, TRADE_DATE INT)

-- Put 520,000 rows of sample data in POSITION_QUERY.
;WITH CTE AS (
    SELECT 1 AS A
    UNION ALL
    SELECT A + 1
    FROM CTE
    WHERE A < 10
),
CTE_DATE AS (
    SELECT CAST(GETDATE() AS DATE) AS D
    UNION ALL
    SELECT DATEADD(DAY, -1, D)
    FROM CTE_DATE
    WHERE D > '20181001'  -- language-independent date literal (1 October 2018)
)
INSERT INTO POSITION_QUERY
SELECT C1.A, C2.A, C3.A, C4.A, FORMAT(C5.D, 'yyyyMMdd')
FROM CTE C1, CTE C2, CTE C3, CTE C4, CTE_DATE C5
OPTION (MAXRECURSION 0)

-- Put 5,200 rows of sample data in DAILY_PRICE that match all POSITION_QUERY records
;WITH CTE AS (
    SELECT 1 AS A
    UNION ALL
    SELECT A + 1
    FROM CTE
    WHERE A < 10
),
CTE_DATE AS (
    SELECT CAST(GETDATE() AS DATE) AS D
    UNION ALL
    SELECT DATEADD(DAY, -1, D)
    FROM CTE_DATE
    WHERE D > '20181001'  -- language-independent date literal (1 October 2018)
)
INSERT INTO DAILY_PRICE 
SELECT C1.A, C2.A, FORMAT(C3.D, 'yyyyMMdd')
FROM CTE C1, CTE C2, CTE_DATE C3
OPTION (MAXRECURSION 0)

-- Create nonclustered indexes on both tables' pertinent columns.
CREATE NONCLUSTERED INDEX IDX_POSITION_QUERY
ON [dbo].[POSITION_QUERY] ([STOCK_CODE],[STOCK_EXCHANGE])
INCLUDE ([INDEX_CODE],[INDEX_EXCHANGE],[TRADE_DATE])
GO

CREATE NONCLUSTERED INDEX IDX_DAILY_PRICE
ON DAILY_PRICE (STOCK_CODE, EXCHANGE, TRADE_DATE)
GO

-- Finally, run the query. It takes 3 seconds to return 520k records.
SELECT * FROM 
POSITION_QUERY t1 
 JOIN DAILY_PRICE t2
    on t1.STOCK_CODE = t2.STOCK_CODE
       and t1.STOCK_EXCHANGE = t2.EXCHANGE
       and t2.TRADE_DATE = 20181121
 JOIN DAILY_PRICE t3
    on t1.INDEX_CODE = t3.STOCK_CODE
       and t1.INDEX_EXCHANGE = t3.EXCHANGE
       and t3.TRADE_DATE = 20181121

Here is the execution plan:

https://www.brentozar.com/pastetheplan/?id=BkSgin7C7

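Can you paste your execution plan? There may be a bad type conversion somewhere. Even without the indexes I created, the query takes only 14 seconds.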
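To check for a type mismatch, one option is to list the declared types of the join columns. This is a rough sketch: the object name 'DAILY_PRICE' is the simplified name from the question, and the real view and table names would need to be substituted.

```sql
-- List the declared types of the columns used in the join conditions.
-- INFORMATION_SCHEMA.COLUMNS covers both tables and views.
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'DAILY_PRICE'   -- placeholder: use the real object name
  AND COLUMN_NAME IN ('STOCK_CODE', 'EXCHANGE', 'TRADE_DATE');

-- Assumption for illustration only: if TRADE_DATE turned out to be a
-- character type such as CHAR(8), comparing it to the integer literal
-- 20181121 would convert the column to INT for every row. Comparing
-- against a string literal avoids that conversion:
--     AND t2.TRADE_DATE = '20181121'
```

In the actual plan, such conversions typically show up as CONVERT_IMPLICIT in the predicate of a scan or seek operator.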

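Answer 2: (score: 0)

Without being able to test this myself, I can offer a strategy I like to use that often speeds up queries: store whatever you can in a temp table and index it precisely for what the main query needs. In this case, it looks like you could pull the data you need out of DAILY_PRICE and index it on STOCK_CODE and EXCHANGE, like this: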

DROP TABLE IF EXISTS #temp;
SELECT *
INTO #temp
FROM DAILY_PRICE
WHERE TRADE_DATE = 20181121;
CREATE INDEX [IX1] ON #temp(STOCK_CODE, EXCHANGE);

SELECT *
FROM POSITION_QUERY t1 
 JOIN #temp t2
    on t1.STOCK_CODE = t2.STOCK_CODE
       and t1.STOCK_EXCHANGE = t2.EXCHANGE
 JOIN #temp t3
    on t1.INDEX_CODE = t3.STOCK_CODE
       and t1.INDEX_EXCHANGE = t3.EXCHANGE

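This may give faster results because it gives the query planner different options: it can work only with what you hand it, rather than trying to use the base tables, which can sometimes lead to expensive operations such as spools, hashes, or parallelism.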
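If a temp table is not an option, another thing that might be worth trying is asking the optimizer not to add the spool at all. The sketch below assumes SQL Server 2016 or later, where the NO_PERFORMANCE_SPOOL query hint is available. It has not been tested against the real schema, and whether it helps here is a guess.

```sql
-- Assumption: SQL Server 2016+ (NO_PERFORMANCE_SPOOL does not exist in older versions).
-- The hint asks the optimizer not to introduce performance spools into the plan.
SELECT *
FROM POSITION_QUERY t1
JOIN DAILY_PRICE t2
    ON t1.STOCK_CODE = t2.STOCK_CODE
   AND t1.STOCK_EXCHANGE = t2.EXCHANGE
   AND t2.TRADE_DATE = 20181121
JOIN DAILY_PRICE t3
    ON t1.INDEX_CODE = t3.STOCK_CODE
   AND t1.INDEX_EXCHANGE = t3.EXCHANGE
   AND t3.TRADE_DATE = 20181121
OPTION (NO_PERFORMANCE_SPOOL);
```

A hint like this treats the symptom rather than the cause, so it is probably best used as a diagnostic: if the query becomes fast with it, the lazy spool was likely the bottleneck.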