Optimizing a SQL query that uses a cursor

Date: 2016-03-14 14:20:21

Tags: sql sql-server tsql

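I have a query in which I iterate over one table, and for each entry I iterate over another table and then compute some results. I use a cursor to iterate over the table. This query takes a very long time to finish, always more than 3 minutes. If I do something similar in C#, where the tables are arrays or dictionaries, it doesn't even take a second. What am I doing wrong, and how can I make this more efficient?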

DELETE FROM [QueryScores]
GO

INSERT INTO [QueryScores] (Id)
SELECT Id FROM [Documents]

DECLARE @Id NVARCHAR(50)

DECLARE myCursor CURSOR LOCAL FAST_FORWARD FOR
SELECT [Id] FROM [QueryScores]

OPEN myCursor

FETCH NEXT FROM myCursor INTO @Id

WHILE @@FETCH_STATUS = 0
    BEGIN
        DECLARE @Score FLOAT = 0.0

        DECLARE @CounterMax INT = (SELECT COUNT(*) FROM [Query])
        DECLARE @Counter INT = 0

        PRINT 'Document: ' + CAST(@Id AS VARCHAR)
        PRINT 'Score: ' + CAST(@Score AS VARCHAR)

        WHILE @Counter < @CounterMax
            BEGIN

            DECLARE @StemId INT = (SELECT [Query].[StemId] FROM [Query] WHERE [Query].[Id] = @Counter)

            DECLARE @Weight FLOAT = (SELECT [tfidf].[Weight] FROM [TfidfWeights] AS [tfidf] WHERE [tfidf].[StemId] = @StemId AND [tfidf].[DocumentId] = @Id)

            PRINT 'WEIGHT: ' + CAST(@Weight AS VARCHAR)

            IF(@Weight > 0.0)
                BEGIN
                DECLARE @QWeight FLOAT = (SELECT [Query].[Weight] FROM [Query] WHERE [Query].[StemId] = @StemId)
                SET @Score = @Score + (@QWeight * @Weight)
                PRINT 'Score: ' + CAST(@Score AS VARCHAR)
                END

            SET @Counter = @Counter + 1
            END 

        UPDATE [QueryScores] SET Score = @Score WHERE Id = @Id 

        FETCH NEXT FROM myCursor INTO @Id
    END

CLOSE myCursor
DEALLOCATE myCursor 

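The logic is this: I have a list of documents and a question/query. I iterate over each document, and in a nested loop I iterate over the query terms/words to check whether the document contains them. If it does, I add the product of the precomputed weights to the document's score.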
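Here is a minimal sketch of the same logic in plain Python, using dictionaries roughly the way the C# version presumably does. The table shapes, the sample values and the function name `score_documents` are assumptions for illustration, not the OP's actual data. The sketch iterates over the query terms directly instead of using a numeric counter.

```python
# Hypothetical in-memory stand-ins for the tables in the question:
#   query: list of (StemId, Weight) pairs, one per query term
#   tfidf: {(DocumentId, StemId): Weight}

def score_documents(documents, query, tfidf):
    """Return {document_id: score} following the question's loop logic:
    for each document and each query term, add query_weight * tfidf_weight
    whenever the tf-idf weight exists and is > 0."""
    scores = {}
    for doc_id in documents:
        score = 0.0  # like DECLARE @Score FLOAT = 0.0
        for stem_id, q_weight in query:
            weight = tfidf.get((doc_id, stem_id))  # None plays the role of NULL
            if weight is not None and weight > 0.0:
                score += q_weight * weight
        scores[doc_id] = score
    return scores


documents = ["d1", "d2", "d3"]
query = [(10, 0.5), (20, 2.0)]
tfidf = {("d1", 10): 0.5, ("d1", 20): 0.25, ("d2", 20): 1.5, ("d3", 10): 0.0}

print(score_documents(documents, query, tfidf))
# {'d1': 0.75, 'd2': 3.0, 'd3': 0.0}
```

Each `tfidf.get` is an in-memory hash lookup. In the T-SQL version every lookup is a separate SELECT statement, which is likely where most of the time goes.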

3 answers:

Answer 0 (score: 7)

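The problem is that you are trying to use a set-based language to iterate the way you would in a procedural language. SQL requires a different way of thinking. You should almost never be thinking about loops in SQL.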

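From what I can gather from your code, this should do what you are trying to do with all of those loops. It does it in a single, set-based statement, which is what SQL is good at.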

INSERT INTO QueryScores (id, score)
SELECT
    D.id,
    SUM(CASE WHEN W.[Weight] > 0 THEN W.[Weight] * Q.[Weight] ELSE NULL END)
FROM
    Documents D
CROSS JOIN Query Q
LEFT OUTER JOIN TfidfWeights W ON W.StemId = Q.StemId AND W.DocumentId = D.id
GROUP BY
    D.id

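Of course, without a description of your requirements or sample data with the expected output, I don't know whether this is actually what you're trying to get. It is my best guess based on your code.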
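This is a minimal sketch for trying the set-based query locally. It assumes SQLite (through Python's built-in `sqlite3` module) as a stand-in for SQL Server, plus the hypothetical sample rows below. SQLite accepts the `[bracket]` quoting used in the answer, so the statement is essentially unchanged.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory SQLite database with hypothetical rows shaped
    like the tables in the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Documents (Id TEXT PRIMARY KEY);
        CREATE TABLE [Query] (Id INTEGER, StemId INTEGER, Weight REAL);
        CREATE TABLE TfidfWeights (StemId INTEGER, DocumentId TEXT, Weight REAL);
        CREATE TABLE QueryScores (Id TEXT, Score REAL);

        INSERT INTO Documents VALUES ('d1'), ('d2'), ('d3');
        INSERT INTO [Query] VALUES (0, 10, 0.5), (1, 20, 2.0);
        INSERT INTO TfidfWeights VALUES
            (10, 'd1', 0.5), (20, 'd1', 0.25), (20, 'd2', 1.5), (10, 'd3', 0.0);
    """)
    return conn


# Answer 0's statement; [Query] is bracketed to be safe, since QUERY is a SQLite keyword.
SET_BASED = """
    INSERT INTO QueryScores (Id, Score)
    SELECT D.Id,
           SUM(CASE WHEN W.[Weight] > 0 THEN W.[Weight] * Q.[Weight] ELSE NULL END)
    FROM Documents D
    CROSS JOIN [Query] Q
    LEFT OUTER JOIN TfidfWeights W ON W.StemId = Q.StemId AND W.DocumentId = D.Id
    GROUP BY D.Id
"""

conn = build_sample_db()
conn.execute(SET_BASED)
result = dict(conn.execute("SELECT Id, Score FROM QueryScores ORDER BY Id"))
print(result)
# {'d1': 0.75, 'd2': 3.0, 'd3': None}
```

One difference from the cursor version: a document with no positively weighted matching term gets NULL here, whereas the loop starts each score at 0.0. If 0 is wanted, wrapping the SUM in `COALESCE(..., 0)` should close that gap. That tweak is a suggestion, not part of the answer.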

You should read: https://stackoverflow.com/help/how-to-ask

Answer 1 (score: 1)

The query I came up with is very similar to Tom H's.

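There are a lot of unknowns about the problem the OP's code is trying to solve. Is there a particular reason the code only checks the rows in the Query table whose Id values fall between 0 and one less than the number of rows in the table? Or is the intent really just to get all of the rows from Query?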
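To make that concern concrete, here is a small Python sketch of which Query rows the loop actually reaches. It assumes only what the question's code does: `@Counter` runs from 0 to `COUNT(*) - 1` and is matched against `Query.Id`. The Id values and the function name `ids_checked_by_loop` are hypothetical.

```python
def ids_checked_by_loop(query_ids):
    """Mimic the question's WHILE loop: @Counter takes the values
    0 .. COUNT(*) - 1, and each value is looked up as a Query.Id.
    Returns the Ids that actually match a row (assumes unique Ids)."""
    counter_max = len(query_ids)  # DECLARE @CounterMax INT = (SELECT COUNT(*) FROM [Query])
    present = set(query_ids)
    return [counter for counter in range(counter_max) if counter in present]


# Hypothetical Query.Id values; an IDENTITY column seeded at 1 gives the first case.
print(ids_checked_by_loop([1, 2, 3]))   # [1, 2]  (the row with Id 3 is never visited)
print(ids_checked_by_loop([0, 1, 2]))   # [0, 1, 2]
print(ids_checked_by_loop([5, 9, 12]))  # []
```

A counter value with no matching row leaves `@StemId` NULL, so that term silently contributes nothing to the score.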

Here's my version:

INSERT INTO QueryScores (Id, Score)
SELECT d.Id
     , SUM(CASE WHEN w.Weight > 0 THEN w.Weight * q.Weight ELSE NULL END) AS Score
  FROM [Documents] d
 CROSS
  JOIN [Query] q
  LEFT
  JOIN [TfidfWeights] w
    ON w.StemId = q.StemId
   AND w.DocumentId = d.Id
 GROUP BY d.Id

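Processing RBAR (row by agonizing row) is almost always slower than processing as a set. SQL is designed to operate on sets of data. There is overhead for every individual SQL statement, and for every context switch between the procedure and the SQL engine. Sure, there may be room to improve the performance of individual parts of the procedure, but the big win comes from operating on the entire set in a single SQL statement.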
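To put that per-statement overhead in perspective, here is a back-of-the-envelope count of how many separate statements the question's cursor sends to the engine. The document and term counts are hypothetical, and the formula only counts statements visible in the question's code. The function name `min_table_statements` is illustrative.

```python
def min_table_statements(num_documents, num_query_terms):
    """Lower bound on the table-touching statements the question's cursor runs.

    Per document: one SELECT COUNT(*) FROM [Query] and one UPDATE, plus two
    single-row SELECTs per query term (StemId, then the tf-idf weight).
    Not counted: the extra SELECT for the query weight (only when
    @Weight > 0), the FETCHes and the PRINTs.
    """
    return num_documents * (2 + 2 * num_query_terms)


# Hypothetical sizes: 10,000 documents and a 20-term query.
print(min_table_statements(10_000, 20))  # 420000
```

The set-based versions replace all of those with a single INSERT ... SELECT.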

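If for some reason you need to process one document at a time with a cursor, then get rid of the inner loop, the individual SELECTs and all those PRINTs, and use a single query to get each document's score: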

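-- Assumes QueryScores already holds one row per document, as set up in the question
DECLARE @Id NVARCHAR(50)

DECLARE myCursor CURSOR LOCAL FAST_FORWARD FOR
SELECT [Id] FROM [QueryScores]

OPEN myCursor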
FETCH NEXT FROM myCursor INTO @Id
WHILE @@FETCH_STATUS = 0
  BEGIN
    UPDATE [QueryScores] 
       SET Score 
         = (  SELECT SUM( CASE WHEN w.Weight > 0 
                               THEN w.Weight * q.Weight 
                               ELSE NULL END
                     )
                FROM [Query] q
                JOIN [TfidfWeights] w
                  ON w.StemId = q.StemId
               WHERE w.DocumentId = @Id
           )
     WHERE Id = @Id

    FETCH NEXT FROM myCursor INTO @Id

  END
CLOSE myCursor
DEALLOCATE myCursor
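This is a minimal sketch of the one-UPDATE-per-document approach, again assuming SQLite through Python's `sqlite3` module and hypothetical sample rows. SQLite has no T-SQL cursors, so a Python loop plays the cursor's role. The document Ids are read into a list first, so the loop does not iterate over a table it is updating.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Documents (Id TEXT PRIMARY KEY);
    CREATE TABLE [Query] (Id INTEGER, StemId INTEGER, Weight REAL);
    CREATE TABLE TfidfWeights (StemId INTEGER, DocumentId TEXT, Weight REAL);
    CREATE TABLE QueryScores (Id TEXT, Score REAL);

    INSERT INTO Documents VALUES ('d1'), ('d2'), ('d3');
    INSERT INTO [Query] VALUES (0, 10, 0.5), (1, 20, 2.0);
    INSERT INTO TfidfWeights VALUES
        (10, 'd1', 0.5), (20, 'd1', 0.25), (20, 'd2', 1.5), (10, 'd3', 0.0);

    -- Same setup step as the question: one QueryScores row per document.
    INSERT INTO QueryScores (Id) SELECT Id FROM Documents;
""")

# The answer's UPDATE, with the @Id variable replaced by ? placeholders.
PER_DOCUMENT_UPDATE = """
    UPDATE QueryScores
       SET Score = (SELECT SUM(CASE WHEN w.Weight > 0 THEN w.Weight * q.Weight ELSE NULL END)
                      FROM [Query] q
                      JOIN TfidfWeights w ON w.StemId = q.StemId
                     WHERE w.DocumentId = ?)
     WHERE Id = ?
"""

# Python loop standing in for the cursor: one UPDATE statement per document.
doc_ids = [row[0] for row in conn.execute("SELECT Id FROM QueryScores ORDER BY Id")]
for doc_id in doc_ids:
    conn.execute(PER_DOCUMENT_UPDATE, (doc_id, doc_id))

result = dict(conn.execute("SELECT Id, Score FROM QueryScores ORDER BY Id"))
print(result)
# {'d1': 0.75, 'd2': 3.0, 'd3': None}
```

On this sample data it produces the same scores as the single set-based statement, but it still issues one statement per document.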

Answer 2 (score: 1)

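You might not even need the Documents table (a document with no positively weighted matching term then simply gets no row in QueryScores):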

INSERT INTO QueryScores (id, score)
SELECT W.DocumentId as [id]
     , SUM(W.[Weight] * Q.[Weight]) as [score]
  FROM Query Q
  JOIN TfidfWeights W 
         ON W.StemId = Q.StemId 
        AND W.[Weight] > 0 
 GROUP BY W.DocumentId
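This is a minimal sketch of the Documents-free variant, assuming SQLite through Python's `sqlite3` module and hypothetical sample rows. The sketch multiplies the two weights, matching the question's `@QWeight * @Weight`, and shows that a document whose only match has weight 0 ends up with no row at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE [Query] (Id INTEGER, StemId INTEGER, Weight REAL);
    CREATE TABLE TfidfWeights (StemId INTEGER, DocumentId TEXT, Weight REAL);
    CREATE TABLE QueryScores (Id TEXT, Score REAL);

    INSERT INTO [Query] VALUES (0, 10, 0.5), (1, 20, 2.0);
    INSERT INTO TfidfWeights VALUES
        (10, 'd1', 0.5), (20, 'd1', 0.25), (20, 'd2', 1.5), (10, 'd3', 0.0);
""")

# No Documents table: the document ids come straight from TfidfWeights.
conn.execute("""
    INSERT INTO QueryScores (Id, Score)
    SELECT W.DocumentId, SUM(W.[Weight] * Q.[Weight])
      FROM [Query] Q
      JOIN TfidfWeights W
        ON W.StemId = Q.StemId
       AND W.[Weight] > 0
     GROUP BY W.DocumentId
""")

result = dict(conn.execute("SELECT Id, Score FROM QueryScores ORDER BY Id"))
print(result)
# {'d1': 0.75, 'd2': 3.0}  ('d3' has no positive weight, so it gets no row)
```

Moving the `> 0` filter into the join condition removes the need for a CASE expression. The trade-off is that zero-score documents are omitted instead of appearing with NULL or 0.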