SQL Server: table-valued function is slow

Time: 2011-12-19 10:44:00

Tags: sql-server performance function user-defined

I have tried everything, but I cannot solve this problem.

I have a table-valued function.

When I call this function with

SELECT * FROM Ratings o1 
    CROSS APPLY dbo.FN_RatingSimilarity(50, 497664, 'Cosine') o2 
WHERE o1.trackId = 497664

it takes a while to execute. But when I call it like this:

SELECT * FROM Ratings o1 
    CROSS APPLY dbo.FN_RatingSimilarity(50, o1.trackId, 'Cosine') o2 
WHERE o1.trackId = 497664

it executes in 32 seconds. I have created all the indexes, but that did not help.

My function, by the way:

ALTER FUNCTION [dbo].[FN_RatingSimilarity]
(   
    @trackId    INT,
    @nTrackId   INT,
    @measureType    VARCHAR(100)
)
RETURNS TABLE 
WITH SCHEMABINDING
AS
    RETURN
    (
        SELECT o2.id,
               o2.name,
               o2.releaseDate,
               o2.numberOfRatings,
               o2.averageRating,
               COUNT(1) as numberOfSharedUsers,
          CASE @measureType 
               WHEN 'Cosine' THEN SUM(o3.score*o4.score)/(0.01+SQRT(SUM(POWER(o3.score,2))) * SQRT(SUM(POWER(o4.score,2)))) 
               WHEN 'AdjustedCosine' THEN SUM((o3.score-o5.averageRating)*(o4.score-o5.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o5.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o5.averageRating, 2)))) 
               WHEN 'Pearson' THEN SUM((o3.score-o1.averageRating)*(o4.score-o2.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o1.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o2.averageRating, 2)))) 
           END as similarityRatio
          FROM dbo.Tracks o1
    INNER JOIN dbo.Tracks o2 ON o2.id != @trackId 
    INNER JOIN dbo.Ratings o3 ON o3.trackId = o1.id 
    INNER JOIN dbo.Ratings o4 ON o4.trackId = o2.id AND o4.userId = o3.userId
    INNER JOIN dbo.Users o5 ON o5.id = o4.userId 
         WHERE o1.id = @trackId 
             AND o2.id = ISNULL(@nTrackId, o2.id)
      GROUP BY o2.id, 
               o2.name, 
               o2.releaseDate,
               o2.numberOfRatings, 
               o2.averageRating
    )

Any help would be appreciated.

Thanks. 埃默拉尔

1 Answer:

Answer 0: (score: 1)

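I believe your bottleneck is the calculations plus your very expensive inner joins.

The way you are joining essentially creates a cross join: it returns a result set in which all records are linked to all other records, except the record whose id is supplied. You then add to that result set with the other inner joins.

For each inner join, SQL logically creates a result set containing all matching rows. So the first thing you do in your query is tell SQL to do what is basically a cross join of a table with itself. (I assume you are still following. The function looks fairly advanced, so I take it you are familiar with advanced SQL syntax and operators.)

Then, in the next inner join, you apply the next table (dbo.Ratings) to that newly created, huge result set, and only afterwards filter out the rows that are not in both tables.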
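To make that point concrete, here is a minimal sketch. It assumes the `dbo.Tracks` table from the question and uses the literal 50 purely as a stand-in for `@trackId`. The `ON` clause of the first join does not relate `o1` to `o2` at all, so logically it behaves like an explicit `CROSS JOIN` with a filter on `o2`:

```sql
-- Illustration only: 50 stands in for @trackId.
DECLARE @trackId INT = 50;

-- The ON clause never mentions o1, so logically every row of Tracks
-- is paired with every row of Tracks except the one with id = @trackId ...
SELECT COUNT(*) AS pairedRows
  FROM dbo.Tracks o1
 INNER JOIN dbo.Tracks o2 ON o2.id != @trackId;

-- ... which is the same as writing the cross join explicitly.
SELECT COUNT(*) AS pairedRows
  FROM dbo.Tracks o1
 CROSS JOIN dbo.Tracks o2
 WHERE o2.id != @trackId;
```

Both queries should return the same count. This describes only the logical shape of the join: the optimizer is free to apply the `o1.id = @trackId` filter earlier, so the actual execution plan is what tells you whether the large intermediate result is really built.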

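First, see whether you can do the joins the other way around. (This really depends on your tables' record counts and record sizes.) Try to get the smallest result set first, and then join onto that.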
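As a rough sketch of what "the other way around" could look like, the query below starts from the ratings of the source track, which is likely the smallest set, and only then joins outwards. It is not the original function: it shows only the 'Cosine' branch, drops the `o1` and `o5` joins that this branch does not need, and therefore assumes that every rating refers to an existing track and user. The parameter values are taken from the calls in the question.

```sql
-- Sketch only: a standalone query, not a replacement for the function.
DECLARE @trackId INT = 50, @nTrackId INT = 497664;

SELECT o2.id,
       COUNT(1) AS numberOfSharedUsers,
       SUM(o3.score * o4.score)
         / (0.01 + SQRT(SUM(POWER(o3.score, 2))) * SQRT(SUM(POWER(o4.score, 2)))) AS similarityRatio
  FROM dbo.Ratings o3                                   -- ratings of the source track only
 INNER JOIN dbo.Ratings o4 ON o4.userId = o3.userId     -- other ratings by the same users
                          AND o4.trackId != @trackId
 INNER JOIN dbo.Tracks o2 ON o2.id = o4.trackId         -- tracks those users also rated
 WHERE o3.trackId = @trackId
   AND o2.id = ISNULL(@nTrackId, o2.id)
 GROUP BY o2.id;
```

Keep in mind that the written join order does not force the physical join order. The optimizer normally picks that itself, so whether a rewrite like this helps has to be measured against your data.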

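The second thing you may want to try is to limit the result set before doing the joins. So start with a CTE in which you filter on o1.id = @trackId. Then select from this CTE, do the joins against the CTE, and filter on o2.id = ISNULL(@nTrackId, o2.id) in the query.

I will put together an example, stay tuned...

OK, I have added an example and done a quick test; the values returned are the same. Run this against your data and let us know whether there is any improvement. (Note that this does not address the INNER JOIN order point discussed above; that could still be worked on.)

Example: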

-- CREATE, not ALTER: FN_RatingSimilarity_NEW is a new function and does not exist yet
CREATE FUNCTION [dbo].[FN_RatingSimilarity_NEW] 
(    
    @trackId    INT, 
    @nTrackId   INT, 
    @measureType    VARCHAR(100) 
) 
RETURNS TABLE  
WITH SCHEMABINDING 
AS 
    RETURN 
    ( 
        WITH CTE_ALL AS 
        (
            SELECT id, 
               name, 
               releaseDate, 
               numberOfRatings, 
               averageRating
            FROM dbo.Tracks
            WHERE  id = @trackId  
        )
        SELECT o2.id, 
               o2.name, 
               o2.releaseDate, 
               o2.numberOfRatings, 
               o2.averageRating, 
               COUNT(1) as numberOfSharedUsers, 
          CASE @measureType  
               WHEN 'Cosine' THEN SUM(o3.score*o4.score)/(0.01+SQRT(SUM(POWER(o3.score,2))) * SQRT(SUM(POWER(o4.score,2))))  
               WHEN 'AdjustedCosine' THEN SUM((o3.score-o5.averageRating)*(o4.score-o5.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o5.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o5.averageRating, 2))))  
               WHEN 'Pearson' THEN SUM((o3.score-o1.averageRating)*(o4.score-o2.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o1.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o2.averageRating, 2))))  
           END as similarityRatio 
          FROM CTE_ALL o1 
    INNER JOIN dbo.Tracks o2 ON o2.id != @trackId  
    INNER JOIN dbo.Ratings o3 ON o3.trackId = o1.id  
    INNER JOIN dbo.Ratings o4 ON o4.trackId = o2.id AND o4.userId = o3.userId 
    INNER JOIN dbo.Users o5 ON o5.id = o4.userId  
         WHERE o2.id = ISNULL(@nTrackId, o2.id) 
      GROUP BY o2.id,  
               o2.name,  
               o2.releaseDate, 
               o2.numberOfRatings,  
               o2.averageRating 
    )
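
One way to check whether the rewritten function is any better is to run both versions with I/O and timing statistics switched on. This is a minimal sketch that reuses the parameter values from the question and assumes both functions exist in the same database:

```sql
-- Report logical reads and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Original function.
SELECT * FROM dbo.FN_RatingSimilarity(50, 497664, 'Cosine');

-- CTE-based version from the example above.
SELECT * FROM dbo.FN_RatingSimilarity_NEW(50, 497664, 'Cosine');

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The figures appear in the messages output rather than in the result grid. Comparing the logical reads on dbo.Ratings and dbo.Tracks for the two statements, together with the actual execution plans, should show whether the CTE changes anything.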