I have tried everything, but I cannot solve this problem.
I have a table-valued function.
When I call this function with
SELECT * FROM Ratings o1
CROSS APPLY dbo.FN_RatingSimilarity(50, 497664, 'Cosine') o2
WHERE o1.trackId = 497664
it takes just a moment to execute. But when I do this:
SELECT * FROM Ratings o1
CROSS APPLY dbo.FN_RatingSimilarity(50, o1.trackId, 'Cosine') o2
WHERE o1.trackId = 497664
it executes in 32 seconds. I have created all the indexes, but that did not help.
My function, by the way:
ALTER FUNCTION [dbo].[FN_RatingSimilarity]
(
    @trackId INT,
    @nTrackId INT,
    @measureType VARCHAR(100)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
    SELECT o2.id,
           o2.name,
           o2.releaseDate,
           o2.numberOfRatings,
           o2.averageRating,
           COUNT(1) as numberOfSharedUsers,
           CASE @measureType
               WHEN 'Cosine' THEN SUM(o3.score*o4.score)/(0.01+SQRT(SUM(POWER(o3.score,2))) * SQRT(SUM(POWER(o4.score,2))))
               WHEN 'AdjustedCosine' THEN SUM((o3.score-o5.averageRating)*(o4.score-o5.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o5.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o5.averageRating, 2))))
               WHEN 'Pearson' THEN SUM((o3.score-o1.averageRating)*(o4.score-o2.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o1.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o2.averageRating, 2))))
           END as similarityRatio
    FROM dbo.Tracks o1
    INNER JOIN dbo.Tracks o2 ON o2.id != @trackId
    INNER JOIN dbo.Ratings o3 ON o3.trackId = o1.id
    INNER JOIN dbo.Ratings o4 ON o4.trackId = o2.id AND o4.userId = o3.userId
    INNER JOIN dbo.Users o5 ON o5.id = o4.userId
    WHERE o1.id = @trackId
      AND o2.id = ISNULL(@nTrackId, o2.id)
    GROUP BY o2.id,
             o2.name,
             o2.releaseDate,
             o2.numberOfRatings,
             o2.averageRating
)
Any help would be appreciated.
Thanks, 埃默拉尔
Answer 0 (score: 1)
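I believe your bottleneck is the calculations plus your very expensive inner joins.
The way you are joining Tracks to itself essentially creates a cross join: it returns a result set in which every record is linked to every other record, except the one with the supplied id. You then add to that result set with the other inner joins.
For each inner join, SQL creates a result set containing all matching rows. So the first thing you do in the query is tell SQL to do what is basically a cross join on the same table. (I assume you are still following; this looks pretty advanced, so I take it you are familiar with advanced SQL syntax and operators.)
Then, in the next inner join, you apply the Ratings table to that newly created, huge result set, and only afterwards filter out the rows that do not match in both tables.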
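To get a feel for how large that intermediate set is, a quick check along these lines may help. It is only a sketch: it reuses the table and column names from the question, and the value 50 for @trackId is simply taken from the sample call.

```sql
-- Sketch: count the rows produced by the first (Tracks-to-Tracks) join alone.
-- @trackId = 50 is taken from the sample call in the question.
DECLARE @trackId INT = 50;

SELECT COUNT(*) AS pairedRows
FROM dbo.Tracks o1
INNER JOIN dbo.Tracks o2 ON o2.id != @trackId   -- no link between o1 and o2
WHERE o1.id = @trackId;
```

Assuming id is unique in dbo.Tracks and track 50 exists, this should return the number of tracks minus one. Every one of those rows then has to be matched against dbo.Ratings twice.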
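First, see whether you can do the joins the other way round. (This really depends on your tables' row counts and row sizes.) Try to get the smallest result set first, and then join onto that.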
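As a rough illustration of "smallest result set first", the query could start from the ratings of the given track and only then look for other tracks rated by the same users. This is a sketch for the 'Cosine' measure only, not a drop-in replacement: it assumes every Ratings.trackId and Ratings.userId points at an existing row in Tracks and Users (the question does not say so), which is why o1 and o5 are left out here. The optimizer is also free to pick its own join order, so treat this as something to test rather than a guaranteed win.

```sql
-- Sketch: drive the query from the ratings of @trackId instead of from Tracks.
-- Parameter values are taken from the sample call in the question.
DECLARE @trackId INT = 50, @nTrackId INT = 497664;

SELECT o2.id,
       COUNT(1) AS numberOfSharedUsers,
       SUM(o3.score * o4.score)
         / (0.01 + SQRT(SUM(POWER(o3.score, 2))) * SQRT(SUM(POWER(o4.score, 2)))) AS similarityRatio
FROM dbo.Ratings o3                               -- ratings of the given track
INNER JOIN dbo.Ratings o4                         -- other ratings by the same users
        ON o4.userId = o3.userId
       AND o4.trackId != @trackId
INNER JOIN dbo.Tracks o2 ON o2.id = o4.trackId    -- track details, joined last
WHERE o3.trackId = @trackId
  AND o4.trackId = ISNULL(@nTrackId, o4.trackId)
GROUP BY o2.id;
```

The 'AdjustedCosine' and 'Pearson' branches need averageRating from Users and Tracks, so those joins would have to be added back for them.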
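The second thing you may want to try is to limit the result set before joining. So start with a CTE in which you filter on o1.id = @trackId. Then select from this CTE, do the joins against the CTE, and filter on o2.id = ISNULL(@nTrackId, o2.id) in the query.
I'll work up an example, stay tuned...
- OK, I have added an example and ran a quick test; the returned values are the same. Run this against your data and let us know whether there is any improvement. (Note that this does not address the INNER JOIN order point discussed above, which is still worth looking into.)
Example: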
CREATE FUNCTION [dbo].[FN_RatingSimilarity_NEW]
(
    @trackId INT,
    @nTrackId INT,
    @measureType VARCHAR(100)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
    WITH CTE_ALL AS
    (
        SELECT id,
               name,
               releaseDate,
               numberOfRatings,
               averageRating
        FROM dbo.Tracks
        WHERE id = @trackId
    )
    SELECT o2.id,
           o2.name,
           o2.releaseDate,
           o2.numberOfRatings,
           o2.averageRating,
           COUNT(1) as numberOfSharedUsers,
           CASE @measureType
               WHEN 'Cosine' THEN SUM(o3.score*o4.score)/(0.01+SQRT(SUM(POWER(o3.score,2))) * SQRT(SUM(POWER(o4.score,2))))
               WHEN 'AdjustedCosine' THEN SUM((o3.score-o5.averageRating)*(o4.score-o5.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o5.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o5.averageRating, 2))))
               WHEN 'Pearson' THEN SUM((o3.score-o1.averageRating)*(o4.score-o2.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o1.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o2.averageRating, 2))))
           END as similarityRatio
    FROM CTE_ALL o1
    INNER JOIN dbo.Tracks o2 ON o2.id != @trackId
    INNER JOIN dbo.Ratings o3 ON o3.trackId = o1.id
    INNER JOIN dbo.Ratings o4 ON o4.trackId = o2.id AND o4.userId = o3.userId
    INNER JOIN dbo.Users o5 ON o5.id = o4.userId
    WHERE o2.id = ISNULL(@nTrackId, o2.id)
    GROUP BY o2.id,
             o2.name,
             o2.releaseDate,
             o2.numberOfRatings,
             o2.averageRating
)
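To compare the two versions on your own data, something like the following could be run. It is a sketch that reuses the slow call from the question and assumes both functions exist in the database. SET STATISTICS TIME is only there to print the CPU and elapsed time for each statement.

```sql
-- Sketch: time the original and the CTE-based function with the same slow call.
SET STATISTICS TIME ON;

-- Original function (the 32-second call from the question)
SELECT * FROM Ratings o1
CROSS APPLY dbo.FN_RatingSimilarity(50, o1.trackId, 'Cosine') o2
WHERE o1.trackId = 497664;

-- CTE-based version from the example above
SELECT * FROM Ratings o1
CROSS APPLY dbo.FN_RatingSimilarity_NEW(50, o1.trackId, 'Cosine') o2
WHERE o1.trackId = 497664;

SET STATISTICS TIME OFF;
```

Looking at the actual execution plans of both statements side by side should also show whether the join order has changed at all.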