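I have two database tables. One is a Subversion log containing revision numbers and dates; the other contains the revision number and the paths changed in that revision. My query finds the directory with the most commits in each month. The problem is that it takes several minutes to run. Can anyone help me optimize this beast of a query? I'm sure there is a better way to do it.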
SELECT [Directory]
,[Month]
,COUNT([PathMonth]) OVER (PARTITION BY [PathMonth]) AS [Count] INTO ##temp
FROM
(SELECT [Path]
,[Month]
,[Directory]
,[Directory] + [Month] AS [PathMonth]
FROM
(SELECT [Path]
,SUBSTRING([Path], 0, LEN([Path]) - CHARINDEX('/', REVERSE([Path])) + 1) AS [Directory]
,CONVERT(CHAR(4), [LogDate], 120) + '-' + CONVERT(CHAR(2), [LogDate], 110) AS [Month]
FROM [SubversionLog] JOIN [PathsLog] ON [SubversionLog].[Revision] = [PathsLog].[Revision]
WHERE [Path] LIKE '/%/%/%/_%'
) one) two
ORDER BY [Month]
SELECT * INTO ##tempTwo
FROM ##temp
GROUP BY [Directory], [Month], [Count]
SELECT t1.[Directory]
,t1.[Month]
,t1.[Count]
FROM ##tempTwo t1 LEFT JOIN ##tempTwo t2 ON t1.[Month] = t2.[Month] AND t1.[Count] < t2.[Count]
WHERE t2.[Count] IS NULL
GROUP BY t1.[Directory], t1.[Month], t1.[Count]
ORDER BY [Month] DESC
IF OBJECT_ID('tempdb..##temp') IS NOT NULL
DROP TABLE ##temp
IF OBJECT_ID('tempdb..##tempTwo') IS NOT NULL
DROP TABLE ##tempTwo
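Half of the work is formatting the YYYY-MM-DD HH:MM:SS.SSS timestamp as YYYY-MM and converting the file path into its directory.
Answer 0 (score: 1)
Since this is your version control log, I'm guessing you don't have more than a few hundred thousand rows in your tables. Scanning both tables wouldn't be the end of the world, but as Peter Schott said, it would be a good idea to index Revision: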
CREATE nonclustered index <name> on SubversionLog (revision)
CREATE nonclustered index <name> on PathsLog (revision)
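From personal experience, I don't think a bit of string manipulation will slow you down much. I think the multiple temp tables are what slow you down: you are writing a new row for most of the rows in the original tables, and the temp tables are not indexed. So I would suggest dropping the temp tables and simplifying the query: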
;WITH filteredData AS (
    SELECT [Path],
           SUBSTRING([Path], 0, LEN([Path]) - CHARINDEX('/', REVERSE([Path])) + 1) AS [Directory],
           CONVERT(CHAR(4), [LogDate], 120) + '-' + CONVERT(CHAR(2), [LogDate], 110) AS [Month]
    FROM [SubversionLog]
    JOIN [PathsLog] ON [SubversionLog].[Revision] = [PathsLog].[Revision]
    WHERE [Path] LIKE '/%/%/%/_%'
), countRevisions AS (
    SELECT [Month],
           [Directory],
           COUNT(*) AS [Count]
    FROM filteredData
    GROUP BY [Month], [Directory]
), rankDirectories AS (
    SELECT *, RANK() OVER (PARTITION BY [Month] ORDER BY [Count] DESC) AS [Rank]
    FROM countRevisions
)
SELECT [Month], [Directory], [Count]
FROM rankDirectories
WHERE [Rank] = 1
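As a quick sanity check, here is a minimal sketch that runs the same count-then-rank logic on a few made-up rows. The table variables and sample paths are hypothetical stand-ins for the real tables, and I am assuming [LogDate] is a DATETIME. I also swapped the two-part month expression for CONVERT(CHAR(7), [LogDate], 120), which truncates the 'yyyy-mm-dd hh:mi:ss' string to 'yyyy-mm'.

```sql
-- Hypothetical sample data; the table variables stand in for the real tables.
DECLARE @SubversionLog TABLE ([Revision] INT PRIMARY KEY, [LogDate] DATETIME);
DECLARE @PathsLog TABLE ([Revision] INT, [Path] VARCHAR(400));

INSERT INTO @SubversionLog ([Revision], [LogDate]) VALUES
    (1, '2013-01-05T10:00:00'),
    (2, '2013-01-20T11:30:00'),
    (3, '2013-02-02T09:15:00');

INSERT INTO @PathsLog ([Revision], [Path]) VALUES
    (1, '/proj/trunk/src/a.c'),
    (1, '/proj/trunk/src/b.c'),
    (2, '/proj/trunk/doc/readme.txt'),
    (3, '/proj/trunk/doc/readme.txt');

WITH filteredData AS (
    SELECT
        -- Everything before the last '/' is the directory.
        SUBSTRING(p.[Path], 0, LEN(p.[Path]) - CHARINDEX('/', REVERSE(p.[Path])) + 1) AS [Directory],
        -- Style 120 is 'yyyy-mm-dd hh:mi:ss'; CHAR(7) keeps 'yyyy-mm'.
        CONVERT(CHAR(7), s.[LogDate], 120) AS [Month]
    FROM @SubversionLog AS s
    JOIN @PathsLog AS p ON s.[Revision] = p.[Revision]
    WHERE p.[Path] LIKE '/%/%/%/_%'
), countRevisions AS (
    SELECT [Month], [Directory], COUNT(*) AS [Count]
    FROM filteredData
    GROUP BY [Month], [Directory]
), rankDirectories AS (
    SELECT *, RANK() OVER (PARTITION BY [Month] ORDER BY [Count] DESC) AS [Rank]
    FROM countRevisions
)
SELECT [Month], [Directory], [Count]
FROM rankDirectories
WHERE [Rank] = 1
ORDER BY [Month] DESC;
-- With the sample rows above this should return:
--   2013-02  /proj/trunk/doc  1
--   2013-01  /proj/trunk/src  2
```

Note that RANK() returns every directory tied for first place in a month, which matches what the LEFT JOIN ... IS NULL approach in the question does.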
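Edit:
I don't think you can do much more without caching some of the results. You should look at the query plan to see what needs optimizing. My guess is that it is the grouping, sorting, and/or ranking, and possibly the join or key lookups. For key lookups and joins, you can create covering indexes. For the rest, you will need to cache results. I wouldn't cache into a real table, because that means one more table to keep up to date. Instead, I would use a materialized view (an indexed view, in SQL Server terms) so that SQL Server keeps it up to date for you. Of course that makes writes slower, but at roughly one update a minute (for a source control log), I don't think that is a big problem.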
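To make the covering index and indexed view ideas concrete, here is a rough, untested sketch. It assumes both tables live in the dbo schema, that [LogDate] is a DATETIME, and that [Path] is short enough for the derived directory to fit in an index key. All index and view names are hypothetical.

```sql
-- Assumed schema: dbo.SubversionLog(Revision, LogDate), dbo.PathsLog(Revision, Path).
-- All index and view names below are made up for illustration.

-- Covering indexes: the join column is the key, and the one extra column
-- the query reads is INCLUDEd, so no key lookups are needed.
CREATE NONCLUSTERED INDEX IX_SubversionLog_Revision_LogDate
    ON dbo.SubversionLog ([Revision]) INCLUDE ([LogDate]);
CREATE NONCLUSTERED INDEX IX_PathsLog_Revision_Path
    ON dbo.PathsLog ([Revision]) INCLUDE ([Path]);
GO

-- Indexed view that pre-aggregates commits per directory and month.
-- Indexed views need SCHEMABINDING, two-part table names, and COUNT_BIG(*)
-- when GROUP BY is used.
CREATE VIEW dbo.DirectoryMonthCounts
WITH SCHEMABINDING
AS
SELECT
    SUBSTRING(p.[Path], 0, LEN(p.[Path]) - CHARINDEX('/', REVERSE(p.[Path])) + 1) AS [Directory],
    CONVERT(CHAR(7), s.[LogDate], 120) AS [Month],
    COUNT_BIG(*) AS [Count]
FROM dbo.SubversionLog AS s
JOIN dbo.PathsLog AS p ON s.[Revision] = p.[Revision]
WHERE p.[Path] LIKE '/%/%/%/_%'
GROUP BY
    SUBSTRING(p.[Path], 0, LEN(p.[Path]) - CHARINDEX('/', REVERSE(p.[Path])) + 1),
    CONVERT(CHAR(7), s.[LogDate], 120);
GO

-- The unique clustered index is what actually materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_DirectoryMonthCounts
    ON dbo.DirectoryMonthCounts ([Month], [Directory]);
GO

-- The monthly winner query then only has to rank the pre-aggregated rows.
SELECT [Month], [Directory], [Count]
FROM (
    SELECT [Month], [Directory], [Count],
           RANK() OVER (PARTITION BY [Month] ORDER BY [Count] DESC) AS [Rank]
    FROM dbo.DirectoryMonthCounts WITH (NOEXPAND)
) AS ranked
WHERE [Rank] = 1
ORDER BY [Month] DESC;
```

The NOEXPAND hint asks SQL Server to read the view's index directly; on some editions the optimizer will not use an indexed view without it. Creating the view's index also requires certain session SET options (for example ANSI_NULLS and QUOTED_IDENTIFIER ON), so check the plan to confirm the view is actually being used before relying on this.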