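Any ideas on how to improve the performance of this query?
The PK of [ftsIndex] is (sID, wordPos), and there is an index on (wordID, sID, wordPos). They are all int.
A distinct is applied at the end.
Most sIDs have only a few matches.
Some sIDs can have more than 10,000 matches, and those kill the query.
A query for the first 27,749 rows returns in 11 seconds. No single sID in that range has more than 500 matches, and the individual matches sum to 65,615.
Row 27,750 alone takes 2 minutes and has 15,000 matches.
That is no surprise, since the join at the end is on [sID].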
Since a distinct is applied at the end, is there a way to have it look for the first positive match on
on [wXright].[sID] = [wXleft].[sID]
and [wXright].[wordPos] > [wXleft].[wordPos]
and [wXright].[wordPos] <= [wXleft].[wordPos] + 10
and then move on to the next sID?
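I know that is asking a lot of the query optimizer, but it would be really cool.
In real life the problem document is a parts list in which a supplier is repeated many times.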
select distinct [wXleft].[sID]
FROM
( -- begin [wXleft]
( -- start term
select [ftsIndex].[sID], [ftsIndex].[wordPos]
from [ftsIndex] with (nolock)
where [ftsIndex].[wordID] in
(select [id] from [FTSwordDef] with (nolock)
where [word] like 'Brown')
) -- end term
) [wXleft]
join
( -- begin [wRight]
( -- start term
select [ftsIndex].[sID], [ftsIndex].[wordPos]
from [ftsIndex] with (nolock)
where [ftsIndex].[wordID] in
(select [id] from [FTSwordDef] with (nolock)
where [word] like 'Fox')
) -- end term
) [wXright]
on [wXright].[sID] = [wXleft].[sID]
and [wXright].[wordPos] > [wXleft].[wordPos]
and [wXright].[wordPos] <= [wXleft].[wordPos] + 10
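This brings it down to 1:40:
inner loop join
I did this just to try something, and it completely changed the query plan. I don't know how long the problem query would take; I gave up at 20:00. I'm not even going to post this as an answer, as I don't think it has value to anyone else. I am hoping for a better answer. If I don't get one in the next two days, I will delete the question.
This does not fix the problem: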
select distinct [ft1].[sID]
from [ftsIndex] as [ft1] with (nolock)
join [ftsIndex] as [ft2] with (nolock)
on [ft2].[sID] = [ft1].[sID]
and [ft1].[wordID] in (select [id] from [FTSwordDef] with (nolock) where [word] like 'brown')
and [ft2].[wordID] in (select [id] from [FTSwordDef] with (nolock) where [word] like 'fox')
and [ft2].[wordPos] > [ft1].[wordPos]
and [ft2].[wordPos] <= [ft1].[wordPos] + 10
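Queries such as "quick brown" within 10 words of "fox" or "coyote" also have to be supported, so joining aliases is not a good path.
This takes 14 minutes (but at least it runs). Again, this format is not conducive to the more advanced queries.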
IF OBJECT_ID(N'tempdb..#tempMatch1', N'U') IS NOT NULL DROP TABLE #tempMatch1
CREATE TABLE #tempMatch1(
[sID] [int] NOT NULL,
[wordPos] [int] NOT NULL,
CONSTRAINT [PK1] PRIMARY KEY CLUSTERED
(
[sID] ASC,
[wordPos] ASC
))
IF OBJECT_ID(N'tempdb..#tempMatch2', N'U') IS NOT NULL DROP TABLE #tempMatch2
CREATE TABLE #tempMatch2(
[sID] [int] NOT NULL,
[wordPos] [int] NOT NULL,
CONSTRAINT [PK2] PRIMARY KEY CLUSTERED
(
[sID] ASC,
[wordPos] ASC
))
insert into #tempMatch1
select [ftsIndex].[sID], [ftsIndex].[wordPos]
from [ftsIndex] with (nolock)
where [ftsIndex].[wordID] in
(select [id] from [FTSwordDef] with (nolock)
where [word] like 'Brown')
--and [wordPos] < 100000;
order by [ftsIndex].[sID], [ftsIndex].[wordPos]
insert into #tempMatch2
select [ftsIndex].[sID], [ftsIndex].[wordPos]
from [ftsIndex] with (nolock)
where [ftsIndex].[wordID] in
(select [id] from [FTSwordDef] with (nolock)
where [word] like 'Fox')
--and [wordPos] < 100000;
order by [ftsIndex].[sID], [ftsIndex].[wordPos]
select count(distinct(#tempMatch1.[sID]))
from #tempMatch1
join #tempMatch2
on #tempMatch2.[sID] = #tempMatch1.[sID]
and #tempMatch2.[wordPos] > #tempMatch1.[wordPos]
and #tempMatch2.[wordPos] <= #tempMatch1.[wordPos] + 10
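A slightly different join runs in 5 seconds (and has a different query plan). But I cannot fix the original with hints, because that changes where it does the one join. Even with +1 there are more than 10 documents with over 7,000 matches.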
on [wXright].[sID] = [wXleft].[sID]
and [wXright].[wordPos] = [wXleft].[wordPos] + 1
Full table definition:
CREATE TABLE [dbo].[FTSindex](
[sID] [int] NOT NULL,
[wordPos] [int] NOT NULL,
[wordID] [int] NOT NULL,
[charPos] [int] NOT NULL,
CONSTRAINT [PK_FTSindex] PRIMARY KEY CLUSTERED
(
[sID] ASC,
[wordPos] ASC
)WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 100) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[FTSindex] WITH CHECK ADD CONSTRAINT [FK_FTSindex_FTSwordDef] FOREIGN KEY([wordID])
REFERENCES [dbo].[FTSwordDef] ([ID])
GO
ALTER TABLE [dbo].[FTSindex] CHECK CONSTRAINT [FK_FTSindex_FTSwordDef]
GO
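Answer 0 (score: 1)
Update:
If you delay the filtering of "L" and "R" until the last part of the process, you can still use union all to help the optimizer preserve the sort order of the index. Unfortunately, you need to retrieve all the wordIDs beforehand and use them in equality conditions. On my machine this cuts the execution time to 2/3: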
; with o as (
select sID, wordPos, wordID
from FTSindex
where wordID = 1
union all
select sID, wordPos, wordID
from FTSindex
where wordID = 4
union all
select sID, wordPos, wordID
from FTSindex
where wordID = 2
),
g as (
select sID, wordPos, wordID,
ROW_NUMBER() over (partition by [sID] order by wordPos) rn
from o
)
select count(distinct(g1.sID)) -- 26919 00:02
from g g1
join g g2
on g1.sID = g2.sID
and g1.rn = g2.rn - 1
and g1.wordPos >= g2.wordPos - 10
-- Now is the time to repartition the stream
and g1.wordID in (1, 4)
and g2.wordID = 2
Oh, does it really take only two seconds now?
Update - 2:
; with o as (
-- Union all resolves costly sort
select sid, wordpos, wordid
from FTSindex
where wordID = 1
union all
select sid, wordpos, wordID
from FTSindex
where wordID = 2
),
g as (
select sid, wordid, wordpos,
ROW_NUMBER() over(order by sid, wordpos) rn
from o
)
select count(distinct g1.sid)
from g g1
inner join g g2
on g1.sID = g2.sID
and g1.rn = g2.rn - 1
where g1.wordID = 1
and g2.wordID = 2
and g1.wordPos >= g2.wordpos - 10
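1 and 2 stand for the IDs of the selected words. When there are multiple hits within 10 words, the result differs from what the original query produces: the original query reports all of them, whereas this one shows only the closest one.
The idea is to extract only the searched words and compare the distance between two adjacent words, where wordID 1 comes first and wordID 2 comes second.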
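To make the difference in hit counts concrete, here is a small self-contained sketch. The table variable @idx and its rows are made up for illustration (wordID 1 stands in for the first word, wordID 2 for the second), and the multi-row VALUES syntax assumes SQL Server 2008 or later.

```sql
-- Hypothetical sample data, not taken from the real FTSindex table.
declare @idx table (
    sID     int not null,
    wordPos int not null,
    wordID  int not null,
    primary key (sID, wordPos)
);

insert into @idx (sID, wordPos, wordID) values
    (1, 1, 1), (1, 2, 1), (1, 3, 2),   -- word 1, word 1, word 2: two candidate pairs
    (2, 1, 1), (2, 20, 2),             -- more than 10 positions apart
    (3, 5, 2), (3, 6, 1);              -- wrong order

-- Range join in the style of the original query:
-- every (left, right) pair within 10 positions is produced.
select l.sID, count(*) as pairs
from @idx l
join @idx r
    on r.sID = l.sID
    and r.wordPos > l.wordPos
    and r.wordPos <= l.wordPos + 10
where l.wordID = 1
    and r.wordID = 2
group by l.sID;

-- Adjacent-row version: only neighbouring rows of the filtered stream are compared.
with g as (
    select sID, wordPos, wordID,
        row_number() over (order by sID, wordPos) rn
    from @idx
    where wordID in (1, 2)
)
select g1.sID, count(*) as pairs
from g g1
join g g2
    on g2.sID = g1.sID
    and g2.rn = g1.rn + 1
where g1.wordID = 1
    and g2.wordID = 2
    and g1.wordPos >= g2.wordPos - 10
group by g1.sID;
```

With this sample data the range join reports two pairs for sID 1 (positions 1 and 2 each pair with position 3), while the adjacent-row version reports only the closest pair (2, 3). Both return sID 1 and nothing else, so a count of distinct sIDs comes out the same.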
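Update - 1:
I had deleted this post because it did not perform as well as I thought it would. However, it fits the OP's needs better than the optimized query, because it allows several words to be searched at once (a list of words found near another word can be specified in the where clause).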
; with g as (
select sid, wordid, wordpos,
ROW_NUMBER() over(order by sid, wordpos) rn
from FTSindex
where wordID in (1, 2)
)
select count(distinct g1.sid)
from g g1
inner join g g2
on g1.sID = g2.sID
and g1.rn = g2.rn - 1
where g1.wordID = 1
and g2.wordID = 2
and g1.wordPos >= g2.wordpos - 10
First attempt:
There might be a way to combine cross apply with top 1.
select [wXleft].[sID], [wXleft].[wordPos]
from [ftsIndex] wXleft with (nolock)
cross apply
(
select top 1 r.sID
from [ftsIndex] r
where r.sID = wXleft.sID
and r.wordPos > wxLeft.wordPos
and r.wordPos <= wxLeft.wordPos + 10
and r.wordID in
(select [id]
from [FTSwordDef] with (nolock)
where [word] like 'Fox')
) wXright
where [wXleft].[wordID] in
(select [id]
from [FTSwordDef] with (nolock)
where [word] like 'Brown')
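A related sketch, not part of the original attempt: the same "stop at the first hit" idea can also be written as a semi-join with EXISTS, which lets the engine stop probing the right-hand side as soon as one row qualifies. The table variables below are hypothetical stand-ins for [ftsIndex] and [FTSwordDef] (the varchar(50) type for [word] is an assumption), and whether the optimizer picks a better plan on the real data is untested.

```sql
-- Hypothetical stand-ins for the real tables, with made-up rows.
declare @FTSwordDef table (
    id   int not null primary key,
    word varchar(50) not null          -- column type is an assumption
);
declare @ftsIndex table (
    sID     int not null,
    wordPos int not null,
    wordID  int not null,
    primary key (sID, wordPos)
);

insert into @FTSwordDef (id, word) values (1, 'Brown'), (2, 'Fox');
insert into @ftsIndex (sID, wordPos, wordID) values
    (1, 1, 1), (1, 2, 1), (1, 3, 2),   -- match
    (2, 1, 1), (2, 20, 2);             -- too far apart, no match

-- EXISTS only asks whether at least one right-hand row is in range,
-- so no (left, right) pairs are materialised.
select distinct l.sID
from @ftsIndex l
where l.wordID in
        (select id from @FTSwordDef where word like 'Brown')
    and exists
        (select 1
         from @ftsIndex r
         where r.sID = l.sID
             and r.wordPos > l.wordPos
             and r.wordPos <= l.wordPos + 10
             and r.wordID in
                 (select id from @FTSwordDef where word like 'Fox'));
```

The distinct is still needed here because a single sID can have several left-hand positions that each find a match.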
BONUS PIVOT ATTEMPT:
; with o as (
select sid, wordpos, wordid
from FTSindex
where wordID = 1
union all
select sid, wordpos, wordID
from FTSindex
where wordID = 2
),
g as (
select sid, wordid, wordpos,
ROW_NUMBER() over(order by sid, wordpos) rn
from o
)
select sid, rn, [1], [2]
from
(
-- Collapse rns belonging to wordid 2 to ones belonging to wordid 1
-- so they appear in the same row
select sid, wordpos, wordid, rn - case when wordid = 1 then 0 else 1 end rn
from g
) g1
pivot (max(wordpos) for wordid in ([1], [2])) u
where [2] - [1] <= 10
Answer 1 (score: 1)
Well, I wish I had more information or a way to test this, but failing that, here is what I would probably try:
IF OBJECT_ID(N'tempdb..#tempMatch', N'U') IS NOT NULL DROP TABLE #tempMatch
CREATE TABLE #tempMatch(
[sID] [int] NOT NULL,
[wordPos] [int] NOT NULL,
[wordID] [int] NOT NULL,
CONSTRAINT [PK2] PRIMARY KEY CLUSTERED
(
[sID] ASC,
[wordPos] ASC
))
--
;WITH cteWords As
(
SELECT 'Brown' as [word]
UNION ALL SELECT 'Fox'
)
INSERT INTO #tempMatch ([sID],[wordPos],[wordID])
SELECT sID, wordPos, wordID
FROM ftsIndex
WHERE EXISTS
(Select * From FTSWordDef s1
inner join cteWords s2 ON s1.word = s2.word
Where ftsIndex.wordID = s1.id)
;
select count(distinct(s1.[sID]))
from #tempMatch s1
join #tempMatch s2
on s2.[sID] = s1.[sID]
and s2.[wordPos] > s1.[wordPos]
and s2.[wordPos] <= s1.[wordPos] + 10
where s1.wordID = (select id from FTSWordDef w where w.word = 'Brown')
and s2.wordID = (select id from FTSWordDef w where w.word = 'Fox')
I thought of an alternate version last night. It is the same as the query above, but with the CREATE statement changed to:
IF OBJECT_ID(N'tempdb..#tempMatch', N'U') IS NOT NULL DROP TABLE #tempMatch
CREATE TABLE #tempMatch(
[sID] [int] NOT NULL,
[wordID] [int] NOT NULL,
[wordPos] [int] NOT NULL,
CONSTRAINT [PK0] PRIMARY KEY CLUSTERED
(
[wordID] ASC,
[sID] ASC,
[wordPos] ASC
))
Let me know if any of these help.