I have three tables in SQL Server 2008 R2:
PRODUCTS (id int, title varchar(100), ....)
WORDS (id int,word varchar(100) )
WORDS_IN_TITLES (product_id int, word_id int)
Now I want to select all products whose titles contain certain words.
Currently I do it like this:
declare @words tp_intList
insert into @words values(154)
insert into @words values(172)
declare @wordsCnt int = (select count(*) from @words)
select * from products where id IN
(
select product_id from WORDS_IN_TITLES inner join
(select id from @words) wrds ON wrds.id=WORDS_IN_TITLES.word_id
group by product_id HAVING count(*)=@wordsCnt
)
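It works, but it is slow. The table contains 600k rows, and returning 3.5k rows takes about 4 seconds. I need it to run in well under 1 second. How can I improve performance?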
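The question does not show how `tp_intList` is defined. A minimal sketch, assuming it is a user-defined table type with a single `id` column; the primary key is an assumption too, added here because the `count(*)=@wordsCnt` check only holds when `@words` contains no duplicate ids:

```sql
-- Assumed definition; the real type is not shown in the question.
-- The primary key rejects duplicate ids, which count(*) = @wordsCnt relies on.
create type tp_intList as table (id int not null primary key)
```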
Answer 0: (score: 2)
select products.*
from products
inner join (select p.id
from products p
inner join words_in_titles wit
on p.id = wit.product_id
where wit.word_id in (154,172)
group by p.id
having count(distinct wit.word_id) = 2) q
on products.id = q.id
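Answer 1: (score: 1)
It looks like your query cannot be improved by much.
Below is a sample setup with 600k rows in products and nearly 600k rows in words_in_titles. For any 2 randomly picked word_ids, there should be about 3 to 10 products that match the combination.
Create the tables and populate them. Create an index on words_in_titles(word_id).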
create table products (id int identity primary key clustered, title varchar(100))
insert into products
select convert(varchar(max),NEWID())
from master..spt_values a
inner join master..spt_values b on b.type='p' and b.number between 0 and 999
where a.type='P' and a.number between 0 and 600
create table words_in_titles (product_id int, word_id int,
primary key clustered(product_id, word_id))
insert words_in_titles
select distinct a,b
from
(
select floor(convert(bigint,convert(varbinary(max),newid())) % 60000) a, floor(convert(bigint,convert(varbinary(max),newid())) % 1000) b
from master..spt_values a
inner join master..spt_values b on b.type='p' and b.number between 0 and 999
where a.type='P' and a.number between 0 and 600
) x
create index ix_words_in_titles on words_in_titles(word_id)
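To sanity-check the claim that a pair of word_ids matches only a handful of products, something like the following could be run once the tables are populated. The variables `@a` and `@b` and their values are hypothetical placeholders, not taken from the answer:

```sql
-- Hypothetical sample pair; substitute any two word_id values present in the table.
declare @a int = 5, @b int = 7

select count(*) as matching_products
from
(
    select product_id
    from words_in_titles
    where word_id in (@a, @b)
    group by product_id
    -- the primary key (product_id, word_id) rules out duplicate pairs,
    -- so count(*) = 2 means both words are present
    having count(*) = 2
) x
```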
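Next come the different approaches. We use SET STATISTICS to view the internal statistics. You should also check the execution plan (but not at the same time as collecting the statistics, because it pollutes them). The DBCC commands flush the buffers and clear cached plans: set the @clean bit to 1 to clear between runs, or to 0 to simulate daytime runs where the data may already be in the buffer.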
declare @clean bit set @clean = 1
if(@clean=1) exec ('dbcc dropcleanbuffers dbcc freeproccache')
set statistics io off
set statistics time off
-- pick two random word_id's as generated (@word1 and @word2 used below)
declare @word1 int, @word2 int
select top 1 @word1 = word_id from words_in_titles order by NEWID()
select top 1 @word2 = word_id from words_in_titles where word_id <> @word1 order by NEWID()
declare @words table (id int)
insert into @words values(@word1)
insert into @words values(@word2)
declare @wordsCnt int = (select count(*) from @words)
set statistics io on
set statistics time on
if(@clean=1) exec ('dbcc dropcleanbuffers dbcc freeproccache')
select *
from
(
select w.product_id
from words_in_titles w
where w.word_id = @word1
and exists (select * from words_in_titles t where t.word_id=@word2 and t.product_id=w.product_id)
-- expand with more EXISTS clauses
) q inner join products p on p.id = q.product_id
if(@clean=1) exec ('dbcc dropcleanbuffers dbcc freeproccache')
select *
from
(
select w1.product_id
from words_in_titles w1
where w1.word_id = @word1
intersect
select w2.product_id
from words_in_titles w2
where w2.word_id = @word2
) q inner join products p on p.id = q.product_id
if(@clean=1) exec ('dbcc dropcleanbuffers dbcc freeproccache')
select * from products where id IN
(
select product_id from WORDS_IN_TITLES inner join
(select id from @words) wrds ON wrds.id=WORDS_IN_TITLES.word_id
group by product_id HAVING count(*)=@wordsCnt
)
if(@clean=1) exec ('dbcc dropcleanbuffers dbcc freeproccache')
select products.*
from products
inner join (select p.id
from products p
inner join words_in_titles wit
on p.id = wit.product_id
where wit.word_id in (@word1,@word2)
group by p.id
having count(distinct wit.word_id) = 2) q
on products.id = q.id
Statistics: look for the batches that start with Table and the next SQL Server Execution Times after each one. Four such snippets represent the timings of the 4 queries.
Table 'products'. Scan count 0, logical reads 30, physical reads 0, read-ahead reads 51, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'words_in_titles'. Scan count 2, logical reads 8, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#4D5F7D71'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 47 ms.
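(The output shown above is for the original query, which uses a temporary object, listed as '#4D5F7D71'.) If you run it enough times, you will see that the last query is always slower than the rest, the second is usually faster than the first, and the third (the original) sometimes lands before and sometimes after the first and second.
You can try one of the alternatives, but the query is unlikely to improve by much.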
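The timings above depend on the index created in the test setup. The question does not say which indexes exist on the asker's WORDS_IN_TITLES table, so the following is only a sketch under the assumption that no index leading on word_id is present yet; the index name is hypothetical:

```sql
-- Assumes WORDS_IN_TITLES has no index that leads on word_id.
-- Including product_id lets the lookup be answered from the index alone.
create nonclustered index ix_words_in_titles_word_product
    on WORDS_IN_TITLES (word_id, product_id)
```

If such an index already exists, this adds nothing and the conclusion above stands.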