Question

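I need some guidance on improving performance. My query takes several seconds to run, and that is causing problems on the server. It runs on the most visited page of my site, so I think it may need a complete rethink.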

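~Edit~ This query generates a list of records whose keywords match the keywords of the program (record) being queried. My site is a software download directory, and this list is used on the program listing page to show other, similar programs. PadID is the primary key of the program records in my database.

~Edit~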

Here's my query:

 select match_keywords.PadID, count(match_keywords.Word) as matching_words 
 from keywords current_program_keywords 
 inner join keywords match_keywords on
       match_keywords.Word=current_program_keywords.Word 
 where match_keywords.Word IS NOT NULL 
 and current_program_keywords.PadID=44243 
 group by match_keywords.PadID 
 order by matching_words DESC 
 LIMIT 0,11;
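To make the semantics concrete, here is a minimal sketch that runs the same self-join against an in-memory SQLite database, with Python's sqlite3 module standing in for MySQL. The rows for PadIDs 50001 and 50002 are made up for illustration, the extra `PadID` in ORDER BY is only a tie-breaker so the output is deterministic, and the `Word IS NOT NULL` filter is left out because the column is declared NOT NULL anyway.

```python
import sqlite3

# In-memory stand-in for the MySQL `keywords` table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE keywords (Word VARCHAR(20) NOT NULL, "
    "PadID BIGINT NOT NULL, LetterIdx VARCHAR(1) NOT NULL)"
)
rows = [
    # PadID 44243: a subset of the sample rows from the question
    ("tv", 44243, "T"), ("satellite", 44243, "S"), ("pc", 44243, "P"),
    ("computer", 44243, "C"), ("television", 44243, "T"),
    # PadIDs 50001 and 50002: hypothetical programs for illustration only
    ("tv", 50001, "T"), ("pc", 50001, "P"), ("satellite", 50001, "S"),
    ("computer", 50002, "C"), ("games", 50002, "G"),
]
conn.executemany("INSERT INTO keywords VALUES (?, ?, ?)", rows)

# Each keyword row of the current program is paired with every row (of any
# program) that has the same Word; counting per PadID gives shared keywords.
query = """
SELECT match_keywords.PadID, COUNT(match_keywords.Word) AS matching_words
FROM keywords current_program_keywords
INNER JOIN keywords match_keywords
        ON match_keywords.Word = current_program_keywords.Word
WHERE current_program_keywords.PadID = ?
GROUP BY match_keywords.PadID
ORDER BY matching_words DESC, match_keywords.PadID
LIMIT 11
"""
result = conn.execute(query, (44243,)).fetchall()
print(result)
```

The current program matches itself on every one of its keywords, so it normally comes first in its own list; adding `AND match_keywords.PadID <> 44243` to the WHERE clause would leave it out.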

Here's the EXPLAIN output for the query: [EXPLAIN screenshot]

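Below is some sample data, but I suspect you won't be able to see the effect of any performance tuning without more data. I can provide more if you'd like.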

 CREATE TABLE IF NOT EXISTS `keywords` (
   `Word` varchar(20) NOT NULL,
   `PadID` bigint(20) NOT NULL,
   `LetterIdx` varchar(1) NOT NULL,
   KEY `Word` (`Word`),
   KEY `LetterIdx` (`LetterIdx`),
   KEY `PadID_2` (`PadID`,`Word`)
 ) ENGINE=MyISAM DEFAULT CHARSET=latin1;

 INSERT INTO `keywords` (`Word`, `PadID`, `LetterIdx`) VALUES
 ('tv', 44243, 'T'),
 ('satellite tv', 44243, 'S'),
 ('satellite tv to pc', 44243, 'S'),
 ('satellite', 44243, 'S'),
 ('your', 44243, 'X'),
 ('computer', 44243, 'C'),
 ('pc', 44243, 'P'),
 ('soccer on your pc', 44243, 'S'),
 ('sports on your pc', 44243, 'S'),
 ('television', 44243, 'T');

I've tried adding an index, but it didn't make much difference:

 ALTER TABLE `keywords` ADD INDEX ( `PadID` )

Answer 1

Try this approach. I'm not sure it will help, but at least it's different:

select PadID, count(Word) as matching_words
from keywords k
where Word in (
  select Word 
  from keywords
  where PadID=44243 )
group by PadID 
order by matching_words DESC 
LIMIT 0,11
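The IN form above and the question's self-join return the same counts as long as the current program has each keyword stored only once; if one of its words is stored twice, the join multiplies the matches for that word while IN does not. A minimal sketch with made-up rows, again using SQLite through Python's sqlite3 as a stand-in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keywords (Word TEXT NOT NULL, PadID INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO keywords VALUES (?, ?)",
    # Hypothetical sample rows
    [("tv", 44243), ("pc", 44243), ("satellite", 44243),
     ("tv", 50001), ("pc", 50001), ("computer", 50002)],
)

IN_QUERY = """
SELECT PadID, COUNT(Word) AS matching_words
FROM keywords
WHERE Word IN (SELECT Word FROM keywords WHERE PadID = ?)
GROUP BY PadID
ORDER BY matching_words DESC, PadID
LIMIT 11
"""

JOIN_QUERY = """
SELECT m.PadID, COUNT(m.Word) AS matching_words
FROM keywords cur
INNER JOIN keywords m ON m.Word = cur.Word
WHERE cur.PadID = ?
GROUP BY m.PadID
ORDER BY matching_words DESC, m.PadID
LIMIT 11
"""

def run(sql):
    return conn.execute(sql, (44243,)).fetchall()

same_without_dupes = (run(IN_QUERY), run(JOIN_QUERY))

# Store one of the current program's words twice: the join now over-counts
conn.execute("INSERT INTO keywords VALUES ('tv', 44243)")
with_dupes = (run(IN_QUERY), run(JOIN_QUERY))

print(same_without_dupes)
print(with_dupes)
```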

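In any case, what you're trying to do is heavy work full of string comparisons. Normalizing the keywords into their own table and storing only numeric IDs in the keywords table might reduce the time.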
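One way to read the numeric-ID suggestion is sketched below: each distinct word is stored once in a lookup table, and the per-program table holds only integer pairs, so the self-join compares integers instead of strings. The table names `words` and `pad_words` are hypothetical, SQLite through Python's sqlite3 stands in for MySQL, and `INSERT OR IGNORE` is SQLite syntax (MySQL's closest equivalent is `INSERT IGNORE`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical lookup table: one row per distinct word
CREATE TABLE words (word_id INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
-- Hypothetical link table holding only integers
CREATE TABLE pad_words (
    word_id INTEGER NOT NULL REFERENCES words (word_id),
    PadID   INTEGER NOT NULL,
    PRIMARY KEY (word_id, PadID)
);
""")

# Made-up (Word, PadID) pairs, as they would come from the old table
pairs = [("tv", 44243), ("pc", 44243), ("satellite", 44243),
         ("tv", 50001), ("pc", 50001), ("computer", 50002)]
conn.executemany("INSERT OR IGNORE INTO words (word) VALUES (?)",
                 [(w,) for w, _ in pairs])
conn.executemany(
    "INSERT INTO pad_words (word_id, PadID) "
    "SELECT word_id, ? FROM words WHERE word = ?",
    [(pad, w) for w, pad in pairs],
)

# Same matching logic as before, but the join is on integer word_id
query = """
SELECT m.PadID, COUNT(*) AS matching_words
FROM pad_words cur
INNER JOIN pad_words m ON m.word_id = cur.word_id
WHERE cur.PadID = ?
GROUP BY m.PadID
ORDER BY matching_words DESC, m.PadID
LIMIT 11
"""
result = conn.execute(query, (44243,)).fetchall()
print(result)
```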

Answer 2

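If I've understood correctly, you may find this helpful. The solution takes advantage of InnoDB's clustered primary key index (http://pastie.org/1195127).

Edit: here are some links that may be of interest: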

http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html

http://dev.mysql.com/doc/refman/5.0/en/innodb-adaptive-hash.html

drop table if exists programmes;
create table programmes
(
prog_id mediumint unsigned not null auto_increment primary key,
name varchar(255) unique not null
)
engine=innodb;

insert into programmes (name) values 
('prog1'),('prog2'),('prog3'),('prog4'),('prog5'),('prog6');


drop table if exists keywords;
create table keywords
(
keyword_id mediumint unsigned not null auto_increment primary key,
name varchar(255) unique not null
)
engine=innodb;

insert into keywords (name) values 
('tv'),('satellite tv'),('satellite tv to pc'),('pc'),('computer');


drop table if exists programme_keywords;
create table programme_keywords
(
keyword_id mediumint unsigned not null,
prog_id mediumint unsigned not null,
primary key (keyword_id, prog_id), -- note clustered composite primary key
key (prog_id)
)
engine=innodb;

insert into programme_keywords values 

-- keyword 1
(1,1),(1,5),

-- keyword 2
(2,2),(2,4),

-- keyword 3
(3,1),(3,2),(3,5),(3,6),

-- keyword 4
(4,2),

-- keyword 5
(5,2),(5,3),(5,4);

/*
efficiently list all other programmes whose keywords match that of the 
programme currently being queried (for instance prog_id = 1) 
*/


drop procedure if exists list_matching_programmes;

delimiter #

create procedure list_matching_programmes
(
in p_prog_id mediumint unsigned
)
proc_main:begin

select
 p.*
from
 programmes p
inner join
(
 select distinct -- other programmes with same keywords as current
  pk.prog_id
 from
  programme_keywords pk
 inner join
 (
  select keyword_id from programme_keywords where prog_id = p_prog_id
 ) current_programme -- the current program keywords
 on pk.keyword_id = current_programme.keyword_id
 inner join programmes p on pk.prog_id = p.prog_id 

) matches 
on matches.prog_id = p.prog_id
order by
 p.prog_id;

end proc_main #


delimiter ;

call list_matching_programmes(1);
call list_matching_programmes(6); 


explain
select
 p.*
from
 programmes p
inner join
(
 select distinct
  pk.prog_id
 from
  programme_keywords pk
 inner join
 (
  select keyword_id from programme_keywords where prog_id = 1
 ) current_programme
 on pk.keyword_id = current_programme.keyword_id
 inner join programmes p on pk.prog_id = p.prog_id 

) matches 
on matches.prog_id = p.prog_id
order by
 p.prog_id;
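The procedure above can be tried without a MySQL server. The sketch below ports it to SQLite through Python's sqlite3: SQLite has no stored procedures, so `list_matching_programmes` becomes a Python function, and `WITHOUT ROWID` is used as a rough stand-in for InnoDB's clustered primary key (the table's rows live in a B-tree keyed on the primary key). It also drops the join to `programmes` inside the derived table, since the outer join already does it. The data is the sample data from this answer.

```python
import sqlite3

SCHEMA_AND_DATA = """
CREATE TABLE programmes (prog_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE keywords (keyword_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
-- Composite primary key; WITHOUT ROWID stores rows in a B-tree keyed on it
CREATE TABLE programme_keywords (
    keyword_id INTEGER NOT NULL,
    prog_id    INTEGER NOT NULL,
    PRIMARY KEY (keyword_id, prog_id)
) WITHOUT ROWID;
CREATE INDEX programme_keywords_prog_id ON programme_keywords (prog_id);

INSERT INTO programmes (name) VALUES
  ('prog1'), ('prog2'), ('prog3'), ('prog4'), ('prog5'), ('prog6');
INSERT INTO keywords (name) VALUES
  ('tv'), ('satellite tv'), ('satellite tv to pc'), ('pc'), ('computer');
INSERT INTO programme_keywords VALUES
  (1,1),(1,5), (2,2),(2,4), (3,1),(3,2),(3,5),(3,6), (4,2), (5,2),(5,3),(5,4);
"""

def list_matching_programmes(conn, prog_id):
    """Programmes sharing at least one keyword with prog_id (itself included)."""
    return conn.execute(
        """
        SELECT p.prog_id, p.name
        FROM programmes p
        INNER JOIN (
            SELECT DISTINCT pk.prog_id
            FROM programme_keywords pk
            INNER JOIN (
                SELECT keyword_id FROM programme_keywords WHERE prog_id = ?
            ) current_programme ON pk.keyword_id = current_programme.keyword_id
        ) matches ON matches.prog_id = p.prog_id
        ORDER BY p.prog_id
        """,
        (prog_id,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_AND_DATA)
print(list_matching_programmes(conn, 1))
print(list_matching_programmes(conn, 6))
```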

Edit: added the char_idx functionality, as requested.

alter table keywords add column char_idx char(1) null after name;

update keywords set char_idx = upper(substring(name,1,1));

select * from keywords;

explain
select
 p.*
from
 programmes p
inner join
(
 select distinct
  pk.prog_id
 from
  programme_keywords pk
 inner join
 (
  select keyword_id from keywords where char_idx = 'P' -- just change the driver query
 ) keywords_starting_with
 on pk.keyword_id = keywords_starting_with.keyword_id
) matches 
on matches.prog_id = p.prog_id
order by
 p.prog_id;
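The char_idx edit only swaps the driver subquery. A minimal SQLite sketch through Python's sqlite3 (standing in for MySQL) with the same keyword and programme_keywords rows; the helper name `programmes_with_keywords_starting` is hypothetical. SQLite's `ALTER TABLE ... ADD COLUMN` has no AFTER clause, and `SUBSTR` is used in place of MySQL's `substring`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE programmes (prog_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE keywords (keyword_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE programme_keywords (
    keyword_id INTEGER NOT NULL,
    prog_id    INTEGER NOT NULL,
    PRIMARY KEY (keyword_id, prog_id)
);
INSERT INTO programmes (name) VALUES
  ('prog1'), ('prog2'), ('prog3'), ('prog4'), ('prog5'), ('prog6');
INSERT INTO keywords (name) VALUES
  ('tv'), ('satellite tv'), ('satellite tv to pc'), ('pc'), ('computer');
INSERT INTO programme_keywords VALUES
  (1,1),(1,5), (2,2),(2,4), (3,1),(3,2),(3,5),(3,6), (4,2), (5,2),(5,3),(5,4);

-- The char_idx edit: store the upper-cased first letter of each keyword
ALTER TABLE keywords ADD COLUMN char_idx CHAR(1);
UPDATE keywords SET char_idx = UPPER(SUBSTR(name, 1, 1));
""")

def programmes_with_keywords_starting(letter):
    # Same outer query as the procedure; only the driver subquery changed
    return conn.execute("""
        SELECT p.prog_id, p.name
        FROM programmes p
        INNER JOIN (
            SELECT DISTINCT pk.prog_id
            FROM programme_keywords pk
            INNER JOIN (
                SELECT keyword_id FROM keywords WHERE char_idx = ?
            ) keywords_starting_with
              ON pk.keyword_id = keywords_starting_with.keyword_id
        ) matches ON matches.prog_id = p.prog_id
        ORDER BY p.prog_id
    """, (letter,)).fetchall()

print(programmes_with_keywords_starting("P"))
print(programmes_with_keywords_starting("S"))
```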

Answer 3

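After looking at the database, I don't think there's much room for improvement in the query itself. On my test server it completes in only about 0.15 seconds with the index on Word; without that index it is almost 4 times slower.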

In any case, I think implementing the database structure change that f00 already suggested will improve the response time.

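Also drop the PadID_2 index: it is now pointless and will only slow down your writes.

What you should do, although it requires cleaning up the database, is avoid duplicate Word/PadID pairs. First delete the duplicates currently in the DB (in my test, using 3/4 of the database, there were about 90k of them). This will reduce query time and give meaningful results. If you query a program that has keyword ABC, and another program has ABC stored twice, that second program will rank above other programs that share ABC without duplicates; in my tests I saw one PadID collect several identical matches for the program I was querying.

After removing the duplicates from the database, you'll need to change your application so the problem doesn't come back. To be safe, you can add a primary key (or a unique index) on Word + PadID.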
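The effect of duplicate Word/PadID pairs, and the clean-up described above, can be sketched in SQLite through Python's sqlite3 (a stand-in for MySQL; PadIDs other than 44243 are made up). Rebuilding the table with GROUP BY and a UNIQUE (Word, PadID) constraint is just one way to do the clean-up; in MySQL the constraint could be added afterwards with something like `ALTER TABLE keywords ADD UNIQUE KEY (Word, PadID)`, once the duplicates are gone.

```python
import sqlite3

MATCH_QUERY = """
SELECT m.PadID, COUNT(m.Word) AS matching_words
FROM keywords cur
INNER JOIN keywords m ON m.Word = cur.Word
WHERE cur.PadID = ?
GROUP BY m.PadID
ORDER BY matching_words DESC, m.PadID
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE keywords (Word TEXT NOT NULL, PadID INTEGER NOT NULL, "
    "LetterIdx TEXT NOT NULL)"
)
conn.executemany("INSERT INTO keywords VALUES (?, ?, ?)", [
    ("tv", 44243, "T"), ("pc", 44243, "P"),
    ("tv", 50001, "T"), ("tv", 50001, "T"), ("tv", 50001, "T"),  # same pair 3x
    ("tv", 50002, "T"), ("pc", 50002, "P"),
])
before = conn.execute(MATCH_QUERY, (44243,)).fetchall()

# Rebuild the table with one row per (Word, PadID) and a UNIQUE constraint
conn.executescript("""
CREATE TABLE keywords_new (
    Word TEXT NOT NULL, PadID INTEGER NOT NULL, LetterIdx TEXT NOT NULL,
    UNIQUE (Word, PadID)
);
INSERT INTO keywords_new (Word, PadID, LetterIdx)
    SELECT Word, PadID, MIN(LetterIdx) FROM keywords GROUP BY Word, PadID;
DROP TABLE keywords;
ALTER TABLE keywords_new RENAME TO keywords;
""")
after = conn.execute(MATCH_QUERY, (44243,)).fetchall()

# The application can no longer re-insert a duplicate pair
try:
    conn.execute("INSERT INTO keywords VALUES ('tv', 50001, 'T')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(before)
print(after)
print(duplicate_rejected)
```

Before the clean-up, PadID 50001, which shares only one keyword, outranks 50002, which shares two; afterwards the order reflects the real overlap.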

MySQL: I need some performance advice for my matching query
