How can I speed up a MySQL query?

Time: 2010-11-08 12:17:02

Tags: performance mysql query-optimization

geneHomology
============
id genome_name  gene_id  homolog_genome_name  homolog_gene_id consider_homolog
1  HomoSap      1007     MusMus               824             1
2  HomoSap      1007     MusMus               825             1
3  HomoSap      1007     MusMus               826             1
4  HomoSap      2890     EColi                2140            1
...

gene
====
genome_name  gene_id  gene_category
MusMus       823      Upregulated
MusMus       824      Downregulated
MusMus       825      Normal
MusMus       826      Normal
MusMus       827      Upregulated
EColi        2140     Normal
...

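consider_homolog is an enum ('0','1'). (genome_name, gene_id) is the primary key of gene. geneHomology is very large: about 200M rows.

My goal is to count, for each gene in gene, how many homologs it has in each gene_category.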

For example, given the data above, HomoSap 1007 has 2 Normal homologs and 1 Downregulated homolog.

So my query is:

SELECT a.id,a.genome_name,a.gene_id,a.homolog_genome_name,a.homolog_gene_id,COUNT(b.gene_category)
FROM geneHomology a,gene b 
WHERE a.consider_homolog='1' AND a.homolog_genome_name=b.genome_name AND a.homolog_gene_id=b.gene_id 
GROUP BY a.genome_name,a.gene_id,b.gene_category;

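It never returns (I waited patiently for more than an hour).

I have indexed gene_category in gene.

I'm new to MySQL, but I have root access to the database, so I can try whatever you suggest (carefully...). I'm happy to provide more information.

Update: here is the EXPLAIN output for the query: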

+----+-------------+-------+------+-----------------------+----------------------+---------+----------------------------------------------------------+---------+---------------------------------+
| id | select_type | table | type | possible_keys         | key                  | key_len | ref                                                      | rows    | Extra                           |
+----+-------------+-------+------+-----------------------+----------------------+---------+----------------------------------------------------------+---------+---------------------------------+
|  1 | SIMPLE      | b     | ALL  | PRIMARY,gene_genome   | NULL                 | NULL    | NULL                                                     | 1560695 | Using temporary; Using filesort | 
|  1 | SIMPLE      | a     | ref  | geneHomologyHit_gene  | geneHomologyHit_gene | 54      | my_db_v71.b.gene_id,my_db_v71.b.genome_name              |      13 | Using where                     | 
+----+-------------+-------+------+-----------------------+----------------------+---------+----------------------------------------------------------+---------+---------------------------------+

Update 2

mysql> SHOW INDEX FROM gene;
    +-------+------------+--------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
    | Table | Non_unique | Key_name                 | Seq_in_index | Column_name         | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
    +-------+------------+--------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
    | gene  |          0 | PRIMARY                  |            1 | gene_id             | A         |        NULL |     NULL | NULL   |      | BTREE      |         | 
    | gene  |          0 | PRIMARY                  |            2 | genome_name         | A         |     1560695 |     NULL | NULL   |      | BTREE      |         | 
    | gene  |          1 | gene_organism            |            1 | taxon_id            | A         |         392 |     NULL | NULL   |      | BTREE      |         | 
    | gene  |          1 | gene_genome              |            1 | genome_name         | A         |         853 |     NULL | NULL   |      | BTREE      |         | 
    | gene  |          1 | gene_gene_category       |            1 | gene_category       | A         |           5 |     NULL | NULL   |      | BTREE      |         | 
    +-------+------------+--------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
    5 rows in set (0.01 sec)

Update 3

mysql> SHOW INDEX FROM geneHomology;
+--------------+------------+------------------------+--------------+--------------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table        | Non_unique | Key_name               | Seq_in_index | Column_name              | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------+------------+------------------------+--------------+--------------------------+-----------+-------------+----------+--------+------+------------+---------+
| geneHomology |          0 | PRIMARY                |            1 | id                       | A         |   680326661 |     NULL | NULL   |      | BTREE      |         | 
| geneHomology |          1 | geneHomologyQuery_gene |            1 | gene_id                  | A         |     1498516 |     NULL | NULL   |      | BTREE      |         | 
| geneHomology |          1 | geneHomologyQuery_gene |            2 | genome_name              | A         |     1505147 |     NULL | NULL   |      | BTREE      |         | 
| geneHomology |          1 | geneHomologyHit_gene   |            1 | homolog_gene_id          | A         |    52332820 |     NULL | NULL   |      | BTREE      |         | 
| geneHomology |          1 | geneHomologyHit_gene   |            2 | homolog_genome_name      | A         |    52332820 |     NULL | NULL   |      | BTREE      |         | 
+--------------+------------+------------------------+--------------+--------------------------+-----------+-------------+----------+--------+------+------------+---------+
5 rows in set (0.00 sec)

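Update 4: Is there a way to get only part of the results, if only to check that I'm getting what I want? I tried LIMIT 1000 and even LIMIT 10, but it doesn't seem to change anything.

Update 5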

mysql> SHOW CREATE TABLE geneHomology;
CREATE TABLE `geneHomology` (
  `id` bigint(20) NOT NULL auto_increment,
  `genome_name` varchar(20) NOT NULL,
  `gene_id` varchar(30) NOT NULL,
  `homolog_genome_name` varchar(20) NOT NULL,
  `homolog_gene_id` varchar(30) NOT NULL,
  `homolog_length` bigint(20) unsigned NOT NULL,
  `significance` double unsigned NOT NULL,
  `bit_score` double unsigned NOT NULL,
  `percent_identity` double unsigned NOT NULL,
  `start_match` int(10) unsigned NOT NULL,
  `end_match` int(10) unsigned NOT NULL,
  `start_match_percent` double unsigned NOT NULL,
  `end_match_percent` double unsigned NOT NULL,
  `strand` enum('+','-') default NULL,
  `homolog_start_match` int(10) unsigned NOT NULL,
  `homolog_end_match` int(10) unsigned NOT NULL,
  `homolog_start_match_percent` double unsigned NOT NULL,
  `homolog_end_match_percent` double unsigned NOT NULL,
  `homolog_strand` enum('+','-') default NULL,
  `consider_gene_homology` enum('0','1') NOT NULL,
  `reason_not_considered` varchar(50) default NULL,
  `num_hsps` int(10) unsigned NOT NULL,
  `homology_type` varchar(2) NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `geneHomologygene` (`gene_id`,`genome_name`),
  KEY `geneHomologyhomolog_gene` (`homolog_gene_id`,`homolog_genome_name`)
) ENGINE=MyISAM AUTO_INCREMENT=680326662 DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

mysql> SHOW CREATE TABLE gene;
CREATE TABLE `gene` (
  `taxon_id` int(10) unsigned NOT NULL,
  `genome_name` varchar(20) NOT NULL,
  `gene_id` varchar(30) NOT NULL,
  `symbol` varchar(30) default NULL,
  `type` varchar(30) default NULL,
  `product` varchar(300) default NULL,
  `strand` enum('+','-') NOT NULL,
  `start` bigint(20) unsigned NOT NULL,
  `end` bigint(20) unsigned NOT NULL,
  `gene_category` enum('Upregulated','Downregulated','Normal','n/a') NOT NULL,
  `consider_gene` enum('0','1') NOT NULL,
  `reason_not_considered` varchar(50) default NULL,
  `sequence` longblob NOT NULL,
  `additional_info` varchar(300) default NULL,
  PRIMARY KEY  (`gene_id`,`genome_name`),
  KEY `gene_organism` (`taxon_id`),
  KEY `gene_genome` (`genome_name`),
  KEY `gene_gene_category` (`gene_category`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

3 answers:

Answer 0 (score: 1)

SELECT  a.genome_name, a.gene_id,
        cats.gene_category,
        (
        SELECT  COUNT(*)
        FROM    geneHomology ab
        JOIN    gene b
        ON      b.genome_name = ab.homolog_genome_name
                AND b.gene_id = ab.homolog_gene_id
        WHERE   ab.genome_name = a.genome_name
                AND ab.gene_id = a.gene_id
                AND b.gene_category = cats.gene_category
        ) cx
FROM    gene a
CROSS JOIN
        (
        SELECT  'Normal' AS gene_category
        UNION ALL
        SELECT  'Upregulated' AS gene_category
        UNION ALL
        SELECT  'Downregulated' AS gene_category
        ) cats
LIMIT 100

This removes the filesort from your plan.

If you have a table that contains all possible gene_categories, substitute it for cats.
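
As a minimal sketch of that substitution, assume a lookup table named gene_categories (a hypothetical name, not part of the posted schema) holding one row per category. The enum definition is copied from the SHOW CREATE TABLE gene output in the question.

```sql
-- Hypothetical lookup table; the name gene_categories is an assumption.
-- The ENUM definition mirrors gene.gene_category so the values compare cleanly.
CREATE TABLE gene_categories (
    gene_category ENUM('Upregulated','Downregulated','Normal','n/a') NOT NULL,
    PRIMARY KEY (gene_category)
) ENGINE=MyISAM;

INSERT INTO gene_categories (gene_category)
VALUES ('Upregulated'), ('Downregulated'), ('Normal'), ('n/a');

-- Same query as above, with the inline UNION ALL subquery replaced by the table.
SELECT  a.genome_name, a.gene_id,
        cats.gene_category,
        (
        SELECT  COUNT(*)
        FROM    geneHomology ab
        JOIN    gene b
        ON      b.genome_name = ab.homolog_genome_name
                AND b.gene_id = ab.homolog_gene_id
        WHERE   ab.genome_name = a.genome_name
                AND ab.gene_id = a.gene_id
                AND b.gene_category = cats.gene_category
        ) cx
FROM    gene a
CROSS JOIN
        gene_categories cats
LIMIT 100;
```

Unlike the inline cats subquery, this version also produces rows for 'n/a'; leave that value out of the INSERT if it is not wanted.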

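Answer 1 (score: 0)

Based on what you've posted here, I'd recommend a little denormalization: put gene_category into geneHomology. Then you can get rid of the join entirely, and you can create an index on consider_homolog plus the GROUP BY fields.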
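
A rough sketch of what this could look like, assuming the new column is called homolog_gene_category and the index geneHomology_category_count (both names are made up for illustration). The flag column is written as consider_gene_homology, as it appears in the SHOW CREATE TABLE output; the question's query calls it consider_homolog.

```sql
-- Assumption: homolog_gene_category is a hypothetical new column.
-- It is nullable so that rows with no matching gene stay distinguishable.
ALTER TABLE geneHomology
    ADD COLUMN homolog_gene_category
        ENUM('Upregulated','Downregulated','Normal','n/a') NULL;

-- One-off backfill from gene; this is a full pass over geneHomology.
UPDATE geneHomology a
JOIN   gene b
  ON   b.gene_id = a.homolog_gene_id
 AND   b.genome_name = a.homolog_genome_name
SET    a.homolog_gene_category = b.gene_category;

-- Index on the filter column followed by the GROUP BY columns
-- (the index name is an assumption).
ALTER TABLE geneHomology
    ADD INDEX geneHomology_category_count
        (consider_gene_homology, genome_name, gene_id, homolog_gene_category);

-- The count no longer needs the join. The IS NOT NULL test drops rows that
-- had no match in gene, which the original inner join would have excluded.
SELECT genome_name, gene_id, homolog_gene_category, COUNT(*) AS homolog_count
FROM   geneHomology
WHERE  consider_gene_homology = '1'
  AND  homolog_gene_category IS NOT NULL
GROUP BY genome_name, gene_id, homolog_gene_category;
```

On a MyISAM table of this size the ALTER TABLE statements are likely to rebuild the whole table, so it is probably worth trying them on a copy first. The denormalized column also has to be kept in sync whenever gene.gene_category changes.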

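Answer 2 (score: 0)

Start by removing the reference to genome_name from the WHERE part of the query. If gene.gene_id and gene.genome_name are each unique, then there is a clear functional dependency here that obscures the issue a bit, and a number/number join will be slightly more efficient than a text/text join.

Looking at the plan, it implies you already have an index on geneHomology.hit_gene_id. If so, there isn't much scope for making the query faster without schema changes. However, the key length of 54 suggests that the index holds a lot that shouldn't be there. Reducing it to just hit_gene_id and consider_homolog will help performance a bit, but the limiting factor is that, unless there are other functional dependencies, there doesn't seem to be a way to avoid a full table scan on gene.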
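
For reference, the key_len of 54 in the EXPLAIN output is what MySQL reports for the two latin1 VARCHAR columns of geneHomologyHit_gene on their own, as the arithmetic in the comments below works out. The narrower index described above might be sketched as follows; the index name is hypothetical, the column names follow the SHOW CREATE TABLE output (homolog_gene_id, consider_gene_homology), and whether the optimizer actually prefers it would need to be checked with EXPLAIN.

```sql
-- key_len for geneHomologyHit_gene (latin1 uses 1 byte per character):
--   homolog_gene_id      VARCHAR(30) NOT NULL : 30 + 2 length bytes = 32
--   homolog_genome_name  VARCHAR(20) NOT NULL : 20 + 2 length bytes = 22
--   total                                                          = 54

-- Narrower index on the hit gene id plus the flag (name is an assumption).
ALTER TABLE geneHomology
    ADD INDEX geneHomology_hit_considered
        (homolog_gene_id, consider_gene_homology);

-- Compare the plan before and after adding the index.
EXPLAIN
SELECT a.genome_name, a.gene_id, b.gene_category, COUNT(*) AS homolog_count
FROM   geneHomology a
JOIN   gene b
  ON   b.gene_id = a.homolog_gene_id
 AND   b.genome_name = a.homolog_genome_name
WHERE  a.consider_gene_homology = '1'
GROUP BY a.genome_name, a.gene_id, b.gene_category;
```

The EXPLAIN query keeps the genome_name condition in the join, because the posted primary key of gene is the pair (gene_id, genome_name).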

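How long does "SELECT * FROM gene" take to complete? How many records are there in gene_homology?

It looks like geneHomology decomposes an N:M relationship between gene and gene (itself) and applies labels to it.

If the number of distinct values in homolog_genome_name is relatively small, you might consider folding this into gene using a bitmap field. Or perhaps normalize the relationship into a set of 1:1 mappings. Alternatively, you could enumerate the homology clusters.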