Question

I have a simple table (created by Django), engine InnoDB:

+-------------+------------------+------+-----+---------+----------------+
| Field       | Type             | Null | Key | Default | Extra          |
+-------------+------------------+------+-----+---------+----------------+
| id          | int(11)          | NO   | PRI | NULL    | auto_increment |
| correlation | double           | NO   |     | NULL    |                |
| gene1_id    | int(10) unsigned | NO   | MUL | NULL    |                |
| gene2_id    | int(10) unsigned | NO   | MUL | NULL    |                |
+-------------+------------------+------+-----+---------+----------------+

The table has 411 million rows. (The target table will have about 461M rows, 21471 * 21470.)

My main query looks like this, with up to 10 genes in the IN list:

 SELECT gene1_id, AVG(correlation) AS avg FROM genescorrelation 
 WHERE gene2_id IN (176829, 176519, 176230) 
 GROUP BY gene1_id ORDER BY NULL

This query is very slow, taking almost 2 minutes to run:

21471 rows in set (1 min 11.03 sec)

Indexes (the cardinality looks strange; too small?):

  Non_unique| Key_name                                         | Seq_in_index | Column_name | Collation | Cardinality |
          0 | PRIMARY                                          |            1 | id          | A         |   411512194 | 
          1 | c_gene1_id_6b1d81605661118_fk_genes_gene_entrez  |            1 | gene1_id    | A         |          18 |
          1 | c_gene2_id_2d0044eaa6fd8c0f_fk_genes_gene_entrez |            1 | gene2_id    | A         |          18 |

I just ran select count(*) on that table and it took 22 minutes:

select count(*) from predictions_genescorrelation;

+-----------+
| count(*)  |
+-----------+
| 411512002 |
+-----------+
1 row in set (22 min 45.05 sec)

What could be wrong? I suspect the MySQL configuration is not set up correctly.

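While importing the data I ran into disk space problems, which may also have affected the database, although I ran check table afterwards; it took 2 hours and reported OK.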

In addition, the index cardinality looks strange. I set up a smaller database locally, and its values are completely different (254945589, 56528, 17).

Should I rebuild the indexes? Which MySQL parameters should I check? My table is set up as InnoDB; would MyISAM make any difference?

Thanks, matali

Answer 1

https://www.percona.com/blog/2006/12/01/count-for-innodb-tables/

On InnoDB, a SELECT COUNT(*) with no WHERE clause, or a SELECT COUNT(id) ... USE INDEX (PRIMARY), is very slow.
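For reference, here is a sketch of the two exact-count forms next to an approximate alternative. The SHOW TABLE STATUS suggestion is an addition here, not part of the original answer, and the Rows value it reports for InnoDB is only an estimate.

```sql
-- Exact counts: both forms have to scan an index over every row.
SELECT COUNT(*) FROM predictions_genescorrelation;
SELECT COUNT(id) FROM predictions_genescorrelation USE INDEX (PRIMARY);

-- Approximate count (assumption: an estimate is acceptable).
-- For InnoDB the Rows column is an estimate, not an exact figure.
SHOW TABLE STATUS LIKE 'predictions_genescorrelation';
```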

To speed up this query:

 SELECT gene1_id, AVG(correlation) AS avg FROM genescorrelation 
 WHERE gene2_id IN (176829, 176519, 176230) 
 GROUP BY gene1_id ORDER BY NULL

you should use a composite key on (gene2_id, gene1_id, correlation), in that order. Try it.
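A minimal sketch of adding that composite key, assuming the table name used in the query above (the count example uses predictions_genescorrelation, so adjust as needed). The index name idx_gene2_gene1_corr is made up for illustration. Building an index on a table of this size will likely take a long time and needs extra disk space.

```sql
-- Hypothetical index name; columns in the order recommended above.
ALTER TABLE genescorrelation
  ADD INDEX idx_gene2_gene1_corr (gene2_id, gene1_id, correlation);

-- Check the plan: if the new index covers the query, the Extra column
-- of EXPLAIN should include "Using index".
EXPLAIN
SELECT gene1_id, AVG(correlation) AS avg FROM genescorrelation
WHERE gene2_id IN (176829, 176519, 176230)
GROUP BY gene1_id ORDER BY NULL;
```

With all three columns in the index, the query can in principle be answered from the index alone, without reading the table rows.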

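About index cardinality: the statistics for InnoDB tables are approximations, not exact values (and sometimes wildly off). There was (is?) even a bug report about it: https://bugs.mysql.com/bug.php?id=58382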

Try running ANALYZE TABLE and then look at the cardinality again.
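As a sketch, again assuming the table name from the query above, that would be:

```sql
-- Recompute the index statistics (for InnoDB these are sampled estimates).
ANALYZE TABLE genescorrelation;

-- Then re-read the Cardinality column for each index.
SHOW INDEX FROM genescorrelation;
```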
