Optimizing queries on a large dataset - JOIN or IN?

Posted: 2012-10-09 16:17:06

Tags: mysql sql

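Suppose I have tagged articles, where one article can have n tags. There are currently about 250,000 tag entries, each pointing back to the article it belongs to.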

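Now I want to find all the tags of the articles that match certain criteria. I came up with two different approaches; both have drawbacks and both are slow. Maybe someone can point me in the right direction on how to speed this up, or even suggest a better solution.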

The keys (ind, rindex) are varchar(255), which unfortunately cannot be changed.

Query #1

Takes 7.5 s; the subselect on its own returns 60 records in 50 ms.

SELECT count(*) AS tagscount, tags.value FROM tags 
  WHERE tags.`rindex` IN 
  ( 
    SELECT article.ind 
       FROM article 
       INNER JOIN struktur ON (struktur.ind = article.struktur) 
    WHERE article.date = '2011-12-21'
  ) 
  AND tags.`rtable` = 'article'
  GROUP BY tags.value ORDER BY tagscount DESC LIMIT 20

Query #2

Takes 60 ms.

SELECT count(*) AS tagscount, tags.value FROM tags 
  INNER JOIN article ON (article.ind = tags.rindex AND tags.rtable = 'article')
  LEFT JOIN structure ON (article.structure = structure.ind)
WHERE article.date = '2011-12-21'
GROUP BY tags.value ORDER BY tagscount DESC LIMIT 20 
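Whether to use IN or JOIN here mostly affects the plan MySQL picks, not the result: because article.ind is the primary key (as is struktur.ind), each tag row matches at most one article, so the join cannot inflate the counts and both shapes return the same rows. The sketch below checks that on a tiny, made-up dataset, using Python's built-in sqlite3 module as a stand-in for MySQL. Table and column names follow query #1 (struktur rather than structure), and it assumes every article has a matching struktur row; otherwise query #2's LEFT JOIN keeps articles that query #1's INNER JOIN drops. SQLite timings say nothing about MySQL's behaviour.

```python
import sqlite3

# Hypothetical toy data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE struktur (ind VARCHAR(255) PRIMARY KEY);
CREATE TABLE article (
    ind      VARCHAR(255) PRIMARY KEY,
    date     DATE NOT NULL,
    struktur VARCHAR(255)
);
CREATE TABLE tags (
    ind    VARCHAR(255) PRIMARY KEY,
    rtable VARCHAR(255) NOT NULL,
    rindex VARCHAR(255) NOT NULL,
    value  VARCHAR(40)
);
INSERT INTO struktur VALUES ('s1');
INSERT INTO article VALUES
    ('a1', '2011-12-21', 's1'),
    ('a2', '2011-12-21', 's1'),
    ('a3', '2010-05-01', 's1');
INSERT INTO tags VALUES
    ('t1', 'article', 'a1', 'mysql'),
    ('t2', 'article', 'a1', 'sql'),
    ('t3', 'article', 'a2', 'mysql'),
    ('t4', 'article', 'a3', 'mysql'),
    ('t5', 'other',   'a1', 'mysql');
""")

# Shape of query #1: filter tags with IN (subquery).
QUERY_IN = """
SELECT count(*) AS tagscount, tags.value FROM tags
WHERE tags.rindex IN (
    SELECT article.ind FROM article
    INNER JOIN struktur ON (struktur.ind = article.struktur)
    WHERE article.date = ?
)
AND tags.rtable = 'article'
GROUP BY tags.value ORDER BY tagscount DESC LIMIT 20
"""

# Shape of query #2: join tags to article directly.
QUERY_JOIN = """
SELECT count(*) AS tagscount, tags.value FROM tags
INNER JOIN article ON (article.ind = tags.rindex AND tags.rtable = 'article')
LEFT JOIN struktur ON (article.struktur = struktur.ind)
WHERE article.date = ?
GROUP BY tags.value ORDER BY tagscount DESC LIMIT 20
"""

rows_in = conn.execute(QUERY_IN, ("2011-12-21",)).fetchall()
rows_join = conn.execute(QUERY_JOIN, ("2011-12-21",)).fetchall()
print(rows_in)                # [(2, 'mysql'), (1, 'sql')]
print(rows_in == rows_join)   # True
```

Since the results agree, the choice between the two comes down to which one the optimizer executes better, which is what the EXPLAIN output further down shows.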

The strange part (important)

When I change article.date = '2011-12-21' to article.date >= '2009-12-21':

Query #1

  • takes 10.1 s; the subselect returns 18k rows in 70 ms

Query #2

  • takes 14.2 s

If you need more information, I will be happy to provide it.

SCHEMAS

mysql> SHOW COLUMNS FROM tags;
+---------+--------------+------+-----+---------+-------+
| Field   | Type         | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+-------+
| ind     | varchar(255) | NO   | PRI |         |       |
| rtable  | varchar(255) | NO   | MUL |         |       |
| rindex  | varchar(255) | NO   | MUL |         |       |
| value   | varchar(40)  | YES  | MUL | NULL    |       |
+---------+--------------+------+-----+---------+-------+

mysql> SHOW indexes FROM tags
+-------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name            | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| tags  |          0 | tags_ind            |            1 | ind         | A         |      275834 |     NULL | NULL   |      | BTREE      |         |
| tags  |          1 | tags_tag            |            1 | tag         | A         |       27583 |     NULL | NULL   | YES  | BTREE      |         |
| tags  |          1 | tags_rindex         |            1 | rindex      | A         |       55166 |     NULL | NULL   |      | BTREE      |         |
| tags  |          1 | tags_rindex_tabelle |            1 | tabelle     | A         |           4 |       30 | NULL   |      | BTREE      |         |
| tags  |          1 | tags_rindex_tabelle |            2 | rindex      | A         |       55166 |       50 | NULL   |      | BTREE      |         |
+-------+------------+---------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

mysql> SHOW COLUMNS FROM structure;
+------------------------+--------------+------+-----+---------+-------+
| Field                  | Type         | Null | Key | Default | Extra |
+------------------------+--------------+------+-----+---------+-------+
| ind                    | varchar(255) | NO   | PRI |         |       |
+------------------------+--------------+------+-----+---------+-------+

mysql> SHOW COLUMNS FROM artikel;
+--------------------+--------------+------+-----+------------+-------+
| Field              | Type         | Null | Key | Default    | Extra |
+--------------------+--------------+------+-----+------------+-------+
| ind                | varchar(255) | NO   | PRI |            |       |
| date               | date         | NO   | MUL | 0000-00-00 |       |
+--------------------+--------------+------+-----+------------+-------+

EXPLAIN

mysql> explain #1
+----+--------------------+----------+--------+-------------------------------------------------------------------------------------+---------------------+---------+---------------------+--------+----------------------------------------------+
| id | select_type        | table    | type   | possible_keys                                                                       | key                 | key_len | ref                 | rows   | Extra                                        |
+----+--------------------+----------+--------+-------------------------------------------------------------------------------------+---------------------+---------+---------------------+--------+----------------------------------------------+
|  1 | PRIMARY            | tags     | ref    | tags_rindex_tabelle                                                                 | tags_rindex_tabelle | 32      | const               | 177175 | Using where; Using temporary; Using filesort |
|  2 | DEPENDENT SUBQUERY | artikel  | eq_ref | artikel_ind,zeitraum_start_i,freigabe_i,korrektur_i,struktur_i,artikel_start_slot_i | artikel_ind         | 257     | func                |      1 | Using where                                  |
|  2 | DEPENDENT SUBQUERY | struktur | eq_ref | struktur_ind,struktur_host                                                          | struktur_ind        | 257     | ec.artikel.struktur |      1 | Using where                                  |
+----+--------------------+----------+--------+-------------------------------------------------------------------------------------+---------------------+---------+---------------------+--------+----------------------------------------------+
mysql> explain #2
+----+-------------+----------+--------+-------------------------------------------------------------------------------------+---------------------+---------+---------------------+--------+----------------------------------------------+
| id | select_type | table    | type   | possible_keys                                                                       | key                 | key_len | ref                 | rows   | Extra                                        |
+----+-------------+----------+--------+-------------------------------------------------------------------------------------+---------------------+---------+---------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | tags     | ref    | tags_rindex,tags_rindex_tabelle                                                     | tags_rindex_tabelle | 32      | const               | 177175 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | artikel  | eq_ref | artikel_ind,zeitraum_start_i,freigabe_i,korrektur_i,struktur_i,artikel_start_slot_i | artikel_ind         | 257     | ec.tags.rindex      |      1 | Using where                                  |
|  1 | SIMPLE      | struktur | eq_ref | struktur_ind,struktur_host                                                          | struktur_ind        | 257     | ec.artikel.struktur |      1 | Using where                                  |
+----+-------------+----------+--------+-------------------------------------------------------------------------------------+---------------------+---------+---------------------+--------+----------------------------------------------+

1 Answer

Answer (score: 1)

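I take it that artikel.ind is not constrained to increase lexically in the same order as artikel.date. If it were, the obvious solution would be to add a restriction on rindex that corresponds to the date range.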
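A sketch of that idea, again with Python's sqlite3 standing in for MySQL and made-up data. The ids ('20100301-001' and so on) are hypothetical and chosen to sort in date order, which is the condition the answer depends on; the index name tags_rtable_rindex is also made up (in the question, tags_rindex_tabelle plays a similar role). The date range is first turned into an ind range via the article table, then added as a BETWEEN predicate on tags.rindex so that an index on (rtable, rindex) can be range-scanned instead of reading every 'article' tag. Because lo and hi come from the matching articles themselves, the extra predicate never drops a correct row; the ordering assumption only decides whether the range is narrow enough to help.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (ind VARCHAR(255) PRIMARY KEY, date DATE NOT NULL);
CREATE TABLE tags (
    ind    VARCHAR(255) PRIMARY KEY,
    rtable VARCHAR(255) NOT NULL,
    rindex VARCHAR(255) NOT NULL,
    value  VARCHAR(40)
);
-- Hypothetical composite index, similar in role to tags_rindex_tabelle.
CREATE INDEX tags_rtable_rindex ON tags (rtable, rindex);
-- Hypothetical ids that sort lexically in the same order as date.
INSERT INTO article VALUES
    ('20091101-001', '2009-11-01'),
    ('20100301-001', '2010-03-01'),
    ('20111221-001', '2011-12-21');
INSERT INTO tags VALUES
    ('t1', 'article', '20091101-001', 'old'),
    ('t2', 'article', '20100301-001', 'sql'),
    ('t3', 'article', '20111221-001', 'sql'),
    ('t4', 'article', '20111221-001', 'mysql');
""")

since = "2009-12-21"

# Step 1: translate the date range into an ind range (uses the date index).
lo, hi = conn.execute(
    "SELECT min(ind), max(ind) FROM article WHERE date >= ?", (since,)
).fetchone()

# Step 2: the same aggregate, plus a range predicate on tags.rindex.
# The date filter stays, so correctness never depends on the id ordering.
rows = conn.execute("""
SELECT count(*) AS tagscount, tags.value FROM tags
INNER JOIN article ON article.ind = tags.rindex
WHERE tags.rtable = 'article'
  AND tags.rindex BETWEEN ? AND ?
  AND article.date >= ?
GROUP BY tags.value ORDER BY tagscount DESC LIMIT 20
""", (lo, hi, since)).fetchall()

print(lo, hi)  # 20100301-001 20111221-001
print(rows)    # [(2, 'sql'), (1, 'mysql')]
```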

Apart from that, it already appears to be using a reasonable plan.

Short of changing the data types, your best option is to create a materialized view indexed on (artikel.date, tags.value, artikel.ind) and query that instead.
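MySQL has no native materialized views, so in practice this means a summary table that you rebuild periodically or keep in sync with triggers. A minimal sketch, once more in SQLite via Python's sqlite3 with made-up data; the table name article_tag_by_date and its index name are hypothetical, and the struktur join from the question is left out for brevity (it would go into the rebuild SELECT). Counting tags for a date range can then be answered from the summary table's (date, value, ind) index alone, a covering index, instead of joining every 'article' tag row to artikel.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (ind VARCHAR(255) PRIMARY KEY, date DATE NOT NULL);
CREATE TABLE tags (
    ind    VARCHAR(255) PRIMARY KEY,
    rtable VARCHAR(255) NOT NULL,
    rindex VARCHAR(255) NOT NULL,
    value  VARCHAR(40)
);
INSERT INTO article VALUES
    ('a1', '2009-11-01'), ('a2', '2010-03-01'), ('a3', '2011-12-21');
INSERT INTO tags VALUES
    ('t1', 'article', 'a1', 'old'),
    ('t2', 'article', 'a2', 'sql'),
    ('t3', 'article', 'a3', 'sql'),
    ('t4', 'article', 'a3', 'mysql');

-- Hypothetical summary table standing in for the materialized view.
CREATE TABLE article_tag_by_date (
    date  DATE         NOT NULL,
    value VARCHAR(40),
    ind   VARCHAR(255) NOT NULL
);
CREATE INDEX article_tag_by_date_i ON article_tag_by_date (date, value, ind);

-- (Re)build step: one row per (article, tag) pair.
INSERT INTO article_tag_by_date (date, value, ind)
SELECT article.date, tags.value, article.ind
FROM tags INNER JOIN article ON article.ind = tags.rindex
WHERE tags.rtable = 'article';
""")

# The expensive join is gone from the read path.
rows = conn.execute("""
SELECT count(*) AS tagscount, value FROM article_tag_by_date
WHERE date >= ?
GROUP BY value ORDER BY tagscount DESC LIMIT 20
""", ("2009-12-21",)).fetchall()

print(rows)  # [(2, 'sql'), (1, 'mysql')]
```

The trade-off is freshness: between rebuilds the counts lag behind the tags table, which is usually acceptable for tag statistics but worth deciding explicitly.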