Question

Tables

stores (100,000 rows): id (pk), name, lat, lng, ...

store_items (9,000,000 rows): store_id (fk), item_id (fk)

items (200,000 rows): id (pk), name, ...

item_words (1,000,000 rows): item_id (fk), word_id (fk)

words (50,000 rows): id (pk), word VARCHAR(255)

Note: all IDs are integers.

======

Indexes

CREATE UNIQUE INDEX storeitems_storeid_itemid_i ON store_items(store_id, item_id);

CREATE UNIQUE INDEX itemwords_wordid_itemid_i ON item_words(word_id, item_id);

CREATE UNIQUE INDEX words_word_i ON words(word);

Note: I prefer multi-column indexes (storeitems_storeid_itemid_i and itemwords_wordid_itemid_i) because of: http://www.mysqlperformanceblog.com/2008/08/22/multiple-column-index-vs-multiple-indexes/

QUERY

select s.name, s.lat, s.lng, i.name
from words w, item_words iw, items i, store_items si, stores s
where iw.word_id=w.id
and i.id=iw.item_id
and si.item_id=i.id
and s.id=si.store_id
and w.word='MILK';

PROBLEM: the elapsed time is 20-120 sec (depending on the word)!!!

explain $QUERY$
+----+-------------+-------+--------+-------------------------------------------------------+-----------------------------+---------+-----------------------------+------+-------------+
| id | select_type | table | type   | possible_keys                                         | key                         | key_len | ref                         | rows | Extra       |
+----+-------------+-------+--------+-------------------------------------------------------+-----------------------------+---------+-----------------------------+------+-------------+
|  1 | SIMPLE      | w     | const  | PRIMARY,words_word_i                                  | words_word_i                | 257     | const                       |    1 | Using index |
|  1 | SIMPLE      | iw    | ref    | itemwords_wordid_itemid_i,itemwords_itemid_fk         | itemwords_wordid_itemid_i   | 4       | const                       |    1 | Using index |
|  1 | SIMPLE      | i     | eq_ref | PRIMARY                                               | PRIMARY                     | 4       | iw.item_id                  |    1 |             |
|  1 | SIMPLE      | si    | ref    | storeitems_storeid_itemid_i,storeitems_itemid_fk      | storeitems_itemid_fk        | 4       | iw.item_id                  |   16 | Using index |
|  1 | SIMPLE      | s     | eq_ref | PRIMARY                                               | PRIMARY                     | 4       | si.store_id                 |    1 |             |
+----+-------------+-------+--------+-------------------------------------------------------+-----------------------------+---------+-----------------------------+------+-------------+

I want the elapsed time to be under 5 seconds! Any ideas???

==============

What I have tried

I added tables to the query one at a time to see how the execution time grows.

1 table

select * from words where word='MILK';

Elapsed time: 0.4 sec

2 tables

select count(*)
from words w, item_words iw
where iw.word_id=w.id
and w.word='MILK';

Elapsed time: 0.5-2 sec (depending on word)

3 tables

select count(*)
from words w, item_words iw, items i
where iw.word_id=w.id
and i.id=iw.item_id
and w.word='MILK';

Elapsed time: 0.5-2 sec (depending on word)

4 tables

select count(*)
from words w, item_words iw, items i, store_items si
where iw.word_id=w.id
and i.id=iw.item_id
and si.item_id=i.id
and w.word='MILK';

Elapsed time: 20-120 sec (depending on word)

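My guess is that the problem lies in the indexes or in the design of the query/database. But there must be a way to make this run fast. Google manages it somehow, and their tables are much bigger!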

Answer 1

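a) You are effectively writing a query to do full-text search (FTS) in MySQL -> use a real FTS engine such as Lucene.

b) Clearly, adding the join against the 9M-row table is the performance problem.

c) How about limiting the join (possibly ending up with exactly the current query plan), like this: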

SELECT
    s.name, s.lat, s.lng, i.name
FROM
    (SELECT * FROM words WHERE word='MILK') w
INNER JOIN
    item_words iw
ON
    iw.word_id=w.id
INNER JOIN
    items i
ON
    i.id=iw.item_id
INNER JOIN
    store_items si
ON
    si.item_id=i.id
INNER JOIN
    stores s
ON
    s.id=si.store_id;

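The logic behind this is that, instead of joining the complete tables and then restricting the result, you start by restricting the tables you are about to join. If the join order happens to be the one I wrote, this greatly reduces your working set and the run time of the inner queries.

d) Google does not use MySQL for FTS.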

Answer 2

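Consider denormalizing the structure. The first candidate is the 1-million-row item_words table: bring the words directly into the table. The list of unique words can then be produced more easily through a view (depending on how often you need that data compared with, say, how often you need to pull the list of stores with products associated with a keyword). Second, create an indexed view (not an option in MySQL, but certainly an option in other commercial databases).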
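One possible reading of the first suggestion, sketched in MySQL syntax. It assumes "the table" means item_words, so that the word text sits next to item_id. The column name word, the index name itemwords_word_itemid_i, and the view name distinct_words are illustrative and not part of the original schema.

```sql
-- Assumption: copy the word text into item_words so that a lookup
-- by word no longer needs to touch the words table.
ALTER TABLE item_words ADD COLUMN word VARCHAR(255);

UPDATE item_words iw
JOIN words w ON w.id = iw.word_id
SET iw.word = w.word;

-- Hypothetical index: a lookup by word can read item_id from the index alone.
CREATE INDEX itemwords_word_itemid_i ON item_words (word, item_id);

-- The list of unique words is then derived through a view.
CREATE VIEW distinct_words AS
SELECT DISTINCT word
FROM item_words;
```

The trade-off is the usual one for denormalization: one join fewer at query time, in exchange for a wider table and the need to keep the copied word text in sync.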

Answer 3

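There is no index that can be used to look up store_id when item_id is given. If the cardinality of store_id were low enough, the query might get some benefit from storeitems_storeid_itemid_i, but since you have 100,000 stores that is unlikely to help much. You could try creating an index on store_items that lists item_id first: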

CREATE UNIQUE INDEX storeitems_item_store ON store_items(item_id, store_id);
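
To check whether the optimizer actually picks up such an index, one could run EXPLAIN on just the store_items lookup. This is a minimal sketch; the literal item id 42 is a placeholder.

```sql
-- 42 is a placeholder item id; substitute one that exists in your data.
EXPLAIN
SELECT si.store_id
FROM store_items si
WHERE si.item_id = 42;
```

The key column of the output shows which index was chosen. The EXPLAIN in the question also lists storeitems_itemid_fk as a candidate for this lookup, so either name may appear there.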

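Also, I'm not sure whether putting the join conditions in the WHERE clause affects performance, but you could try changing the query to the following: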

select s.name, s.lat, s.lng, i.name
from words w LEFT JOIN item_words iw ON w.id=iw.word_id
LEFT JOIN items i ON i.id=iw.item_id
LEFT JOIN store_items si ON si.item_id=i.id
LEFT JOIN stores s ON s.id=si.store_id
where w.word='MILK';

Answer 4

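Without knowing the exact layout of the tables, it is hard to give a good answer. But these kinds of multi-table joins tend to really bog down, especially when one of the factors in the select expression is a dynamic string.

You could try returning multiple result sets for the tables in one go, from a stored procedure or something similar, and then join the data outside of SQL. Doing that, I cut the query time of a massive join from 2 minutes to 4 seconds. Or use temporary tables and return the result set once you are done.

Select from the words table first, because that is where the dynamic string is. Then select from the other tables based on the data that query returns.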
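
The temporary-table variant could look roughly like this in MySQL. It is a sketch under the question's schema; tmp_word_items is a made-up name, and whether it beats the single five-table join would need to be measured.

```sql
-- Step 1: resolve the dynamic string to a (hopefully small) set of item ids.
-- tmp_word_items is a hypothetical name.
CREATE TEMPORARY TABLE tmp_word_items AS
SELECT iw.item_id
FROM words w
JOIN item_words iw ON iw.word_id = w.id
WHERE w.word = 'MILK';

-- Step 2: join the remaining tables against that set only.
SELECT s.name, s.lat, s.lng, i.name
FROM tmp_word_items t
JOIN items i        ON i.id = t.item_id
JOIN store_items si ON si.item_id = t.item_id
JOIN stores s       ON s.id = si.store_id;

-- Clean up; temporary tables are also dropped when the session ends.
DROP TEMPORARY TABLE tmp_word_items;
```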

Answer 5

Try this.
Rewrite the query this way:

select s.name, s.lat, s.lng, i.name  
from words w LEFT JOIN item_words iw ON w.id=iw.word_id AND w.word='MILK'  
LEFT JOIN items i ON i.id=iw.item_id  
LEFT JOIN store_items si ON si.item_id=i.id  
LEFT JOIN stores s ON s.id=si.store_id

and create an index on (w.id, w.word).
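
Written out as a statement, the suggested index might look like this. The index name words_id_word_i is invented for illustration; the column order follows the answer.

```sql
-- words_id_word_i is a hypothetical name; the columns are the ones the answer suggests.
CREATE INDEX words_id_word_i ON words (id, word);
```

Note that the question already defines a unique index on words(word), which is the one the EXPLAIN output shows being used for the lookup by word.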

Answer 6

Have you tried analyzing the tables? That helps the optimizer choose the best execution plan.

e.g.:

ANALYZE TABLE words;
ANALYZE TABLE item_words;
ANALYZE TABLE items;
ANALYZE TABLE store_items;
ANALYZE TABLE stores;

See: http://dev.mysql.com/doc/refman/5.0/en/analyze-table.html
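
After analyzing, one way to see the statistics the optimizer works from is SHOW INDEX. A small sketch for the two large join tables from the question:

```sql
-- The Cardinality column is the estimated number of distinct values
-- in the index; ANALYZE TABLE refreshes this estimate.
SHOW INDEX FROM store_items;
SHOW INDEX FROM item_words;
```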

Optimizing a 5-table SQL query (stores => items => words)
