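I asked the original question here on Stack Overflow before; sorry, that was not the best way to tackle this problem.
The problem is that I have a query that takes at least 5 seconds to complete, even with an INNER JOIN, and I would like to know whether there is a faster way to do this. This is the answer I got: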
` q = "SELECT DISTINCT e2.eventId FROM event_tags e1 INNER JOIN event_tags e2 " \
"ON BINARY e2.tagName=e1.tagName AND e2.eventId != e1.eventId " \
"WHERE e1.eventId = {} ORDER BY RAND() LIMIT {}".format(eventId, '10')`
My tags table looks like this:
mysql> describe event_tags;
+---------+------------------+------+-----+---------+----------------+
| Field   | Type             | Null | Key | Default | Extra          |
+---------+------------------+------+-----+---------+----------------+
| tagId   | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| tagName | text             | NO   |     | NULL    |                |
| eventId | int(10) unsigned | NO   | PRI | NULL    |                |
+---------+------------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)
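I have a lot of tags, and the number will only keep growing. When I count the rows in the tags table, I get 504,402 tagIds, and the same number of tag names. How can I make the lookup faster?
Here is some sample data from the event_tags table: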
mysql> select * from event_tags limit 40;
+-------+-------------------------------------------+---------+
| tagId | tagName                                   | eventId |
+-------+-------------------------------------------+---------+
|   261 | Justin Timberlake (Rescheduled from 11/9) |      38 |
|   264 | Rogers Arena                              |      38 |
|   267 | Pop                                       |      38 |
|   271 | Rock                                      |      38 |
|   285 | Justin Timberlake (Rescheduled from 11/8) |      41 |
|   288 | Rogers Arena                              |      41 |
|   291 | Pop                                       |      41 |
|   294 | Rock                                      |      41 |
|   595 | Yogesh Soman                              |      84 |
|   599 | Geetanjali Kulkarni                       |      84 |
|   602 | Bhagyashree Shankpal                      |      84 |
|   606 | Lalit Prabhakar                           |      84 |
|   611 | Sameer Sanjay Vidwans                     |      84 |
|   617 | Drama                                     |      84 |
|   647 | Shrihari Abhyankar                        |      89 |
|   651 | Deepali Borkar                            |      89 |
|   654 | Akash Kamble                              |      89 |
|   657 | Sharavi Kulkarni                          |      89 |
|   660 | Sharav Wadhawekar                         |      89 |
|   667 | Nipun Dharmadhikari                       |      89 |
|   670 | Drama                                     |      89 |
|   689 | Frank Grillo                              |      94 |
|   692 | Jamie Bell                                |      94 |
|   695 | Margaret Qualley                          |      94 |
|   700 | James Badge Dale                          |      94 |
|   704 | Tim Sutton                                |      94 |
|   710 | Drama                                     |      94 |
|   734 | Bruce Dern                                |     101 |
|   739 | Anthony Michael Hall                      |     101 |
|   745 | Sean Astin                                |     101 |
|   749 | Aly Michalka                              |     101 |
|   754 | Victoria Smurfit                          |     101 |
|   759 | Carl Bessai                               |     101 |
|   762 | Drama                                     |     101 |
|   783 | Sarah Clarke                              |     106 |
|   785 | Xander Berkeley                           |     106 |
|   787 | Kristen Gutoskie                          |     106 |
|   790 | Mackenzie Astin                           |     106 |
|   794 | Bobby Campo                               |     106 |
|   798 | Adam Cushman                              |     106 |
+-------+-------------------------------------------+---------+
40 rows in set (0.00 sec)
Here is the CREATE statement for the table:
CREATE TABLE IF NOT EXISTS event_tags(
tagId INT UNSIGNED NOT NULL AUTO_INCREMENT,
tagName VARCHAR(40) NOT NULL,
eventId INT UNSIGNED NOT NULL,
PRIMARY KEY(tagId, eventId)
);
Here is the EXPLAIN output for the query:
mysql> EXPLAIN SELECT DISTINCT e2.eventId FROM event_tags e1 INNER JOIN event_tags e2 ON BINARY e2.tagName=e1.tagName AND e2.eventId != e1.eventId WHERE e1.eventId = 487 ORDER BY RAND() LIMIT 10
-> ;
+----+-------------+-------+------+---------------+------+---------+------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra                                        |
+----+-------------+-------+------+---------------+------+---------+------+-------+----------------------------------------------+
|  1 | SIMPLE      | e1    | ALL  | NULL          | NULL | NULL    | NULL | 34275 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | e2    | ALL  | NULL          | NULL | NULL    | NULL | 34275 | Using where; Using join buffer               |
+----+-------------+-------+------+---------------+------+---------+------+-------+----------------------------------------------+
2 rows in set (0.03 sec)
Update: I created an index on the table:
CREATE INDEX tagsNdx ON event_tags (eventId, tagName(255));
The table's indexes now look like this:
mysql> show index from event_tags;
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table      | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| event_tags |          0 | PRIMARY  |            1 | tagId       | A         |      455408 |     NULL | NULL   |      | BTREE      |         |               |
| event_tags |          0 | PRIMARY  |            2 | eventId     | A         |      455408 |     NULL | NULL   |      | BTREE      |         |               |
| event_tags |          1 | tagsNdx  |            1 | eventId     | A         |         186 |     NULL | NULL   |      | BTREE      |         |               |
| event_tags |          1 | tagsNdx  |            2 | tagName     | A         |         186 |      255 | NULL   |      | BTREE      |         |               |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (0.00 sec)
But it is still slow.
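One possible reason, offered as a sketch rather than a diagnosis: the join looks up rows of e2 by tagName, but tagsNdx leads with eventId, so it cannot serve that lookup, and wrapping e2.tagName in BINARY may keep MySQL from using an index on that column at all. The example below illustrates the idea with Python's built-in sqlite3 module standing in for MySQL, an assumption made only so the example is self-contained. The index name tagNameNdx and the helpers related_events and query_plan are hypothetical, RANDOM() replaces MySQL's RAND(), and the BINARY cast is dropped, which changes how tag names are compared.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_tags ("
    " tagId INTEGER PRIMARY KEY AUTOINCREMENT,"
    " tagName VARCHAR(40) NOT NULL,"
    " eventId INTEGER NOT NULL)"
)
# Hypothetical index: tagName leads, so a lookup by tag name can use it.
conn.execute("CREATE INDEX tagNameNdx ON event_tags (tagName, eventId)")

# A few rows taken from the sample data above.
conn.executemany(
    "INSERT INTO event_tags (tagName, eventId) VALUES (?, ?)",
    [("Rogers Arena", 38), ("Pop", 38), ("Rock", 38),
     ("Rogers Arena", 41), ("Pop", 41), ("Rock", 41),
     ("Drama", 84), ("Drama", 89), ("Drama", 94)],
)

QUERY = (
    "SELECT DISTINCT e2.eventId FROM event_tags e1 "
    "INNER JOIN event_tags e2 "
    "ON e2.tagName = e1.tagName AND e2.eventId != e1.eventId "
    "WHERE e1.eventId = ? ORDER BY RANDOM() LIMIT ?"
)


def related_events(event_id, limit=10):
    """Return up to `limit` ids of events sharing a tag with `event_id`."""
    return [row[0] for row in conn.execute(QUERY, (event_id, limit))]


def query_plan(event_id, limit=10):
    """Return SQLite's plan descriptions (the last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (event_id, limit))
    return [row[-1] for row in rows]


print(related_events(38))
print(sorted(related_events(84)))
for step in query_plan(84):
    print(step)
```

On MySQL the equivalent check would be to create an index that leads with tagName and re-run EXPLAIN to see whether the `key` column for e2 is still NULL. If tagName is a TEXT column, as DESCRIBE reports, MySQL requires a prefix length in the index definition, as in the tagsNdx statement above.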
Answer 0 (score: 0)
Here are some possible optimizations: