WHERE IN crashing MySQL

Time: 2013-06-16 20:21:13

Tags: mysql myisam

I have two tables:

packages and package_to_tag, both running MyISAM.

The tables are structured as follows, starting with packages:

+----------------+------------------+----------------+
|   aid(primary) |     source       |   date(index)  |
+----------------+------------------+----------------+
|   1            |    CA            |   2013-04-05   |
+----------------+------------------+----------------+
|   2            |    FL            |   2013-05-05   |
+----------------+------------------+----------------+
|   3            |    UT            |   2012-06-13   |
+----------------+------------------+----------------+
|   4            |    VT            |   2011-04-29   |
+----------------+------------------+----------------+
|   5            |    CT            |   2013-04-10   |
+----------------+------------------+----------------+
package_to_tag has a unique index on the package/tag pair, and package_aid and tag are each indexed as well:

+---------------+------------------+
|  package_aid  |     tag          |
+---------------+------------------+
|   2           |    sports        |
+---------------+------------------+
|   2           |    nba           |
+---------------+------------------+
|   1           |    food          |
+---------------+------------------+
|   1           |    burrito       |
+---------------+------------------+
|   4           |    hockey        |
+---------------+------------------+
|   4           |    sports        |
+---------------+------------------+
|   3           |    news          |
+---------------+------------------+
|   5           |    sports        |
+---------------+------------------+
|   5           |    nba           |
+---------------+------------------+
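
For anyone who wants to reproduce this, here is a minimal sketch of the two tables and the sample rows above. The column types, the index names, and the column order of the unique index are assumptions; only the column names, the indexed columns, and the MyISAM engine come from the description.

```sql
-- Sketch only: types and index names are assumed, not taken from the real schema.
CREATE TABLE packages (
  aid    INT UNSIGNED NOT NULL,
  source CHAR(2)      NOT NULL,
  `date` DATE         NOT NULL,
  PRIMARY KEY (aid),
  KEY idx_date (`date`)
) ENGINE=MyISAM;

CREATE TABLE package_to_tag (
  package_aid INT UNSIGNED NOT NULL,
  tag         VARCHAR(32)  NOT NULL,
  UNIQUE KEY uq_package_tag (package_aid, tag),  -- column order assumed
  KEY idx_package_aid (package_aid),
  KEY idx_tag (tag)
) ENGINE=MyISAM;

INSERT INTO packages (aid, source, `date`) VALUES
  (1, 'CA', '2013-04-05'),
  (2, 'FL', '2013-05-05'),
  (3, 'UT', '2012-06-13'),
  (4, 'VT', '2011-04-29'),
  (5, 'CT', '2013-04-10');

INSERT INTO package_to_tag (package_aid, tag) VALUES
  (2, 'sports'), (2, 'nba'),
  (1, 'food'),   (1, 'burrito'),
  (4, 'hockey'), (4, 'sports'),
  (3, 'news'),
  (5, 'sports'), (5, 'nba');
```

With this sample data, only packages 2 and 5 carry both the sports and nba tags.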

So my basic query finds which packages have both sports and nba as tags:

SELECT package_aid FROM package_to_tag
WHERE tag IN("sports","nba")
GROUP BY package_aid
HAVING COUNT(*) = 2

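This works well until I try to add date ordering to the results. (Keep in mind that my packages table is in the 400k-row range.)

My query to fetch the source for packages with matching tags is: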

SELECT package_aid, source 
FROM package_to_tag
RIGHT JOIN packages ON packages.aid = package_to_tag.package_aid
AND tag IN("sports","nba")
GROUP BY package_aid
HAVING COUNT(*) = 2
ORDER BY date DESC
LIMIT 500

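With 400k records, this takes up to 5 seconds, unless I remove the sort on date; then it takes less than a second. Since I have always had good success with IN clauses, I tried narrowing my initial result set with the following: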

SELECT aid,source FROM packages
WHERE aid IN(
  SELECT package_aid FROM package_to_tag
  WHERE tag IN("sports","nba")
  GROUP BY package_aid
  HAVING COUNT(*) = 2
)
ORDER BY date DESC
LIMIT 500

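I figured I would only be applying the sort to roughly 8-10k records instead of the whole table.

However, this flat-out pins the database at 100% utilization and I am forced to restart it, even when I narrow the inner select with additional tags down to 80 rows or fewer in total.

I tried running just this query on its own: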

SELECT package_aid FROM package_to_tag
WHERE tag IN("sports","nba")
GROUP BY package_aid
HAVING COUNT(*) = 2

This returns 8-10k records in under a second.

What am I missing?

1 Answer:

Answer 0: (score: 3)

Earlier versions of MySQL have trouble optimizing IN with a subquery. A simple solution is to rewrite it as an EXISTS clause:

SELECT aid,source FROM packages
WHERE exists (
  SELECT package_aid
  FROM package_to_tag
  WHERE tag IN("sports","nba") and package_aid = packages.aid
  GROUP BY package_aid
  HAVING COUNT(*) = 2
)
ORDER BY date DESC
LIMIT 500
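
As a different technique from the EXISTS rewrite above, the tag filter can also be moved into a derived table and joined to packages. This is only a sketch, not something the question or this answer tested; the aliases matched and p are made up for illustration. The idea is that a grouped derived table in the FROM clause is evaluated once rather than once per row of packages.

```sql
-- Sketch of a derived-table join; aliases "matched" and "p" are hypothetical.
SELECT p.aid, p.source
FROM (
  -- Evaluated once: the packages that carry both tags.
  SELECT package_aid
  FROM package_to_tag
  WHERE tag IN ('sports', 'nba')
  GROUP BY package_aid
  HAVING COUNT(*) = 2
) AS matched
JOIN packages AS p ON p.aid = matched.package_aid
ORDER BY p.`date` DESC
LIMIT 500;
```

Like the original queries, this relies on each (package_aid, tag) pair being unique, so that COUNT(*) = 2 means both tags are present.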

An index on package_to_tag(package_aid, tag) should be a big help.
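
A sketch of how that index could be added and checked. The index name idx_package_aid_tag is made up, and if the existing unique index already covers (package_aid, tag) in that column order, a second index on the same columns would be redundant.

```sql
-- Hypothetical index name; skip this if an index on (package_aid, tag) already exists.
ALTER TABLE package_to_tag
  ADD INDEX idx_package_aid_tag (package_aid, tag);

-- Inspect the plan of the EXISTS version to see which index the subquery uses.
EXPLAIN
SELECT aid, source FROM packages
WHERE EXISTS (
  SELECT package_aid
  FROM package_to_tag
  WHERE tag IN ('sports', 'nba') AND package_aid = packages.aid
  GROUP BY package_aid
  HAVING COUNT(*) = 2
)
ORDER BY `date` DESC
LIMIT 500;
```

Running EXPLAIN on the original IN version as well should show whether the subquery is reported as a DEPENDENT SUBQUERY, which would mean it is being re-evaluated for each row of packages.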