I have two tables:
packages and package_to_tag, both using MyISAM.
The tables are structured as follows:
packages
+----------------+------------------+----------------+
| aid(primary) | source | date(index) |
+----------------+------------------+----------------+
| 1 | CA | 2013-04-05 |
+----------------+------------------+----------------+
| 2 | FL | 2013-05-05 |
+----------------+------------------+----------------+
| 3 | UT | 2012-06-13 |
+----------------+------------------+----------------+
| 4 | VT | 2011-04-29 |
+----------------+------------------+----------------+
| 5 | CT | 2013-04-10 |
+----------------+------------------+----------------+
package_to_tag (unique index on package_aid + tag; package_aid and tag are also each indexed)
+---------------+------------------+
| package_aid | tag |
+---------------+------------------+
| 2 | sports |
+---------------+------------------+
| 2 | nba |
+---------------+------------------+
| 1 | food |
+---------------+------------------+
| 1 | burrito |
+---------------+------------------+
| 4 | hockey |
+---------------+------------------+
| 4 | sports |
+---------------+------------------+
| 3 | news |
+---------------+------------------+
| 5 | sports |
+---------------+------------------+
| 5 | nba |
+---------------+------------------+
My basic query finds the packages that are tagged with both sports and nba:
SELECT package_aid FROM package_to_tag
WHERE tag IN("sports","nba")
GROUP BY package_aid
HAVING COUNT(*) = 2
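This works fine until I try to add a date sort to the results. (Keep in mind that my packages table has around 400k records.)
My query to get the source for the packages with matching tags is: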
SELECT package_aid, source
FROM package_to_tag
RIGHT JOIN packages ON packages.aid = package_to_tag.package_aid
AND tag IN("sports","nba")
GROUP BY package_aid
HAVING COUNT(*) = 2
ORDER BY date DESC
LIMIT 500
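With 400k records, this takes up to 5 seconds, unless I remove the date sort, in which case it takes well under a second. Since I have always had good results with IN clauses, I tried narrowing my initial result set with the following: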
SELECT aid,source FROM packages
WHERE aid IN(
SELECT package_aid FROM package_to_tag
WHERE tag IN("sports","nba")
GROUP BY package_aid
HAVING COUNT(*) = 2
)
ORDER BY date DESC
LIMIT 500
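I expected the sort to be applied to only about 8-10k records rather than the whole record set.
However, this just flatlines: it pegs the database at 100% utilization and I am forced to restart it, even when I narrow the inner select with additional tags so that it returns 80 records or fewer in total.
I tried running just this query on its own: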
SELECT package_aid FROM package_to_tag
WHERE tag IN("sports","nba")
GROUP BY package_aid
HAVING COUNT(*) = 2
This returns 8-10k records in under a second.
What am I missing?
Answer 0 (score: 3)
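Earlier versions of MySQL have trouble optimizing in with a subquery (before 5.6, such a subquery is often re-evaluated for every row of the outer query, even when it is written as uncorrelated). A simple solution is to rewrite it as an exists clause: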
SELECT aid,source FROM packages
WHERE EXISTS (
SELECT package_aid
FROM package_to_tag
WHERE tag IN("sports","nba") AND package_aid = packages.aid
GROUP BY package_aid
HAVING COUNT(*) = 2
)
ORDER BY date DESC
LIMIT 500
An index on package_to_tag(package_aid, tag) should help a lot here.
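As a rough sketch of both suggestions, assuming the table and column names from the question: the first statement creates the composite index (the name idx_package_tag is made up for illustration), and the second is an alternative formulation, not part of the rewrite above, that joins packages to the tag query as a derived table. The thinking behind the join is that MySQL materializes a derived table in the FROM clause once, so the sort should only touch the 8-10k matching packages. Whether it beats the exists form on a given server is something to verify with EXPLAIN rather than take for granted.

```sql
-- Hypothetical index name. Skip this if the existing unique index
-- already covers (package_aid, tag) in that column order.
-- It supports the correlated lookup on package_aid in the EXISTS form.
CREATE INDEX idx_package_tag ON package_to_tag (package_aid, tag);

-- Alternative sketch: compute the matching package ids once as a
-- derived table, then join to packages and sort only those rows.
SELECT p.aid, p.source
FROM packages AS p
JOIN (
    SELECT package_aid
    FROM package_to_tag
    WHERE tag IN ('sports', 'nba')
    GROUP BY package_aid
    HAVING COUNT(*) = 2   -- relies on the unique (package_aid, tag) index
) AS matched ON matched.package_aid = p.aid
ORDER BY p.date DESC
LIMIT 500;
```

On the sample rows in the question, the select returns aid 2 (FL) followed by aid 5 (CT), since those are the only packages tagged with both sports and nba and package 2 has the later date.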