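Description
According to the EXPLAIN command, a range condition is causing the query to perform a full table scan (160k rows). How can I keep the range condition and reduce the scanning? I expect the culprit to be: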
Y.YEAR BETWEEN 1900 AND 2009 AND
Code
Here is the code that contains the range condition (the STATION_DISTRICT table is probably superfluous).
SELECT
  COUNT(1) as MEASUREMENTS,
  AVG(D.AMOUNT) as AMOUNT,
  Y.YEAR as YEAR,
  MAKEDATE(Y.YEAR,1) as AMOUNT_DATE
FROM
  CITY C,
  STATION S,
  STATION_DISTRICT SD,
  YEAR_REF Y FORCE INDEX(YEAR_IDX),
  MONTH_REF M,
  DAILY D
WHERE
  -- For a specific city ...
  --
  C.ID = 10663 AND
  -- Find all the stations within a specific unit radius ...
  --
  6371.009 *
  SQRT(
    POW(RADIANS(C.LATITUDE_DECIMAL - S.LATITUDE_DECIMAL), 2) +
    (COS(RADIANS(C.LATITUDE_DECIMAL + S.LATITUDE_DECIMAL) / 2) *
      POW(RADIANS(C.LONGITUDE_DECIMAL - S.LONGITUDE_DECIMAL), 2)) ) <= 50 AND
  -- Get the station district identification for the matching station.
  --
  S.STATION_DISTRICT_ID = SD.ID AND
  -- Gather all known years for that station ...
  --
  Y.STATION_DISTRICT_ID = SD.ID AND
  -- The data before 1900 is shaky; insufficient after 2009.
  --
  Y.YEAR BETWEEN 1900 AND 2009 AND
  -- Filtered by all known months ...
  --
  M.YEAR_REF_ID = Y.ID AND
  -- Whittled down by category ...
  --
  M.CATEGORY_ID = '003' AND
  -- Into the valid daily climate data.
  --
  M.ID = D.MONTH_REF_ID AND
  D.DAILY_FLAG_ID <> 'M'
GROUP BY
  Y.YEAR
Update
The SQL is performing a full table scan, which causes MySQL to do a "copying to tmp table" step, as shown here:
+----+-------------+-------+--------+-----------------------------------+--------------+---------+-------------------------------+--------+-------------+
| id | select_type | table | type   | possible_keys                     | key          | key_len | ref                           | rows   | Extra       |
+----+-------------+-------+--------+-----------------------------------+--------------+---------+-------------------------------+--------+-------------+
| 1  | SIMPLE      | C     | const  | PRIMARY                           | PRIMARY      | 4       | const                         | 1      |             |
| 1  | SIMPLE      | Y     | range  | YEAR_IDX                          | YEAR_IDX     | 4       | NULL                          | 160422 | Using where |
| 1  | SIMPLE      | SD    | eq_ref | PRIMARY                           | PRIMARY      | 4       | climate.Y.STATION_DISTRICT_ID | 1      | Using index |
| 1  | SIMPLE      | S     | eq_ref | PRIMARY                           | PRIMARY      | 4       | climate.SD.ID                 | 1      | Using where |
| 1  | SIMPLE      | M     | ref    | PRIMARY,YEAR_REF_IDX,CATEGORY_IDX | YEAR_REF_IDX | 8       | climate.Y.ID                  | 54     | Using where |
| 1  | SIMPLE      | D     | ref    | INDEX                             | INDEX        | 8       | climate.M.ID                  | 11     | Using where |
+----+-------------+-------+--------+-----------------------------------+--------------+---------+-------------------------------+--------+-------------+
Answer
After using STRAIGHT_JOIN:
+----+-------------+-------+--------+-----------------------------------+---------------+---------+-------------------------------+------+---------------------------------+
| id | select_type | table | type   | possible_keys                     | key           | key_len | ref                           | rows | Extra                           |
+----+-------------+-------+--------+-----------------------------------+---------------+---------+-------------------------------+------+---------------------------------+
| 1  | SIMPLE      | C     | const  | PRIMARY                           | PRIMARY       | 4       | const                         | 1    | Using temporary; Using filesort |
| 1  | SIMPLE      | S     | ALL    | PRIMARY                           | NULL          | NULL    | NULL                          | 7795 | Using where                     |
| 1  | SIMPLE      | SD    | eq_ref | PRIMARY                           | PRIMARY       | 4       | climate.S.STATION_DISTRICT_ID | 1    | Using index                     |
| 1  | SIMPLE      | Y     | ref    | PRIMARY,STAT_YEAR_IDX             | STAT_YEAR_IDX | 4       | climate.S.STATION_DISTRICT_ID | 1650 | Using where                     |
| 1  | SIMPLE      | M     | ref    | PRIMARY,YEAR_REF_IDX,CATEGORY_IDX | YEAR_REF_IDX  | 8       | climate.Y.ID                  | 54   | Using where                     |
| 1  | SIMPLE      | D     | ref    | INDEX                             | INDEX         | 8       | climate.M.ID                  | 11   | Using where                     |
+----+-------------+-------+--------+-----------------------------------+---------------+---------+-------------------------------+------+---------------------------------+
Related
Thank you!
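Answer 0 (score: 2)
One request... it looks like you know your data. Add the keyword "STRAIGHT_JOIN" and see the results...
SELECT STRAIGHT_JOIN ... rest of your query ...
STRAIGHT_JOIN tells MySQL to join the tables in the order I have listed them. Your CITY table is first in the FROM list, which indicates you want it as the primary table. In addition, the WHERE clause on CITY acts as an immediate filter. With that in place, it will probably fly through the rest of the query...
Hope it helps... it has worked for me on gov't data with millions of records, joined to 10+ lookup tables, where MySQL was trying to think for me.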
Answer 1 (score: 0)
To run BETWEEN queries efficiently, you need a B-tree index on the YEAR column. For example:
CREATE INDEX id_index USING BTREE ON YEAR_REF (YEAR);
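A BTREE index allows efficient range queries. If this really is the root problem, having such an index should get rid of the full table scan and make MySQL scan only the part of the table that falls within the range. You can read more about B-trees on Wikipedia.
As with any optimization advice, though, you should measure to make sure you are not doing more harm than good.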
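Answer 2 (score: 0)
Could you change the search within a radius into a search within a bounding box?
You know the city, so you can compute the bounding box in the application.
Maybe this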
S.LATITUDE_DECIMAL >= latitude_lower and
S.LATITUDE_DECIMAL <= latitude_upper and
S.LONGITUDE_DECIMAL >= longitude_lower and
S.LONGITUDE_DECIMAL <= longitude_upper
might be a bit faster?
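One way the application might compute those four bounds is sketched below in Python. It is only an illustration: the function name bounding_box and the sample coordinates are made up, the Earth radius is the 6371.009 km constant from the question's query, and the longitude span uses a simple cos(latitude) approximation that ignores wrap-around at the 180° meridian. The box is a coarse pre-filter, so the original radius test would presumably stay in the WHERE clause to discard the corners, and any speed-up likely depends on STATION having an index on the latitude/longitude columns.

```python
import math

EARTH_RADIUS_KM = 6371.009  # same mean Earth radius the question's query uses


def bounding_box(lat_deg, lon_deg, radius_km):
    """Return (lat_lower, lat_upper, lon_lower, lon_upper) in decimal degrees.

    Approximation only: longitude bounds are not wrapped at +/-180 degrees.
    """
    # Angular radius along a meridian, converted to degrees of latitude.
    lat_delta = math.degrees(radius_km / EARTH_RADIUS_KM)

    # Degrees of longitude shrink with cos(latitude).
    cos_lat = math.cos(math.radians(lat_deg))
    if cos_lat < 1e-6:
        # Practically at a pole: any longitude could be within range.
        lon_delta = 180.0
    else:
        lon_delta = min(
            180.0, math.degrees(radius_km / (EARTH_RADIUS_KM * cos_lat))
        )

    return (
        max(-90.0, lat_deg - lat_delta),
        min(90.0, lat_deg + lat_delta),
        lon_deg - lon_delta,
        lon_deg + lon_delta,
    )


# Hypothetical city coordinates and the 50 km radius from the question.
latitude_lower, latitude_upper, longitude_lower, longitude_upper = bounding_box(
    49.25, -123.1, 50
)
```

The four resulting values would then be bound to the latitude_lower, latitude_upper, longitude_lower and longitude_upper placeholders in the conditions above.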