Optimizing a slow MySQL query

Posted: 2011-10-29 13:48:51

Tags: mysql performance optimization

[Image: EXPLAIN EXTENDED output for the query below]

I have the following MySQL query:

SELECT KeywordText, SUM(Frequency) AS Frequency
FROM Keyword, Keyword_Polling_Frequency_Index
WHERE Keyword.KeywordText IN ('deal', 'obama' /* and other keywords... */)
AND RSSFeedNo IN (106, 107 /* and other RSS feeds */)
AND PollingDateTime BETWEEN '2011-10-28 13:00:00' AND '2011-10-28 13:59:00'
AND Keyword.KeywordNo = Keyword_Polling_Frequency_Index.KeywordNo
GROUP BY Keyword.KeywordText
ORDER BY Keyword.KeywordText ASC

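The query is used by an hourly batch program. It involves two tables and fetches, for a given hour, the frequencies of a list of keywords across a list of RSS feeds. The Keyword_Polling_Frequency_Index table has a composite primary key of KeywordNo, RSSFeedNo and PollingDateTime. The query joins this table to the Keyword table, which holds KeywordText. The KeywordText column has a MySQL MyISAM full-text index.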

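In testing it performed satisfactorily, but it has now started running very slowly and is hurting the responsiveness of the application's pages. When I checked the MySQL log, I found that MySQL is creating temporary tables.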

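So my question is: given that this query has to process dozens of keywords across dozens of RSS feeds to calculate the frequencies, can anyone suggest an optimization?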

I have thought about splitting the query up by keyword, but I'm not convinced that would be practical.

Can anyone help?

I'm using MySQL Community Edition 5.x; the EXPLAIN EXTENDED output for this version of the query is shown above.
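For anyone reproducing this, here is a sketch (assuming MySQL 5.x) of how EXPLAIN EXTENDED output like the one above is produced. When MySQL needs a temporary table for the GROUP BY, the Extra column of EXPLAIN reports `Using temporary` (often together with `Using filesort`), which corresponds to the temporary tables seen in the log. Running SHOW WARNINGS immediately afterwards prints the query as rewritten by the optimizer. The keyword and feed lists are cut down to the two sample values from the question; the real lists are longer. EXPLAIN EXTENDED is deprecated in 5.7 and removed in 8.0, where plain EXPLAIN shows the same information.

```sql
-- Execution plan for the question's query (MySQL 5.x syntax);
-- the IN lists are shortened to the sample values only
EXPLAIN EXTENDED
SELECT KeywordText, SUM(Frequency) AS Frequency
FROM Keyword, Keyword_Polling_Frequency_Index
WHERE Keyword.KeywordText IN ('deal', 'obama')
AND RSSFeedNo IN (106, 107)
AND PollingDateTime BETWEEN '2011-10-28 13:00:00' AND '2011-10-28 13:59:00'
AND Keyword.KeywordNo = Keyword_Polling_Frequency_Index.KeywordNo
GROUP BY Keyword.KeywordText
ORDER BY Keyword.KeywordText ASC;

-- Show the statement as rewritten by the optimizer
SHOW WARNINGS;
```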

The SQL for the tables is as follows:

CREATE TABLE `keyword` (
`KeywordNo` int(10) unsigned NOT NULL AUTO_INCREMENT,
`KeywordText` varchar(64) NOT NULL,
`UserOriginated` enum('TRUE','FALSE') NOT NULL,
`Active` enum('TRUE','FALSE') NOT NULL,
`UserNo` varchar(50) NOT NULL,
`StopWord` enum('TRUE','FALSE') NOT NULL,
`CreatedDate` date NOT NULL,
`CreatedTime` time NOT NULL,
PRIMARY KEY (`KeywordNo`),
FULLTEXT KEY `KEYWORDTEXT` (`KeywordText`)
) ENGINE=MyISAM AUTO_INCREMENT=44047 DEFAULT CHARSET=latin1;


CREATE TABLE `keyword_polling_frequency_index` (
`KeywordNo` int(10) unsigned NOT NULL,
`RSSFeedNo` int(10) unsigned NOT NULL,
`PollingDateTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`Frequency` int(10) NOT NULL,
`Active` enum('TRUE','FALSE') NOT NULL,
`UserNo` varchar(50) NOT NULL,
PRIMARY KEY (`KeywordNo`,`RSSFeedNo`,`PollingDateTime`),
KEY `FK_keyword_polling_frequency_index_1` (`UserNo`),
CONSTRAINT `FK_keyword_polling_frequency_index_1` FOREIGN KEY (`UserNo`) REFERENCES `user` (`UserNo`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

2 answers:

Answer 0 (score: 1)

As already mentioned, also add an index on the PollingDateTime field, in the order mentioned. Here is my suggestion:

SELECT 
    K.KeywordText, 
    SUM(F.Frequency) AS Frequency 
FROM 
    Keyword K, Keyword_Polling_Frequency_Index F
WHERE 
    EXISTS
        (
        SELECT 1
        FROM Keyword K1
        WHERE
            MATCH (K1.KeywordText) AGAINST ('deal obama "another keyword" yetanother' IN BOOLEAN MODE)
            AND K1.KeywordNo = K.KeywordNo
        )
    AND K.KeywordNo = F.KeywordNo
    AND F.PollingDateTime BETWEEN '2011-10-28 13:00:00' AND '2011-10-28 13:59:00'
    AND F.RSSFeedNo IN (106, 107, 110)
GROUP BY K.KeywordText 
ORDER BY K.KeywordText ASC

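This may reduce the number of records that are compared (SQL is resolved from the inside out), rather than matching the two tables against each other directly (N x N).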
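One caveat before relying on this rewrite: `MATCH ... AGAINST` does not select the same rows as the original `IN` list. In boolean mode without operators, a row matches if KeywordText contains any of the listed words, so a hypothetical stored keyword such as 'obama speech' would also match the term obama. Words shorter than `ft_min_word_len` (4 by default for MyISAM) or on the full-text stopword list are ignored. A minimal sketch for comparing the two keyword sets, reusing the sample terms from the answer ('yetanother' is the answer's own placeholder):

```sql
-- Keywords selected by the full-text condition used in the answer
SELECT KeywordNo, KeywordText
FROM Keyword
WHERE MATCH (KeywordText) AGAINST ('deal obama "another keyword" yetanother' IN BOOLEAN MODE)
ORDER BY KeywordText;

-- Keywords selected by an exact-match list, as in the question
SELECT KeywordNo, KeywordText
FROM Keyword
WHERE KeywordText IN ('deal', 'obama', 'another keyword', 'yetanother')
ORDER BY KeywordText;
```

If the two result sets differ, the frequencies summed by the rewritten query will differ from the original as well.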

Answer 1 (score: 0)

If you don't have any indexes, you should create the relevant ones.

At a minimum, add an index on keyword_polling_frequency_index.PollingDateTime.
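A minimal sketch of that index; the name `idx_polling_datetime` is hypothetical. Because keyword_polling_frequency_index is an InnoDB table, every secondary index entry also stores the primary-key columns (KeywordNo, RSSFeedNo, PollingDateTime), so this index alone already carries the join and feed columns. Re-running the EXPLAIN afterwards shows whether the optimizer actually picks it for the PollingDateTime range.

```sql
-- Secondary index for the PollingDateTime range condition
-- (idx_polling_datetime is a hypothetical name)
ALTER TABLE keyword_polling_frequency_index
  ADD INDEX idx_polling_datetime (PollingDateTime);
```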