Slow subquery: GROUP BY with a group-wise maximum

Posted: 2017-07-20 14:49:46

Tags: mysql query-optimization query-performance

I have two tables:


CREATE TABLE share_prices (
    price_id int(10) unsigned NOT NULL AUTO_INCREMENT,
    price_date date NOT NULL,
    company_id int(10) NOT NULL,
    high decimal(20,2) DEFAULT NULL,
    low decimal(20,2) DEFAULT NULL,
    close decimal(20,2) DEFAULT NULL,
    PRIMARY KEY (price_id),
    UNIQUE KEY price_date (price_date,company_id),
    KEY company_id (company_id),
    KEY price_date_2 (price_date)
) ENGINE=InnoDB AUTO_INCREMENT=368586 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;


CREATE TABLE rating_lookup (
    rating_id int(11) NOT NULL,
    start_date date DEFAULT NULL,
    start_price decimal(10,2) DEFAULT NULL,
    broker_id int(11) DEFAULT NULL,
    company_id int(11) DEFAULT NULL,
    end_date date DEFAULT NULL,
    PRIMARY KEY (rating_id),
    KEY idx_rating_lookup_company_id (company_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The query below currently takes 10.969 s.

The subquery on its own takes 0.391 s (duration) / 10.438 s (fetch).

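Goal of the query:

Get the total number of correct ratings for each broker_id.

A correct rating is one where the share price reached +5% over start_price, i.e. the highest high price between start_date and end_date (today, if end_date is NULL) is more than 5% above start_price.

I'd like to cut the query time drastically, even if restructuring the database is the only way to do it.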

Appendix

The current query:

SELECT broker_id, count(rating_id)
FROM (
    SELECT rating_lookup.*,
    share_prices.company_id as correct_company,
    share_prices.price_date,
    max(high) as peak_gain,
    ( ( ( max(high) - rating_lookup.start_price ) / rating_lookup.start_price ) * 100 ) as percent_gain
    FROM rating_lookup, share_prices
    WHERE share_prices.price_date > rating_lookup.start_date
    AND share_prices.price_date < ifnull(end_date, curdate())
    AND share_prices.company_id = rating_lookup.company_id
    GROUP BY rating_id
    HAVING percent_gain > 5
) correct
GROUP BY broker_id

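EXPLAIN output for the query above:

+----+-------------+---------------+-------+--------------------------------------+------------+---------+----------------------------------------+---------+---------------------------------+
| id | select_type | table         | type  | possible_keys                        | key        | key_len | ref                                    | rows    | Extra                           |
+----+-------------+---------------+-------+--------------------------------------+------------+---------+----------------------------------------+---------+---------------------------------+
| 1  | PRIMARY     | <derived2>    | ALL   |                                      |            |         |                                        | 3894800 | Using temporary; Using filesort |
| 2  | DERIVED     | rating_lookup | index | PRIMARY,idx_rating_lookup_company_id | PRIMARY    | 4       |                                        | 18200   | Using where                     |
| 2  | DERIVED     | share_prices  | ref   | price_date,company_id,price_date_2   | company_id | 4       | brokermetrics.rating_lookup.company_id | 214     | Using where                     |
+----+-------------+---------------+-------+--------------------------------------+------------+---------+----------------------------------------+---------+---------------------------------+

share_prices: ~375,000 rows

rating_lookup: ~18,000 rows, with about 46 distinct brokers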

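3 Answers

Answer 0 (score: 2)

I assume share prices are inserted once a day, after the market closes (or a few times a day if you cover several markets).

If you can't tune the query enough, you can precompute the results. Run the query after each new batch of share prices is loaded and insert the results into a new table. Reading the precomputed data should be fast enough.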
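A minimal sketch of that idea in MySQL, assuming a new table named broker_correct_ratings (a hypothetical name) that is rebuilt inside a transaction after each price load. The SELECT is the question's logic restated with explicit JOIN syntax:

```sql
-- Hypothetical summary table: one row per broker with its count of correct ratings.
CREATE TABLE broker_correct_ratings (
    broker_id     int(11)      NOT NULL,
    correct_count int unsigned NOT NULL,
    computed_at   datetime     NOT NULL,
    PRIMARY KEY (broker_id)
) ENGINE=InnoDB;

-- Run after each new batch of share prices has been loaded.
START TRANSACTION;

DELETE FROM broker_correct_ratings;

INSERT INTO broker_correct_ratings (broker_id, correct_count, computed_at)
SELECT correct.broker_id, COUNT(*), NOW()
FROM (
    -- One row per rating whose peak high beat start_price by more than 5%
    SELECT r.rating_id, r.broker_id
    FROM rating_lookup AS r
    JOIN share_prices AS p
      ON p.company_id = r.company_id
     AND p.price_date > r.start_date
     AND p.price_date < IFNULL(r.end_date, CURDATE())
    WHERE r.broker_id IS NOT NULL  -- broker_id is the PK of the summary table
    GROUP BY r.rating_id, r.broker_id, r.start_price
    HAVING (MAX(p.high) - r.start_price) / r.start_price * 100 > 5
) AS correct
GROUP BY correct.broker_id;

COMMIT;

-- Reading the precomputed result is then a scan of ~46 rows.
SELECT broker_id, correct_count FROM broker_correct_ratings;
```

DELETE is used instead of TRUNCATE because TRUNCATE causes an implicit commit; with DELETE, InnoDB readers keep seeing the previous counts until COMMIT.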

Answer 1 (score: 1)

PRIMARY KEY (price_id),   -- useless
UNIQUE KEY price_date (price_date,company_id), -- could/should be PK
KEY company_id (company_id),
KEY price_date_2 (price_date)  -- redundant

->

PRIMARY KEY(price_date, company_id),
KEY company_id (company_id)
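Applied to the existing table, that key change might look like the following sketch. It assumes nothing else (application code, foreign keys) still depends on price_id, because the column is dropped together with the old primary key:

```sql
-- Make (price_date, company_id) the clustered primary key and drop the
-- surrogate price_id together with the now-redundant indexes.
ALTER TABLE share_prices
    DROP PRIMARY KEY,
    DROP COLUMN price_id,
    DROP INDEX price_date,    -- old UNIQUE (price_date, company_id)
    DROP INDEX price_date_2,  -- duplicates the leading column of the new PK
    ADD PRIMARY KEY (price_date, company_id);
```

The existing UNIQUE key already guarantees that (price_date, company_id) has no duplicates and both columns are NOT NULL, so the new primary key can be built from the current data.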

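decimal(20,2) takes 9 bytes. No existing stock trades at more than 6 digits to the left of the decimal point, and two decimal places are not enough for penny stocks. Consider DECIMAL(8,2) (4 bytes) or DECIMAL(10,4) (5 bytes). FLOAT (4 bytes) avoids most of these problems but is limited to about 7 significant digits.

Smaller -> more cacheable -> less I/O -> faster.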
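A sketch of the column change, assuming DECIMAL(10,4) is chosen so that sub-cent prices fit (use DECIMAL(8,2) instead if two decimal places are enough). It also assumes no stored price has more than 6 integer digits; in strict SQL mode the ALTER would fail on such a value:

```sql
-- 6 integer digits + 4 fractional digits = 3 + 2 = 5 bytes per value,
-- instead of 9 bytes for DECIMAL(20,2).
ALTER TABLE share_prices
    MODIFY high  DECIMAL(10,4) DEFAULT NULL,
    MODIFY low   DECIMAL(10,4) DEFAULT NULL,
    MODIFY close DECIMAL(10,4) DEFAULT NULL;
```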

Don't select what you don't need. All you need is

SELECT rating_id, broker_id

and move the expression into HAVING:

HAVING ((( max(high)... *100) > 5

Please use the JOIN ... ON syntax:

  FROM  rating_lookup, share_prices
  WHERE share_prices.company_id = rating_lookup.company_id
    AND ...

->

  FROM rating_lookup AS r
  JOIN share_prices AS p
    ON p.company_id = r.company_id
  WHERE ...
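Putting those suggestions together, the query might look like this sketch: only rating_id and broker_id leave the derived table, the gain test lives in HAVING, and the join uses JOIN ... ON. Grouping by start_price as well is an addition that keeps the query valid under ONLY_FULL_GROUP_BY on servers that do not detect the functional dependency on rating_id:

```sql
SELECT correct.broker_id, COUNT(*) AS correct_ratings
FROM (
    SELECT r.rating_id, r.broker_id
    FROM rating_lookup AS r
    JOIN share_prices AS p
      ON p.company_id = r.company_id
    WHERE p.price_date > r.start_date
      AND p.price_date < IFNULL(r.end_date, CURDATE())
    GROUP BY r.rating_id, r.broker_id, r.start_price
    -- Percentage gain of the peak high over the starting price
    HAVING (MAX(p.high) - r.start_price) / r.start_price * 100 > 5
) AS correct
GROUP BY correct.broker_id;
```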

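Answer 2 (score: 1)

Expanding on Klas's answer, here is a schema for a "summary" table that can be populated with precomputed records per broker, per company, per day.

Disclaimer: not yet tested on real data, but it should work.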

CREATE TABLE `price_summary` (
`price_id` int(10) NOT NULL,
`broker_id` int(10) NOT NULL DEFAULT '0',
`company_id` int(10) NOT NULL DEFAULT '0',
`start_date` int(10) NOT NULL DEFAULT '0',
`end_date` int(10) NOT NULL DEFAULT '0',
`peak_gain` decimal(10,2) NOT NULL DEFAULT '0',
`max_price` decimal(10,2) NOT NULL DEFAULT '0',
`percentage_gain` decimal(10,2) NOT NULL DEFAULT '0',
`updated_on` int(10) NOT NULL DEFAULT '0'
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

ALTER TABLE `price_summary`
ADD PRIMARY KEY (`price_id`),
ADD UNIQUE KEY `broker_company_date` (`broker_id`,`company_id`,`start_date`) USING BTREE,
ADD KEY `broker_id` (`broker_id`),
ADD KEY `company_id` (`company_id`),
ADD KEY `start_date` (`start_date`),
ADD KEY `end_date` (`end_date`),
ADD KEY `peak_gain` (`peak_gain`),
ADD KEY `max_price` (`max_price`),
ADD KEY `percentage_gain` (`percentage_gain`);

ALTER TABLE `price_summary`
MODIFY `price_id` int(10) NOT NULL AUTO_INCREMENT; 
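The answer does not show how the table gets filled. One possible way, as a sketch only: it assumes the int date columns hold Unix timestamps and that peak_gain means MAX(high) - start_price while max_price means MAX(high) (both guesses from the column names). Ratings with no broker or a non-positive start price are skipped because the target columns are NOT NULL:

```sql
-- Rebuild or refresh summary rows from the raw tables (assumed semantics, see above).
INSERT INTO price_summary
    (broker_id, company_id, start_date, end_date,
     peak_gain, max_price, percentage_gain, updated_on)
SELECT r.broker_id,
       r.company_id,
       UNIX_TIMESTAMP(r.start_date),
       UNIX_TIMESTAMP(IFNULL(r.end_date, CURDATE())),
       MAX(p.high) - r.start_price,
       MAX(p.high),
       (MAX(p.high) - r.start_price) / r.start_price * 100,
       UNIX_TIMESTAMP()
FROM rating_lookup AS r
JOIN share_prices AS p
  ON p.company_id = r.company_id
 AND p.price_date > r.start_date
 AND p.price_date < IFNULL(r.end_date, CURDATE())
WHERE r.broker_id IS NOT NULL
  AND r.start_price > 0
GROUP BY r.rating_id, r.broker_id, r.company_id,
         r.start_date, r.end_date, r.start_price
-- The UNIQUE key (broker_id, company_id, start_date) turns a re-run into an update;
-- ratings that share all three values overwrite each other.
ON DUPLICATE KEY UPDATE
    end_date        = VALUES(end_date),
    peak_gain       = VALUES(peak_gain),
    max_price       = VALUES(max_price),
    percentage_gain = VALUES(percentage_gain),
    updated_on      = VALUES(updated_on);
```

VALUES() in ON DUPLICATE KEY UPDATE still works in MySQL 8.0 but is deprecated from 8.0.20 in favour of row aliases.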

An example query to retrieve the required records:

SELECT
    broker_id,
    count(company_id) as company_count
FROM
    price_summary
WHERE
    start_date > {input_timestamp}
    AND
    end_date < {input_timestamp/now()}
    AND
    percentage_gain > {input_percentage}
GROUP BY
    broker_id
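The {...} placeholders are meant to be filled in by the application. In plain MySQL that could be done with a prepared statement, as in this sketch; the parameter values (a 2017-01-01 start, now as the end, a 5% threshold) are illustrative only:

```sql
PREPARE brokers_by_gain FROM
    'SELECT broker_id, COUNT(company_id) AS company_count
       FROM price_summary
      WHERE start_date > ?
        AND end_date < ?
        AND percentage_gain > ?
      GROUP BY broker_id';

-- Example parameter values (illustrative only)
SET @from_ts  = UNIX_TIMESTAMP('2017-01-01'),
    @to_ts    = UNIX_TIMESTAMP(),
    @min_gain = 5;

EXECUTE brokers_by_gain USING @from_ts, @to_ts, @min_gain;
DEALLOCATE PREPARE brokers_by_gain;
```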