Can the performance of this SQL query be improved?

Time: 2015-07-30 08:23:32

Tags: mysql sql performance

I have a table with over 100,000,000 rows, and a query that looks like this:

SELECT
    COUNT(IF(created_at >= '2015-07-01 00:00:00', 1, null)) AS 'monthly',
    COUNT(IF(created_at >= '2015-07-26 00:00:00', 1, null)) AS 'weekly',
    COUNT(IF(created_at >= '2015-06-30 07:57:56', 1, null)) AS '30day',
    COUNT(IF(created_at >= '2015-07-29 17:03:44', 1, null)) AS 'recent'
FROM
    items
WHERE
    user_id = 123456;

The table looks like this:

CREATE TABLE `items` (
   `user_id` int(11) NOT NULL,
   `item_id` int(11) NOT NULL,
   `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
    PRIMARY KEY (`user_id`,`item_id`),
    KEY `user_id` (`user_id`,`created_at`),
    KEY `created_at` (`created_at`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

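The EXPLAIN output looks fairly harmless, apart from the large row count:

id  select_type  table  type  possible_keys    key      key_len  ref    rows    Extra
1   SIMPLE       items  ref   PRIMARY,user_id  user_id  4        const  559864  Using index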

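I use this query to collect counts for a specific user over 4 time periods. Is there a smarter/faster way to get the same data, or is my only option to keep running tallies as new rows are inserted into this table?

3 answers:

Answer 0: (score: 2)

If you have an index on created_at, I would also put created_at >= '2015-06-30 07:57:56' in the WHERE clause, since that is the lowest possible date across your segments.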
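
As a sketch of that suggestion, the original query with the lower bound added might look like the following. It assumes the optimizer picks the existing (user_id, created_at) index from the question's table definition, so the scan can start at the 30-day cutoff instead of reading every index entry for the user.

```sql
SELECT
    COUNT(IF(created_at >= '2015-07-01 00:00:00', 1, NULL)) AS 'monthly',
    COUNT(IF(created_at >= '2015-07-26 00:00:00', 1, NULL)) AS 'weekly',
    -- Every row that passes the WHERE clause is inside the 30-day window.
    COUNT(*) AS '30day',
    COUNT(IF(created_at >= '2015-07-29 17:03:44', 1, NULL)) AS 'recent'
FROM
    items
WHERE
    user_id = 123456
    -- Earliest of the four cutoffs in this example.
    AND created_at >= '2015-06-30 07:57:56';
```

The bound has to be the earliest of the four cutoffs. Here that is the 30-day one, but late in a 31-day month the start of the month can be earlier, in which case that date belongs in the WHERE clause and '30day' needs its own COUNT(IF(...)) again.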

Also, with the same index, it may be possible to split this into 4 queries:

select count(*) AS '30day'
FROM
items
WHERE
    user_id = 123456
and created_at >= '2015-06-30 07:57:56'
union ....

and so on.

Answer 1: (score: 1)

I would add an index on the created_at field:

ALTER TABLE items ADD INDEX idx_created_at (created_at)

Or (as Thomas suggested), since you are also filtering on user_id, a composite index on user_id and created_at:

ALTER TABLE items ADD INDEX idx_user_created_at (user_id, created_at)

I would then write your query as:

SELECT 'monthly' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-07-01 00:00:00' AND user_id = 123456

UNION ALL

SELECT 'weekly' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-07-26 00:00:00' AND user_id = 123456

UNION ALL

SELECT '30day' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-06-30 07:57:56' AND user_id = 123456

UNION ALL

SELECT 'recent' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-07-29 17:03:44' AND user_id = 123456
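
Yes, the output is a bit different: one row per period instead of one column per period. Alternatively, you can use inline queries: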

SELECT
  (SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'monthly',
  (SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'weekly',
  ...

If you want an average, you can use a subquery:

SELECT
  monthly,
  weekly,
  monthly / total,
  weekly / total
FROM (
  SELECT
    (SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'monthly',
    (SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'weekly',
    ...,
    (SELECT COUNT(*) FROM items WHERE user_id=...) AS total
) s

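Answer 2: (score: 0)

  • INDEX(user_id, created_at) - optimal
  • AND created_at >= '2015-06-30 07:57:56' - helps, because it reduces the number of index entries that have to be touched
  • Doing a UNION does not help, because it leads to 4 times as much work.
  • Doing subquery SELECTs does not help, for the same reason.

Also,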

COUNT(IF(created_at >= '2015-07-29 17:03:44', 1, null))

can be shortened to

SUM(created_at >= '2015-07-29 17:03:44')

(but it probably won't speed things up)
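
For reference, a sketch of the whole query in the shortened form. In MySQL a comparison evaluates to 1 or 0, so SUM() adds up the rows where it is true. One difference to be aware of: SUM() over zero rows returns NULL where COUNT() returns 0, so the COALESCE wrappers below are an addition in this sketch and not part of the answer.

```sql
SELECT
    -- Each comparison yields 1 (true) or 0 (false); created_at is NOT NULL,
    -- so no comparison can yield NULL.
    COALESCE(SUM(created_at >= '2015-07-01 00:00:00'), 0) AS 'monthly',
    COALESCE(SUM(created_at >= '2015-07-26 00:00:00'), 0) AS 'weekly',
    COALESCE(SUM(created_at >= '2015-06-30 07:57:56'), 0) AS '30day',
    COALESCE(SUM(created_at >= '2015-07-29 17:03:44'), 0) AS 'recent'
FROM
    items
WHERE
    user_id = 123456
    AND created_at >= '2015-06-30 07:57:56';
```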

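If the data does not change over time and new rows are only ever added, then summary tables of past data would give a significant speedup, but only if you can avoid cutoffs like '07:57:56' for '30day'. (Why use '00:00:00' for only some of them?) The speedup might be another factor of 10 on top of the other changes. Want to discuss further?

(I see no advantage in using PARTITION.)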
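
A minimal sketch of what such a summary table could look like, assuming one row per user per day. The table name items_daily_counts and the columns created_day and cnt are hypothetical, as is the idea of running the rollup once per completed day; none of this comes from the answer itself.

```sql
-- Hypothetical summary table: one row per user per calendar day.
CREATE TABLE items_daily_counts (
    user_id     INT NOT NULL,
    created_day DATE NOT NULL,
    cnt         INT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, created_day)
) ENGINE=InnoDB;

-- Roll up one completed day. Re-running it for the same day overwrites
-- the stored counts instead of failing on the primary key.
INSERT INTO items_daily_counts (user_id, created_day, cnt)
SELECT user_id, DATE(created_at), COUNT(*)
FROM items
WHERE created_at >= '2015-07-29 00:00:00'
  AND created_at <  '2015-07-30 00:00:00'
GROUP BY user_id, DATE(created_at)
ON DUPLICATE KEY UPDATE cnt = VALUES(cnt);

-- Day-aligned counts now touch at most about 31 rows per user.
SELECT
    COALESCE(SUM(IF(created_day >= '2015-07-01', cnt, 0)), 0) AS 'monthly',
    COALESCE(SUM(IF(created_day >= '2015-07-26', cnt, 0)), 0) AS 'weekly',
    COALESCE(SUM(cnt), 0) AS '30day'
FROM items_daily_counts
WHERE user_id = 123456
  AND created_day >= '2015-06-30';
```

This only works for cutoffs that fall on day boundaries, which is the point about avoiding '07:57:56'. Rows from the current day that have not been rolled up yet, and the 'recent' count with its mid-day cutoff, would still have to be counted from items. DATE(created_at) on a TIMESTAMP column also depends on the session time zone, so the rollup and the queries should agree on it.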