How to optimize this MySQL query? Millions of rows

Asked: 2018-06-21 17:54:11

Tags: mysql sql query-optimization amazon-rds sql-optimization

I have the following query:

SELECT 
    analytics.source AS referrer, 
    COUNT(analytics.id) AS frequency, 
    SUM(IF(transactions.status = 'COMPLETED', 1, 0)) AS sales
FROM analytics
LEFT JOIN transactions ON analytics.id = transactions.analytics
WHERE analytics.user_id = 52094 
GROUP BY analytics.source 
ORDER BY frequency DESC 
LIMIT 10 

The analytics table has 60 million rows and the transactions table has 3 million rows.

When I run EXPLAIN on this query, I get:

+------+-------------+----------------+-------+--------------------------------------+---------------------+---------+--------------------------+----------+------------------------------------------------+
| # id | select_type | table          | type  | possible_keys                        | key                 | key_len | ref                      | rows     | Extra                                          |
+------+-------------+----------------+-------+--------------------------------------+---------------------+---------+--------------------------+----------+------------------------------------------------+
| '1'  | 'SIMPLE'    | 'analytics'    | 'ref' | 'analytics_user_id,analytics_source' | 'analytics_user_id' | '5'     | 'const'                  | '337662' | 'Using where; Using temporary; Using filesort' |
| '1'  | 'SIMPLE'    | 'transactions' | 'ref' | 'tran_analytics'                     | 'tran_analytics'    | '5'     | 'dijishop2.analytics.id' | '1'      | NULL                                           |
+------+-------------+----------------+-------+--------------------------------------+---------------------+---------+--------------------------+----------+------------------------------------------------+

I can't figure out how to optimize this query, as it is already very basic. It takes around 70 seconds to run.

Here are the indexes that exist:

+-------------+-------------+----------------------------+---------------+------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
|   # Table   |  Non_unique |          Key_name          |  Seq_in_index |    Column_name   |  Collation |  Cardinality |  Sub_part |  Packed |  Null  |  Index_type |  Comment |  Index_comment |
+-------------+-------------+----------------------------+---------------+------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
| 'analytics' |  '0'        |  'PRIMARY'                 |  '1'          |  'id'            |  'A'       |  '56934235'  |  NULL     |  NULL   |  ''    |  'BTREE'    |  ''      |  ''            |
| 'analytics' |  '1'        |  'analytics_user_id'       |  '1'          |  'user_id'       |  'A'       |  '130583'    |  NULL     |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
| 'analytics' |  '1'        |  'analytics_product_id'    |  '1'          |  'product_id'    |  'A'       |  '490812'    |  NULL     |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
| 'analytics' |  '1'        |  'analytics_affil_user_id' |  '1'          |  'affil_user_id' |  'A'       |  '55222'     |  NULL     |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
| 'analytics' |  '1'        |  'analytics_source'        |  '1'          |  'source'        |  'A'       |  '24604'     |  NULL     |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
| 'analytics' |  '1'        |  'analytics_country_name'  |  '1'          |  'country_name'  |  'A'       |  '39510'     |  NULL     |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
| 'analytics' |  '1'        |  'analytics_gordon'        |  '1'          |  'id'            |  'A'       |  '56934235'  |  NULL     |  NULL   |  ''    |  'BTREE'    |  ''      |  ''            |
| 'analytics' |  '1'        |  'analytics_gordon'        |  '2'          |  'user_id'       |  'A'       |  '56934235'  |  NULL     |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
| 'analytics' |  '1'        |  'analytics_gordon'        |  '3'          |  'source'        |  'A'       |  '56934235'  |  NULL     |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
+-------------+-------------+----------------------------+---------------+------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+


+----------------+-------------+-------------------+---------------+-------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
|    # Table     |  Non_unique |      Key_name     |  Seq_in_index |    Column_name    |  Collation |  Cardinality |  Sub_part |  Packed |  Null  |  Index_type |  Comment |  Index_comment |
+----------------+-------------+-------------------+---------------+-------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+
| 'transactions' |  '0'        |  'PRIMARY'        |  '1'          |  'id'             |  'A'       |  '2436151'   |  NULL     |  NULL   |  ''    |  'BTREE'    |  ''      |  ''            |
| 'transactions' |  '1'        |  'tran_user_id'   |  '1'          |  'user_id'        |  'A'       |  '56654'     |  NULL     |  NULL   |  ''    |  'BTREE'    |  ''      |  ''            |
| 'transactions' |  '1'        |  'transaction_id' |  '1'          |  'transaction_id' |  'A'       |  '2436151'   |  '191'    |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
| 'transactions' |  '1'        |  'tran_analytics' |  '1'          |  'analytics'      |  'A'       |  '2436151'   |  NULL     |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
| 'transactions' |  '1'        |  'tran_status'    |  '1'          |  'status'         |  'A'       |  '22'        |  NULL     |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
| 'transactions' |  '1'        |  'gordon_trans'   |  '1'          |  'status'         |  'A'       |  '22'        |  NULL     |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
| 'transactions' |  '1'        |  'gordon_trans'   |  '2'          |  'analytics'      |  'A'       |  '2436151'   |  NULL     |  NULL   |  'YES' |  'BTREE'    |  ''      |  ''            |
+----------------+-------------+-------------------+---------------+-------------------+------------+--------------+-----------+---------+--------+-------------+----------+----------------+

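Here is the simplified schema of the two tables, as it was before adding any of the extra indexes suggested, since they didn't improve the situation.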

CREATE TABLE `analytics` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` int(11) DEFAULT NULL,
  `affil_user_id` int(11) DEFAULT NULL,
  `product_id` int(11) DEFAULT NULL,
  `medium` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `source` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `terms` varchar(1024) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `is_browser` tinyint(1) DEFAULT NULL,
  `is_mobile` tinyint(1) DEFAULT NULL,
  `is_robot` tinyint(1) DEFAULT NULL,
  `browser` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `mobile` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `robot` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `platform` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `referrer` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `domain` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `ip` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `continent_code` varchar(10) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `country_name` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `city` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `analytics_user_id` (`user_id`),
  KEY `analytics_product_id` (`product_id`),
  KEY `analytics_affil_user_id` (`affil_user_id`)
) ENGINE=InnoDB AUTO_INCREMENT=64821325 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

CREATE TABLE `transactions` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `transaction_id` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `user_id` int(11) NOT NULL,
  `pay_key` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `sender_email` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `amount` decimal(10,2) DEFAULT NULL,
  `currency` varchar(10) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `status` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `analytics` int(11) DEFAULT NULL,
  `ip_address` varchar(46) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `session_id` varchar(60) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `eu_vat_applied` int(1) DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `tran_user_id` (`user_id`),
  KEY `transaction_id` (`transaction_id`(191)),
  KEY `tran_analytics` (`analytics`),
  KEY `tran_status` (`status`)
) ENGINE=InnoDB AUTO_INCREMENT=10019356 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

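If the above cannot be optimized any further, any implementation advice on summary tables would be very helpful. We are using a LAMP stack on AWS. The query above runs on RDS (m1.large).

13 answers:

Answer 0 (score: 10)

I would create the following indexes (B-tree indexes):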

analytics(user_id, source, id) 
transactions(analytics, status)

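This is different from Gordon's suggestion.

The order of the columns in an index matters.

You filter by a specific analytics.user_id, so this field has to be the first one in the index. Then you group by analytics.source. To avoid sorting by source, this should be the next field of the index. You also reference analytics.id, so it is better to have this field as part of the index; put it last. Is MySQL able to read just the index without touching the table? I don't know, but it is easy to test.

The index on transactions has to start with analytics, because it is used in the JOIN. We also need status in it.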

SELECT 
    analytics.source AS referrer, 
    COUNT(analytics.id) AS frequency, 
    SUM(IF(transactions.status = 'COMPLETED', 1, 0)) AS sales
FROM analytics
LEFT JOIN transactions ON analytics.id = transactions.analytics
WHERE analytics.user_id = 52094 
GROUP BY analytics.source 
ORDER BY frequency DESC 
LIMIT 10 
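
The index-only test mentioned above could be sketched as follows. This is a minimal sketch, not a measured result: the index names `ix_analytics_user_source_id` and `ix_transactions_analytics_status` are hypothetical, and building an index on a 60-million-row table takes time and I/O, so it is best tried on a copy first.

```sql
-- Hypothetical index names; the column order follows the answer above.
ALTER TABLE analytics
  ADD INDEX ix_analytics_user_source_id (user_id, source, id);
ALTER TABLE transactions
  ADD INDEX ix_transactions_analytics_status (analytics, status);

-- Re-run EXPLAIN on the unchanged query and inspect the Extra column.
EXPLAIN
SELECT
    analytics.source AS referrer,
    COUNT(analytics.id) AS frequency,
    SUM(IF(transactions.status = 'COMPLETED', 1, 0)) AS sales
FROM analytics
LEFT JOIN transactions ON analytics.id = transactions.analytics
WHERE analytics.user_id = 52094
GROUP BY analytics.source
ORDER BY frequency DESC
LIMIT 10;
```

If Extra reports "Using index" for both tables, MySQL is answering the query from the indexes alone, without reading the table rows. Because InnoDB secondary indexes implicitly carry the primary key, an index on (user_id, source) alone would probably cover the analytics side too; listing id explicitly just makes the intent visible.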

Answer 1 (score: 7)

First some analysis...

SELECT  a.source AS referrer,
        COUNT(*) AS frequency,  -- See question below
        SUM(t.status = 'COMPLETED') AS sales
    FROM  analytics AS a
    LEFT JOIN  transactions AS t  ON a.id = t.analytics
    WHERE  a.user_id = 52094
    GROUP BY  a.source
    ORDER BY  frequency DESC
    LIMIT  10 

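If the mapping from a to t is "one-to-many", then you need to consider whether the COUNT and SUM have the correct values or inflated values. As the query stands, they are "inflated". The JOIN happens before the aggregation, so you are counting the number of transactions and how many of them were completed. I'll assume that is what you want.

Note: the usual pattern is COUNT(*); saying COUNT(x) implies checking x for being NULL. I suspect that check is not needed?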
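
To make the inflation concrete, here is a small self-contained sketch on throwaway data. The temporary tables `a_demo` and `t_demo` are hypothetical stand-ins for analytics and transactions; nothing here touches the real tables.

```sql
-- Two visits from the same source; visit 1 has two transactions.
CREATE TEMPORARY TABLE a_demo (id INT PRIMARY KEY, source VARCHAR(45));
CREATE TEMPORARY TABLE t_demo (id INT PRIMARY KEY, analytics INT, status VARCHAR(50));

INSERT INTO a_demo VALUES (1, 'google'), (2, 'google');
INSERT INTO t_demo VALUES (10, 1, 'COMPLETED'), (11, 1, 'PENDING');

SELECT a.source,
       COUNT(*)                    AS joined_rows, -- 3: visit 1 is counted twice
       COUNT(DISTINCT a.id)        AS visits,      -- 2: one per analytics row
       SUM(t.status = 'COMPLETED') AS sales        -- 1: NULLs from unmatched rows are skipped
FROM a_demo AS a
LEFT JOIN t_demo AS t ON a.id = t.analytics
GROUP BY a.source;
```

If frequency is meant to be the number of visits rather than the number of joined rows, COUNT(DISTINCT a.id) is one way to get it, at the cost of extra work per group.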

These indexes handle the WHERE and are "covering":

 analytics:  INDEX(user_id, source, id)   -- user_id first

 transactions:  INDEX(analytics, status)  -- in this order

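The GROUP BY may or may not require a "sort". The ORDER BY, being different from the GROUP BY, definitely needs a sort. The entire set of grouped rows has to be sorted; there is no shortcut for the LIMIT.

Normally, summary tables are date-oriented. That is, the PRIMARY KEY includes a "date" and some other dimensions. Perhaps keying by date and user_id would make sense? How many transactions per day does the average user have? If at least 10, then consider a summary table. Also, it is important not to be UPDATEing or DELETEing old records.

I would probably have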

user_id ...,
source ...,
dy DATE ...,
status ...,
freq      MEDIUMINT UNSIGNED NOT NULL,
status_ct MEDIUMINT UNSIGNED NOT NULL,
PRIMARY KEY(user_id, status, source, dy)

Then the query becomes

SELECT  source AS referrer,
        SUM(freq) AS frequency,
        SUM(status_ct) AS completed_sales
    FROM  Summary
    WHERE  user_id = 52094
      AND  status = 'COMPLETED'
    GROUP BY source
    ORDER BY  frequency DESC
    LIMIT  10 

The speed comes from several factors

  • A smaller table (fewer rows to look at)
  • No JOIN
  • A more useful index

(It still needs the extra sort.)
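
As an illustration of how such a table might be filled, here is a minimal sketch of a daily load. It is a simplified variation, not the exact layout outlined above: the table name `summary_daily` and the column `sales` are hypothetical, status is left out of the key, and the completed count is stored directly. It also assumes the job runs once a day for the previous day.

```sql
-- Hypothetical summary table: one row per user, source and day.
CREATE TABLE summary_daily (
  user_id INT NOT NULL,
  source  VARCHAR(45) NOT NULL,
  dy      DATE NOT NULL,
  freq    MEDIUMINT UNSIGNED NOT NULL,  -- joined rows, same meaning as COUNT(*) in the query
  sales   MEDIUMINT UNSIGNED NOT NULL,  -- completed transactions
  PRIMARY KEY (user_id, source, dy)
) ENGINE=InnoDB;

-- Summarize yesterday's rows; re-running the job overwrites the same keys.
INSERT INTO summary_daily (user_id, source, dy, freq, sales)
SELECT a.user_id,
       COALESCE(a.source, '') AS src,   -- key columns cannot be NULL
       DATE(a.`date`)         AS dy,
       COUNT(*),
       COALESCE(SUM(t.status = 'COMPLETED'), 0)
FROM analytics AS a
LEFT JOIN transactions AS t ON a.id = t.analytics
WHERE a.`date` >= CURRENT_DATE - INTERVAL 1 DAY
  AND a.`date` <  CURRENT_DATE
  AND a.user_id IS NOT NULL
GROUP BY a.user_id, src, dy
ON DUPLICATE KEY UPDATE freq = VALUES(freq), sales = VALUES(sales);

-- The report then reads only the summary rows.
SELECT source AS referrer, SUM(freq) AS frequency, SUM(sales) AS sales
FROM summary_daily
WHERE user_id = 52094
GROUP BY source
ORDER BY frequency DESC
LIMIT 10;
```

Two caveats with this sketch: the date-range filter needs an index on analytics.`date` to avoid scanning the whole table, and a transaction that completes on a later day than its visit would be missed unless older days are re-summarized.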

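Even without a summary table, there may be some speedups...

  • How big are the tables? How big is innodb_buffer_pool_size?
  • Normalizing some of the strings that are both bulky and repetitive could make that table not I/O-bound.
  • This is awful: KEY (transaction_id(191)); the original answer links to 5 ways to fix it.
  • IP addresses do not need 255 bytes, nor utf8mb4_unicode_ci. (39) and ascii are sufficient.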
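
The first and last points above can be checked and applied with standard statements. This is a sketch only; the figures in information_schema are estimates, and the ALTER rebuilds a 60-million-row table, so it should be scheduled accordingly.

```sql
-- Buffer pool size in MB (on RDS this is set through the DB parameter group).
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;

-- Approximate data and index sizes of the two tables, in MB.
SELECT table_name,
       ROUND(data_length  / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name IN ('analytics', 'transactions');

-- Shrink the ip column as suggested: 39 characters, single-byte charset.
ALTER TABLE analytics
  MODIFY ip VARCHAR(39) CHARACTER SET ascii DEFAULT NULL;
```

If data_mb plus index_mb is far larger than buffer_pool_mb, the query is likely to be reading from disk for most of its 70 seconds.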

Answer 2 (score: 6)

For this query:

SELECT a.source AS referrer, 
       COUNT(*) AS frequency, 
       SUM( t.status = 'COMPLETED' ) AS sales
FROM analytics a LEFT JOIN
     transactions t
     ON a.id = t.analytics
WHERE a.user_id = 52094 
GROUP BY a.source 
ORDER BY frequency DESC 
LIMIT 10 ;

You want indexes on analytics(user_id, id, source) and transactions(analytics, status).

Answer 3 (score: 4)

Try the following and let me know if it helps.

SELECT 
    analytics.source AS referrer, 
    COUNT(analytics.id) AS frequency, 
    SUM(IF(transactions.status = 'COMPLETED', 1, 0)) AS sales
FROM (SELECT * FROM analytics where user_id = 52094) analytics
LEFT JOIN (SELECT analytics, status FROM transactions) transactions ON analytics.id = transactions.analytics  -- transactions.analytics holds an analytics.id, so it must not be filtered by the user_id 52094
GROUP BY analytics.source 
ORDER BY frequency DESC 
LIMIT 10

Answer 4 (score: 3)

Could you try the following approach:

SELECT 
    analytics.source AS referrer, 
    COUNT(analytics.id) AS frequency, 
    SUM(sales) AS sales
FROM analytics
LEFT JOIN(
	SELECT transactions.Analytics, (CASE WHEN transactions.status = 'COMPLETED' THEN 1 ELSE 0 END) AS sales
	FROM analytics INNER JOIN transactions ON analytics.id = transactions.analytics
) Tra
ON analytics.id = Tra.analytics
WHERE analytics.user_id = 52094 
GROUP BY analytics.source 
ORDER BY frequency DESC 
LIMIT 10 

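Answer 5 (score: 3)

This query potentially joins millions of analytics records with transactions records and calculates the sum (including the status check) over millions of records. If we could apply the LIMIT 10 first and then do the join and calculate the sum, we could speed up the query. Unfortunately, we need analytics.id for the join, and it is lost after applying GROUP BY. But maybe analytics.source is selective enough to speed up the query anyway.

My idea is therefore to calculate the frequencies and limit by them in a subquery that returns analytics.source and frequency, then use that result to filter analytics in the main query, which does the rest of the joins and calculations on a hopefully much smaller number of records.

Minimal subquery (note: no join, no sum, returns 10 records):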

SELECT
    source,
    COUNT(id) AS frequency
FROM analytics
WHERE user_id = 52094
GROUP BY source
ORDER BY frequency DESC 
LIMIT 10

Full query using the above query as subquery x:

SELECT
    x.source AS referrer,
    x.frequency,
    SUM(IF(t.status = 'COMPLETED', 1, 0)) AS sales
FROM
    (<subquery here>) x
    INNER JOIN analytics a
       ON x.source = a.source  -- This reduces the number of records
    LEFT JOIN transactions t
       ON a.id = t.analytics
WHERE a.user_id = 52094      -- We could have several users per source
GROUP BY x.source, x.frequency
ORDER BY x.frequency DESC

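If this does not yield the expected performance boost, it could be because MySQL applies the joins in an unexpected order. As explained in "Is there a way to force MySQL execution order?", you could replace the join with STRAIGHT_JOIN in that case.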
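
Put together, the full query with the subquery inlined and the join order pinned might look like the sketch below. STRAIGHT_JOIN forces MySQL to read the left table (x) before the right one (a); whether that actually helps here is an assumption to verify with EXPLAIN.

```sql
SELECT
    x.source AS referrer,
    x.frequency,
    SUM(IF(t.status = 'COMPLETED', 1, 0)) AS sales
FROM
    (SELECT source, COUNT(id) AS frequency
     FROM analytics
     WHERE user_id = 52094
     GROUP BY source
     ORDER BY frequency DESC
     LIMIT 10) x
    -- Read x first, then look up the matching analytics rows.
    -- Note: source is nullable and '=' never matches NULL, so a
    -- NULL-source group in the top 10 would drop out of the result.
    STRAIGHT_JOIN analytics a
       ON x.source = a.source
    LEFT JOIN transactions t
       ON a.id = t.analytics
WHERE a.user_id = 52094
GROUP BY x.source, x.frequency
ORDER BY x.frequency DESC;
```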

Answer 6 (score: 2)

I would try a subquery:

SELECT a.source AS referrer, 
       COUNT(*) AS frequency,
       SUM((SELECT COUNT(*) FROM transactions t 
        WHERE a.id = t.analytics AND t.status = 'COMPLETED')) AS sales
FROM analytics a
WHERE a.user_id = 52094 
GROUP BY a.source
ORDER BY frequency DESC 
LIMIT 10; 

Plus the same indexes as in @Gordon's answer: analytics(user_id, id, source) and transactions(analytics, status).

Answer 7 (score: 2)

The only problem I find in your query is

GROUP BY analytics.source 
ORDER BY frequency DESC 

because this makes the query use a temporary table and a filesort.

One way to avoid this is to create another table, such as

CREATE TABLE `analytics_aggr` (
  `source` varchar(45) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `frequency` int(10) DEFAULT NULL,
  `sales` int(10) DEFAULT NULL,
  KEY `sales` (`sales`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Insert the data into analytics_aggr using the query below

insert into analytics_aggr SELECT 
    analytics.source AS referrer, 
    COUNT(analytics.id) AS frequency, 
    SUM(IF(transactions.status = 'COMPLETED', 1, 0)) AS sales
    FROM analytics
    LEFT JOIN transactions ON analytics.id = transactions.analytics
    WHERE analytics.user_id = 52094 
    GROUP BY analytics.source 
    ORDER BY null 

Now you can easily get your data using

select * from analytics_aggr order by sales desc

Answer 8 (score: 2)

Try this

SELECT 
    a.source AS referrer, 
    COUNT(a.id) AS frequency, 
    SUM(t.sales) AS sales
FROM (Select id, source From analytics Where user_id = 52094) a
LEFT JOIN (Select analytics, case when status = 'COMPLETED' Then 1 else 0 end as sales 
           From transactions) t ON a.id = t.analytics
GROUP BY a.source 
ORDER BY frequency DESC 
LIMIT 10 

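I suggest this because you said "they are massive tables", yet this SQL uses only a few columns. In that case it is better to use inline views that select only the required columns.

Note: memory also plays an important role here, so check the available memory before deciding on the inline views.

Answer 9 (score: 2)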

I would try separating the queries on the two tables. Since you only need the top 10 sources, I would get them first and then query the sales column from transactions:

SELECT source AS referrer,
       frequency,
       (SELECT COUNT(*)
        FROM transactions t
        WHERE t.analytics IN (SELECT DISTINCT id
                              FROM analytics
                              WHERE user_id = 52094
                                AND source = by_frequency.source)
          AND status = 'completed') AS sales
FROM (SELECT analytics.source,
             COUNT(*) AS frequency
      FROM analytics
      WHERE analytics.user_id = 52094
      GROUP BY analytics.source
      ORDER BY frequency DESC
      LIMIT 10) by_frequency

It may also be faster without the DISTINCT.

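Answer 10 (score: 2)

I assume the predicate user_id = 52094 is for illustration purposes and that, in the application, the selected user_id is a variable.

I also assume that ACID properties are not very important here.

(1) Therefore, I will maintain two replica tables containing only the necessary fields (this is similar to the indexes Vladimir suggested above), along with a utility table.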

CREATE TABLE mv_anal (
  `id` int(11) NOT NULL,
  `user_id` int(11) DEFAULT NULL,
  `source` varchar(45),
  PRIMARY KEY (`id`)
);

CREATE TABLE mv_trans (
  `id` int(11) NOT NULL,
  `status` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `analytics` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
);

CREATE TABLE util (
  last_updated_anal int (11) NOT NULL,
  last_updated_trans int (11) NOT NULL
);

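INSERT INTO util VALUES (0, 0);

The gain here is that we read a relatively small projection of the original tables. Hopefully the OS-level and DB-level caches do their job, so the data is read from faster RAM rather than from slower secondary storage. That can be a very large gain.

Here is how I update the two tables (the following is a transaction run by cron):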

-- TRANSACTION STARTS -- 

-- Copy only rows newer than the last one already copied.
INSERT INTO mv_trans 
SELECT id, IF(status = 'COMPLETED', 1, 0) AS status, analytics 
FROM transactions JOIN util
ON util.last_updated_trans < transactions.id;

-- Remember the highest id copied so far.
UPDATE util
SET last_updated_trans = (SELECT COALESCE(MAX(id), 0) FROM mv_trans);

-- TRANSACTION COMMITS -- 

-- similar transaction for mv_anal.
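
For completeness, the analogous refresh for mv_anal might look like this. It is a sketch that assumes analytics.id only ever grows (the table is append-only, as stated further down) and uses the `last_updated_anal` column from the util table above.

```sql
START TRANSACTION;

-- Copy only the analytics rows that are newer than the last copied id.
INSERT INTO mv_anal (id, user_id, source)
SELECT a.id, a.user_id, a.source
FROM analytics AS a
JOIN util AS u ON a.id > u.last_updated_anal;

-- Remember the highest id copied so far.
UPDATE util
SET last_updated_anal = (SELECT COALESCE(MAX(id), 0) FROM mv_anal);

COMMIT;
```

Using a strict greater-than comparison matters: with greater-or-equal, the second run would try to insert the last copied row again and fail on the primary key.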

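(2) Now I will tackle selectivity, to reduce sequential scan time. I will build a B-tree index on mv_anal over user_id, source and id (in that order).

Note: the above could be achieved by just creating the index on the analytics table, but building such an index requires reading the big table with 60 million rows. With my approach, the index build only has to read a very thin table. Thus we can rebuild the B-tree more frequently (to counter the skew issue, since the table is append-only).

This is how I make sure high selectivity is achieved at query time and how I counter the skewed B-tree issue.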
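
In MySQL terms, step (2) could be sketched as below. The index name `ix_mv_anal_user_source_id` is hypothetical, and how often a rebuild is worthwhile is an assumption to measure rather than a rule.

```sql
-- B-tree index in the order described: filter column, grouping column, id.
CREATE INDEX ix_mv_anal_user_source_id ON mv_anal (user_id, source, id);

-- Periodic rebuild of the thin table and its indexes.
-- For InnoDB, OPTIMIZE TABLE is carried out as a table rebuild plus ANALYZE.
OPTIMIZE TABLE mv_anal;
```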

(3) In PostgreSQL, WITH subqueries are always materialized. I hope MySQL behaves similarly. Therefore, as the last mile of optimization:

-- Common table expressions (WITH) require MySQL 8.0 or later.
WITH sub_anal AS (
  SELECT user_id, source AS referrer, COUNT(id) AS frequency
  FROM mv_anal
  WHERE user_id = 52094
  GROUP BY user_id, source
  ORDER BY COUNT(id) DESC
  LIMIT 10
)
SELECT sa.referrer, sa.frequency, SUM(trans.status) AS sales
FROM sub_anal AS sa 
JOIN mv_anal AS anal 
ON sa.referrer = anal.source AND sa.user_id = anal.user_id
JOIN mv_trans AS trans
ON anal.id = trans.analytics
GROUP BY sa.referrer, sa.frequency
ORDER BY sa.frequency DESC;

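Answer 11 (score: 1)

Late to the party. I think you will need to load an index into MySQL's cache. The NLJ is probably killing performance. Here is how I see it:

The Path

Your query is simple. It has two tables and the "path" is very clear:

  • The optimizer should plan on reading the analytics table first.
  • The optimizer should plan on reading the transactions table second. This is because you are using a LEFT OUTER JOIN. Not much to discuss there.
  • Besides, the analytics table has 60 million rows, and the best path should filter the rows of this table as early as possible.

The Access

Once the path is clear, you need to decide whether to use index access or table access. Both have pros and cons. However, you want to improve SELECT performance:

  • You should choose index access.
  • Avoid hybrid access. That is, you should avoid any table access (row fetches) at all cost. Translation: put all the participating columns in indexes.

The Filtering

Again, you want high performance for the SELECT. Therefore:

  • You should perform the filtering at the index level, not at the table level.

Row Aggregation

After filtering, the next step is to aggregate the rows by GROUP BY analytics.source. This can be improved by placing the source column in the index immediately after the filtering column (user_id), so that the rows arrive already grouped by source.

Optimal Indexes for Path, Access, Filtering, and Aggregation

Considering all of the above, you should include all the mentioned columns in indexes. The following indexes should improve the response time: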

create index ix1_analytics on analytics (user_id, source, id);

create index ix2_transactions on transactions (analytics, status);

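These indexes fulfill the "path", "access", and "filtering" strategies described above.

Index Caching

Finally, and this is critical, load the secondary index into MySQL's memory cache. MySQL is performing an NLJ (nested loop join), a "ref" in MySQL lingo, and needs to access the second table randomly nearly 200k times.

Unfortunately, I don't know for sure how to load the index into MySQL's cache. Using FORCE may work, as in: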

SELECT 
    analytics.source AS referrer, 
    COUNT(analytics.id) AS frequency, 
    SUM(IF(transactions.status = 'COMPLETED', 1, 0)) AS sales
FROM analytics
LEFT JOIN transactions FORCE index (ix2_transactions)
  ON analytics.id = transactions.analytics
WHERE analytics.user_id = 52094 
GROUP BY analytics.source 
ORDER BY frequency DESC 
LIMIT 10

Make sure you have enough cache space. Here is a short question/answer to help you figure that out: How to figure out if mysql index fits entirely in memory
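
One possible way to check the fit and warm the cache on InnoDB is sketched below. It assumes the `ix2_transactions` index from above exists and that persistent index statistics are enabled (the default since MySQL 5.6). Note that the LOAD INDEX INTO CACHE statement only applies to MyISAM tables, so for InnoDB the warm-up here is simply a query that reads the index.

```sql
-- Approximate size of each index on transactions, in MB.
SELECT index_name,
       ROUND(stat_value * @@innodb_page_size / 1024 / 1024) AS size_mb
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name = 'transactions'
  AND stat_name = 'size';

-- Compare against the buffer pool size, in MB.
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;

-- Warm-up: both referenced columns are in the index, so this should be
-- answered by scanning ix2_transactions, pulling its pages into the buffer pool.
SELECT COUNT(*), SUM(status = 'COMPLETED')
FROM transactions FORCE INDEX (ix2_transactions);
```

The pages stay cached only as long as other workload does not evict them, so the effect of a warm-up is temporary on a busy server.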

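Good luck! Oh, and post the results.

Answer 12 (score: 1)

This question has definitely received a lot of attention, so I'm sure all the obvious solutions have been tried. However, I did not see anything that addresses the LEFT JOIN in the query.

I have noticed that LEFT JOIN statements usually force query planners into a hash join, which is fast for a small number of results but terribly slow for a large number of results. As noted in @Rick James' answer, since the join in the original query is on the identity field analytics.id, it will generate a large number of results. A hash join will yield terrible performance. The suggestion below addresses this without any schema or processing changes.

Since the aggregation is by analytics.source, I would try a query that builds the frequency-by-source and sales-by-source aggregations separately and defers the left join until after the aggregation is complete. This should allow the indexes to be used to best effect (typically this is a merge join for large data sets).

Here is my suggestion: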

SELECT t1.source AS referrer, t1.frequency, COALESCE(t2.sales, 0) AS sales  -- 0 instead of NULL for sources without completed sales
FROM (
  -- Frequency by source
  SELECT a.source, COUNT(a.id) AS frequency
  FROM analytics a
  WHERE a.user_id=52094
  GROUP BY a.source
) t1
LEFT JOIN (
  -- Sales by source
  SELECT a.source,
    SUM(IF(t.status = 'COMPLETED', 1, 0)) AS sales
  FROM analytics a
  JOIN transactions t
  WHERE a.id = t.analytics
    AND t.status = 'COMPLETED'
    AND a.user_id=52094
  GROUP by a.source
) t2
  ON t1.source = t2.source
ORDER BY frequency DESC 
LIMIT 10 

Hope this helps.