Need help optimizing an interesting MySQL query

Posted: 2017-08-09 19:51:23

Tags: mysql sql database-performance query-performance sqlperformance

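Query optimization

I need help optimizing the performance of this query. It essentially computes cumulative sums for every period that matches a list of conditions.

The query currently takes about 100 seconds to run, because it groups by every account in the database. I tried to optimize it by looking at the EXPLAIN output, but I couldn't find a way to make it faster. Here is the EXPLAIN output:

[Image: EXPLAIN output for the query]

Ideally it should run in 10 seconds or less. Looking forward to your replies. Thanks!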

SET @date = '2017-05-17';
SET @offset = 1;

select 
b.act,
CASE 
WHEN b.jdt <= DATE_SUB(@date, INTERVAL 5 DAY) AND b.jdt >= DATE_SUB(@date, INTERVAL 5 + @offset DAY) AND DATEDIFF(a.dt,b.jdt) <=5 THEN 5
WHEN b.jdt <= DATE_SUB(@date, INTERVAL 13 DAY) AND b.jdt >= DATE_SUB(@date, INTERVAL 13 + @offset DAY) AND DATEDIFF(a.dt,b.jdt) <=13 THEN 13
WHEN b.jdt <= DATE_SUB(@date, INTERVAL 25 DAY) AND b.jdt >= DATE_SUB(@date, INTERVAL 25 + @offset DAY) AND DATEDIFF(a.dt,b.jdt) <=25 THEN 25
WHEN b.jdt <= DATE_SUB(@date, INTERVAL 45 DAY) AND b.jdt >= DATE_SUB(@date, INTERVAL 45 + @offset DAY) AND DATEDIFF(a.dt,b.jdt) <=45 THEN 45
WHEN b.jdt <= DATE_SUB(@date, INTERVAL 75 DAY) AND b.jdt >= DATE_SUB(@date, INTERVAL 75 + @offset DAY) AND DATEDIFF(a.dt,b.jdt) <=75 THEN 75
WHEN b.jdt <= DATE_SUB(@date, INTERVAL 105 DAY) AND b.jdt >= DATE_SUB(@date, INTERVAL 105 + @offset DAY) AND DATEDIFF(a.dt,b.jdt) <=105 THEN 105
WHEN b.jdt <= DATE_SUB(@date, INTERVAL 135 DAY) AND b.jdt >= DATE_SUB(@date, INTERVAL 135 + @offset DAY) AND DATEDIFF(a.dt,b.jdt) <=135 THEN 135
WHEN b.jdt <= DATE_SUB(@date, INTERVAL 165 DAY) AND b.jdt >= DATE_SUB(@date, INTERVAL 165 + @offset DAY) AND DATEDIFF(a.dt,b.jdt) <=165 THEN 165
WHEN b.jdt <= DATE_SUB(@date, INTERVAL 195 DAY) AND b.jdt >= DATE_SUB(@date, INTERVAL 195 + @offset DAY) AND DATEDIFF(a.dt,b.jdt) <=195 THEN 195
WHEN b.jdt <= DATE_SUB(@date, INTERVAL 225 DAY) AND b.jdt >= DATE_SUB(@date, INTERVAL 225 + @offset DAY) AND DATEDIFF(a.dt,b.jdt) <=225 THEN 225
ELSE 'other' END AS 'period',

SUM(CASE WHEN a.type = 'JN' AND a.paid = 'Y' AND a.upgraded=0 THEN 1 ELSE 0 END) AS 'Paid_Joins',
SUM(CASE WHEN a.type IN ('SL','RL') AND ttype !='Purchase' THEN (a.amt_usd/100 - a.vat_usd/100) END) AS 'Revenue_Amount'

FROM __customer b
JOIN  __transaction a on b.uid = a.primary_uid 

WHERE
b.affiliate_act regexp '^[a-zA-Z]+[0-9]+'
AND a.dt <= @date 
AND a.dt >= DATE_SUB(@date, INTERVAL 225 + @offset DAY)
AND b.jdt >= DATE_SUB(@date, INTERVAL 225 + @offset DAY)
GROUP BY 1,2
HAVING period != 'other'
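To spell out what the two SUM(CASE ...) columns, GROUP BY and HAVING compute, here is a minimal Python sketch of the same conditional aggregation. The sample rows and their values are hypothetical; it assumes each row already carries the period label produced by the CASE expression, that `ttype` belongs to the transaction row, and that amounts are stored in cents (hence the `/100` in the query).

```python
# Hypothetical sample rows: (act, period, type, paid, upgraded, ttype, amt_usd, vat_usd).
# period is assumed to be already computed by the CASE expression;
# amounts are assumed to be in cents, hence the /100 in the query.
rows = [
    ("acct1", 5, "JN", "Y", 0, "Join", 0, 0),
    ("acct1", 5, "SL", "N", 0, "Rebill", 3000, 500),
    ("acct1", 13, "JN", "Y", 1, "Join", 0, 0),
    ("acct2", "other", "JN", "Y", 0, "Join", 0, 0),
]

groups = {}
for act, period, typ, paid, upgraded, ttype, amt, vat in rows:
    # GROUP BY 1, 2: one accumulator per (act, period)
    g = groups.setdefault((act, period), {"Paid_Joins": 0, "Revenue_Amount": None})
    # SUM(CASE WHEN type = 'JN' AND paid = 'Y' AND upgraded = 0 THEN 1 ELSE 0 END)
    if typ == "JN" and paid == "Y" and upgraded == 0:
        g["Paid_Joins"] += 1
    # SUM(CASE WHEN type IN ('SL','RL') AND ttype != 'Purchase' THEN ... END):
    # no ELSE, so non-matching rows contribute NULL and the sum stays NULL if none match
    if typ in ("SL", "RL") and ttype != "Purchase":
        g["Revenue_Amount"] = (g["Revenue_Amount"] or 0) + amt / 100 - vat / 100

# HAVING period != 'other'
result = {key: vals for key, vals in groups.items() if key[1] != "other"}
for key, vals in result.items():
    print(key, vals)
# ('acct1', 5) {'Paid_Joins': 1, 'Revenue_Amount': 25.0}
# ('acct1', 13) {'Paid_Joins': 0, 'Revenue_Amount': None}
```

The `None` in the second group mirrors SQL: `SUM` over a CASE with no ELSE returns NULL, not 0, when no row in the group matches.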

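UPDATE

Table structure:

[Images: table structure screenshots]

UPDATE 2

I ran the same query logic against the transaction table alone, without joining the customer table, and it still appears to scan the same number of rows as the join. Because the query can touch every combination in the database, I can't think of a more selective WHERE clause to limit the number of rows scanned.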

SET @date = '2017-05-17';
SET @offset = 2;
SET @start = DATE_SUB(@date, INTERVAL 225 + @offset DAY);

explain
select 
    a.account,        
    SUM(CASE WHEN a.type = 'JN' AND a.paid = 'Y' AND a.upgraded=0 
             THEN 1 
             ELSE 0 
        END) AS 'Paid_Joins'       
FROM __transaction a        
WHERE a.account regexp '^[a-zA-Z]+[0-9]+'
  AND a.dt <= @date 
  AND a.dt >= @start
-- AND b.affiliate_act = 'el4557'
GROUP BY 1

[Image: EXPLAIN output]

The number of rows scanned here is the same as with the join.

[Image: screenshot]

2 Answers

Answer 0 (score: 0)

Make sure a.primary_uid, a.dt and b.uid are indexed. Also try a composite index on (a.dt, a.primary_uid).

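About 5,187,819 rows are processed, so DATE_SUB may be called up to 103,756,380 times, depending on how the optimizer handles the expression. That's why I suggest pulling it out of the query, so that it is called only ten times.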
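The figures above can be checked with a few lines of Python. The CASE expression has ten WHEN branches with two DATE_SUB calls each (upper and lower bound), so evaluating every branch for every row means 20 calls per row. This is a worst case: CASE stops at the first matching branch, and, as noted, the optimizer may avoid re-evaluating constant expressions. The row count is the one quoted above.

```python
rows_processed = 5_187_819   # row count quoted in the answer
when_branches = 10           # WHEN clauses in the CASE expression
date_sub_per_branch = 2      # one DATE_SUB for the upper bound, one for the lower bound

calls_per_row = when_branches * date_sub_per_branch
worst_case_calls = rows_processed * calls_per_row
precomputed_calls = when_branches   # one SET per bucket boundary, evaluated once per query

print(calls_per_row)             # 20
print(f"{worst_case_calls:,}")   # 103,756,380
print(precomputed_calls)         # 10
```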

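Try precomputing the date intervals so that they are not recalculated on every evaluation of the CASE expression. You could also precompute the intervals minus the offset. I'll leave it to you to see whether that helps.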

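SET @date = '2017-05-17';
SET @offset = 1;
-- Upper bound of each bucket: @date minus n days (computed once, not per row)
SET @dtint5   = DATE_SUB(@date, INTERVAL 5 DAY);
SET @dtint13  = DATE_SUB(@date, INTERVAL 13 DAY);
SET @dtint25  = DATE_SUB(@date, INTERVAL 25 DAY);
SET @dtint45  = DATE_SUB(@date, INTERVAL 45 DAY);
SET @dtint75  = DATE_SUB(@date, INTERVAL 75 DAY);
SET @dtint105 = DATE_SUB(@date, INTERVAL 105 DAY);
SET @dtint135 = DATE_SUB(@date, INTERVAL 135 DAY);
SET @dtint165 = DATE_SUB(@date, INTERVAL 165 DAY);
SET @dtint195 = DATE_SUB(@date, INTERVAL 195 DAY);
SET @dtint225 = DATE_SUB(@date, INTERVAL 225 DAY);
-- Lower bound of each bucket: @date minus (n + @offset) days.
-- Use DATE_SUB here: "@dtint5 - @offset" is numeric subtraction, not date arithmetic.
SET @dtint5_lo   = DATE_SUB(@date, INTERVAL 5 + @offset DAY);
SET @dtint13_lo  = DATE_SUB(@date, INTERVAL 13 + @offset DAY);
SET @dtint25_lo  = DATE_SUB(@date, INTERVAL 25 + @offset DAY);
SET @dtint45_lo  = DATE_SUB(@date, INTERVAL 45 + @offset DAY);
SET @dtint75_lo  = DATE_SUB(@date, INTERVAL 75 + @offset DAY);
SET @dtint105_lo = DATE_SUB(@date, INTERVAL 105 + @offset DAY);
SET @dtint135_lo = DATE_SUB(@date, INTERVAL 135 + @offset DAY);
SET @dtint165_lo = DATE_SUB(@date, INTERVAL 165 + @offset DAY);
SET @dtint195_lo = DATE_SUB(@date, INTERVAL 195 + @offset DAY);
SET @dtint225_lo = DATE_SUB(@date, INTERVAL 225 + @offset DAY);

SELECT
  b.act,
  CASE
    WHEN b.jdt <= @dtint5   AND b.jdt >= @dtint5_lo   AND DATEDIFF(a.dt, b.jdt) <= 5   THEN 5
    WHEN b.jdt <= @dtint13  AND b.jdt >= @dtint13_lo  AND DATEDIFF(a.dt, b.jdt) <= 13  THEN 13
    WHEN b.jdt <= @dtint25  AND b.jdt >= @dtint25_lo  AND DATEDIFF(a.dt, b.jdt) <= 25  THEN 25
    WHEN b.jdt <= @dtint45  AND b.jdt >= @dtint45_lo  AND DATEDIFF(a.dt, b.jdt) <= 45  THEN 45
    WHEN b.jdt <= @dtint75  AND b.jdt >= @dtint75_lo  AND DATEDIFF(a.dt, b.jdt) <= 75  THEN 75
    WHEN b.jdt <= @dtint105 AND b.jdt >= @dtint105_lo AND DATEDIFF(a.dt, b.jdt) <= 105 THEN 105
    WHEN b.jdt <= @dtint135 AND b.jdt >= @dtint135_lo AND DATEDIFF(a.dt, b.jdt) <= 135 THEN 135
    WHEN b.jdt <= @dtint165 AND b.jdt >= @dtint165_lo AND DATEDIFF(a.dt, b.jdt) <= 165 THEN 165
    WHEN b.jdt <= @dtint195 AND b.jdt >= @dtint195_lo AND DATEDIFF(a.dt, b.jdt) <= 195 THEN 195
    WHEN b.jdt <= @dtint225 AND b.jdt >= @dtint225_lo AND DATEDIFF(a.dt, b.jdt) <= 225 THEN 225
    ELSE 'other'
  END AS 'period',
  SUM(CASE WHEN a.type = 'JN' AND a.paid = 'Y' AND a.upgraded = 0 THEN 1 ELSE 0 END) AS 'Paid_Joins',
  SUM(CASE WHEN a.type IN ('SL','RL') AND ttype != 'Purchase' THEN (a.amt_usd/100 - a.vat_usd/100) END) AS 'Revenue_Amount'
FROM __customer b
JOIN __transaction a ON b.uid = a.primary_uid
-- Same filters as the original query, so the result set is unchanged
WHERE b.affiliate_act REGEXP '^[a-zA-Z]+[0-9]+'
  AND a.dt <= @date
  AND a.dt >= @dtint225_lo
  AND b.jdt >= @dtint225_lo
GROUP BY 1, 2
HAVING period != 'other';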
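The precompute-once idea from this answer, sketched in Python: build the (lower, upper) join-date bounds for all ten buckets a single time, using real date arithmetic for the lower bound (`as_of` minus `n + offset` days), then reuse that table for every row. `build_bounds` and `classify` are hypothetical names, and dates are assumed to have no time component.

```python
from datetime import date, timedelta

BUCKETS = [5, 13, 25, 45, 75, 105, 135, 165, 195, 225]


def build_bounds(as_of: date, offset: int):
    """Compute each bucket's (n, lower, upper) jdt bounds once, like the SET @dtint... lines."""
    return [
        (n, as_of - timedelta(days=n + offset), as_of - timedelta(days=n))
        for n in BUCKETS
    ]


def classify(jdt: date, dt: date, bounds):
    """Assign a (join date, transaction date) pair to a bucket using only precomputed bounds."""
    for n, lower, upper in bounds:
        # lower <= jdt <= upper, and DATEDIFF(dt, jdt) <= n
        if lower <= jdt <= upper and (dt - jdt).days <= n:
            return n
    return "other"


bounds = build_bounds(date(2017, 5, 17), offset=1)   # computed once per query
rows = [
    (date(2017, 5, 12), date(2017, 5, 14)),   # joined 5 days before @date
    (date(2017, 5, 3), date(2017, 5, 15)),    # joined 14 days before @date
    (date(2017, 5, 8), date(2017, 5, 10)),    # join date falls in no window
]
print([classify(jdt, dt, bounds) for jdt, dt in rows])  # [5, 13, 'other']
```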

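Answer 1 (score: 0)

What you're trying to do is break account activity down into bucket ranges of each customer's transaction history. However, looking at your date tests, each bucket appears to be essentially two days wide, e.g.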

5 days results in 5/11-5/12
13 days results in 5/3-5/4
25 days results in 4/21-4/22

rather than a full bucket, e.g.:

5/11 - 5/17
5/2  - 5/10
4/21 - 5/2
??? - 4/20
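One possible shape for full, contiguous buckets, sketched in Python under an assumption: bucket `n` holds customers who joined more than the previous boundary and at most `n` days before `@date`. The customer's age in days is computed once, and `bisect` finds the first boundary it fits under. The boundaries come from the query, so the exact edges differ slightly from the hand-written dates above; `bucket_for` is a hypothetical helper.

```python
from bisect import bisect_left
from datetime import date

BOUNDARIES = [5, 13, 25, 45, 75, 105, 135, 165, 195, 225]


def bucket_for(jdt: date, as_of: date):
    """Return the smallest boundary n with days_since_join <= n, else 'other'."""
    days_since_join = (as_of - jdt).days
    if days_since_join < 0:
        return "other"                       # joined after @date
    i = bisect_left(BOUNDARIES, days_since_join)
    return BOUNDARIES[i] if i < len(BOUNDARIES) else "other"


as_of = date(2017, 5, 17)
print(bucket_for(date(2017, 5, 12), as_of))  # 5   (5 days)
print(bucket_for(date(2017, 5, 11), as_of))  # 13  (6 days)
print(bucket_for(date(2017, 4, 22), as_of))  # 25  (25 days)
print(bucket_for(date(2016, 1, 1), as_of))   # other
```

Unlike the two-day windows, every join date within 225 days lands in exactly one bucket.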

If I run a simple check using your date/interval settings:

SELECT DATE_SUB(@date, INTERVAL 5 DAY) - @offset;  -- result: 20170511, which looks like a number, not a date
and
SELECT DATE_SUB(@date, INTERVAL 5 DAY);  -- result: 2017-05-12, the expected date
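Why the first expression returns a number: in a numeric context MySQL converts a DATE to an integer of the form YYYYMMDD (as with `CURDATE() + 0`), so `- @offset` is plain integer subtraction, not date arithmetic. A small Python emulation of that conversion (`mysql_numeric` is a hypothetical helper, not a MySQL API) shows it only looks right away from month boundaries:

```python
from datetime import date, timedelta


def mysql_numeric(d: date) -> int:
    """Emulate MySQL's DATE-to-number conversion: YYYYMMDD as an integer."""
    return int(d.strftime("%Y%m%d"))


offset = 1
for upper in (date(2017, 5, 12), date(2017, 5, 1)):
    numeric_minus = mysql_numeric(upper) - offset   # what "date - @offset" yields
    real_minus = upper - timedelta(days=offset)     # what DATE_SUB(..., INTERVAL n + @offset DAY) yields
    print(numeric_minus, real_minus.isoformat())
# 20170511 2017-05-11
# 20170500 2017-04-30
```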

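So your range works out to 2017-05-11 <= jdt AND jdt <= 2017-05-12, which covers only two days. I can only assume you intended the @offset value to adjust the windows so they span more time. This seems like a very awkward kind of query.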
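To see how wide every window is, here is a Python sketch that computes the inclusive join-date window for each bucket from `@date = 2017-05-17` and `@offset = 1`, using the question's bounds (`n + @offset` days back to `n` days back). Assuming `jdt` holds whole dates, each window covers `@offset + 1` calendar days, two here:

```python
from datetime import date, timedelta

BUCKETS = [5, 13, 25, 45, 75, 105, 135, 165, 195, 225]
as_of = date(2017, 5, 17)   # @date
offset = 1                  # @offset

windows = {
    # DATE_SUB(@date, INTERVAL n + @offset DAY) .. DATE_SUB(@date, INTERVAL n DAY)
    n: (as_of - timedelta(days=n + offset), as_of - timedelta(days=n))
    for n in BUCKETS
}
for n, (lower, upper) in windows.items():
    print(n, lower, upper, (upper - lower).days + 1)   # last column: days covered, inclusive
# First three lines printed:
# 5 2017-05-11 2017-05-12 2
# 13 2017-05-03 2017-05-04 2
# 25 2017-04-21 2017-04-22 2
```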

If you are looking for a single day of activity, you may actually mean something like this:

SELECT DATE_SUB(@date, INTERVAL 5+@offset DAY);  -- result: 2017-05-11, the expected date
and
SELECT DATE_SUB(@date, INTERVAL 5 DAY);  -- result: 2017-05-12, the expected date

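When comparing a date/time column against a date range, I usually use >= the start date and < the NEXT day, so that every hour, minute and second of the day is captured:

jdt >= '2017-05-11' AND jdt < '2017-05-12'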

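This gets you everything from midnight on May 11 through 11:59:59 PM, but nothing from May 12. If you actually want full bucket ranges as I described, let me know and I'll give you a cleaner solution.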
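A Python sketch of the same half-open comparison, assuming `jdt` is a DATETIME: keep `start <= jdt < start + 1 day`, so every time on 2017-05-11 (including 23:59:59) matches while midnight on 2017-05-12 does not. `in_day` is a hypothetical helper.

```python
from datetime import date, datetime, time, timedelta


def in_day(jdt: datetime, day: date) -> bool:
    """Half-open test: day 00:00:00 <= jdt < next day 00:00:00."""
    start = datetime.combine(day, time.min)
    end = start + timedelta(days=1)
    return start <= jdt < end


day = date(2017, 5, 11)
print(in_day(datetime(2017, 5, 11, 0, 0, 0), day))     # True
print(in_day(datetime(2017, 5, 11, 23, 59, 59), day))  # True
print(in_day(datetime(2017, 5, 12, 0, 0, 0), day))     # False
# A closed upper bound written as a bare date means midnight,
# so noon on May 11 fails "jdt <= '2017-05-11'":
print(datetime(2017, 5, 11, 12, 0, 0) <= datetime.combine(day, time.min))  # False
```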

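Also, regarding performance: you are looking at all transactions before the given date, so you are scanning nearly the entire transaction table. What other transaction "types" are in the table that an index could help discard? It looks like you only care about 'JN', 'SL' and 'RL'. That said, I would put an index on (type, dt) on the transaction table. On the customer table, I would add a covering index on (uid, act) so that MySQL doesn't have to read the customer's full data pages just to get the account. The join is resolved on uid, and the account number comes straight from the index.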