Optimizing a large MySQL aggregate query for analytics

Time: 2011-05-10 23:23:31

Tags: mysql

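I'm trying to build some basic marketing analytics tools and want to provide a "transactions by day N" summary for each campaign code.

Is there a way to make a query like this more efficient? For each day_n column, I want to count all transactions made on or before that day.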

SELECT 
c.campaign_code, 
(select count(*) from _t_transactions where campaign_code=c.campaign_code and day <= 1) as day_1,
(select count(*) from _t_transactions where campaign_code=c.campaign_code and day <= 2) as day_2,
(select count(*) from _t_transactions where campaign_code=c.campaign_code and day <= 3) as day_3,
(select count(*) from _t_transactions where campaign_code=c.campaign_code and day <= 4) as day_4,
(select count(*) from _t_transactions where campaign_code=c.campaign_code and day <= 5) as day_5,
(select count(*) from _t_transactions where campaign_code=c.campaign_code and day <= 6) as day_6,
(select count(*) from _t_transactions where campaign_code=c.campaign_code and day <= 7) as day_7,
(select count(*) from _t_transactions where campaign_code=c.campaign_code and day <= 14) as day_14,
(select count(*) from _t_transactions where campaign_code=c.campaign_code and day <= 30) as day_30,
(select count(*) from _t_transactions where campaign_code=c.campaign_code and day <= 60) as day_60,
(select count(*) from _t_transactions where campaign_code=c.campaign_code and day <= 90) as day_90
FROM campaigns c LEFT JOIN _t_transactions t ON c.campaign_code=t.campaign_code

The table structures are:

CREATE TEMPORARY TABLE `campaigns` (
`campaign_code` varchar(255) DEFAULT NULL
);

CREATE TABLE `_t_transactions` (
`id` int(11) DEFAULT NULL,
`campaign_code` varchar(255) DEFAULT NULL,
`day` int(11) DEFAULT NULL
);

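2 Answers:

Answer 0: (score: 1)

So the question wasn't quite representative: what I really needed was an ROI calculation after N days. It turns out the conditional aggregation @tandu suggested works perfectly and returns in 0.5 seconds instead of 33 seconds (on my fairly small dataset).

Of course, as tandu pointed out, this isn't really a job for MySQL.

Here is what I ended up with: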

SELECT 
c.campaign_code, c.clicks, c.cpc, c.spent,
sum(IF(day <= 1, amount, 0)) / c.spent as day_1,
sum(IF(day <= 2, amount, 0)) / c.spent as day_2,
sum(IF(day <= 3, amount, 0)) / c.spent as day_3,
sum(IF(day <= 4, amount, 0)) / c.spent as day_4,
sum(IF(day <= 5, amount, 0)) / c.spent as day_5,
sum(IF(day <= 6, amount, 0)) / c.spent as day_6,
sum(IF(day <= 7, amount, 0)) / c.spent as day_7,
sum(IF(day <= 14, amount, 0)) / c.spent as day_14,
sum(IF(day <= 30, amount, 0)) / c.spent as day_30,
sum(IF(day <= 60, amount, 0)) / c.spent as day_60,
sum(IF(day <= 90, amount, 0)) / c.spent as day_90
FROM _t_transactions t LEFT JOIN campaigns c ON c.campaign_code=t.campaign_code 
GROUP BY t.campaign_code;

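I hadn't realized that conditionals could be used inside aggregate functions; it is really useful in this case. It also solved a related problem: I wanted to use a temporary table for this, but MySQL would not let me reference the same temporary table more than once in a single query.

Answer 1: (score: 0)

I'm no expert, and the MySQL query optimizer may take care of some of this for you, but here are a few tips: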
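As a minimal sketch of the conditional-aggregation idea, the snippet below runs the same pattern against an in-memory SQLite database from Python. SQLite stands in for MySQL only so the example is self-contained; the table name `transactions`, the sample rows, and the helper `cumulative_summary` are made up for illustration. `CASE WHEN` is used in place of MySQL's `IF()` because it works in both engines.

```python
import sqlite3

# Hypothetical sample data: (campaign_code, day, amount). Not from the original post.
ROWS = [
    ("A", 1, 10.0),
    ("A", 2, 5.0),
    ("A", 7, 20.0),
    ("B", 3, 8.0),
    ("B", 40, 2.0),
]

# One pass over the table: each SUM only counts (or adds up) the rows
# whose day falls on or before the threshold for that column.
QUERY = """
SELECT
    campaign_code,
    SUM(CASE WHEN day <= 1 THEN 1 ELSE 0 END) AS day_1,
    SUM(CASE WHEN day <= 7 THEN 1 ELSE 0 END) AS day_7,
    SUM(CASE WHEN day <= 30 THEN amount ELSE 0 END) AS amount_30
FROM transactions
GROUP BY campaign_code
ORDER BY campaign_code
"""


def cumulative_summary(rows):
    """Load rows into an in-memory table and return one summary row per campaign."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE transactions (campaign_code TEXT, day INTEGER, amount REAL)"
        )
        conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


summary = cumulative_summary(ROWS)
for row in summary:
    print(row)
```

Every column comes from the same scan and the same `GROUP BY`, which is why this form avoids the per-column correlated subqueries in the original question.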

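  1. If you don't GROUP BY campaign_code, won't you get one row per transaction? Maybe I'm wrong.
  2. You only seem to be interested in transactions with day <= 90. Adding that to the WHERE clause could shorten the query time significantly.
  3. Using an indexed surrogate integer key would be faster than searching on the campaign_code string in the transactions table.
  4. Restructure your tables: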

    -- campaigns is a regular table here: MySQL does not allow foreign keys
    -- that involve TEMPORARY tables.
    CREATE TABLE campaigns (
       campaign_id int unsigned not null auto_increment primary key,
       campaign_code varchar(255)
    );
    
    CREATE TABLE `_t_transactions` (
       `id` int(11),
       `campaign_id` int unsigned not null,
       `day` int(11),
       key (campaign_id),
       foreign key (campaign_id) references campaigns (campaign_id)
    );
    

    Using a derived table that fetches only the rows you need may also be faster:

    SELECT
       campaign_code,
       SUM(IF(day <= 1, 1, 0)) as day_1,
       SUM(IF(day <= 2, 1, 0)) as day_2,
       -- ...
    FROM
       campaigns
       NATURAL JOIN _t_transactions
       NATURAL JOIN (
          SELECT
             id
          FROM
             _t_transactions
          WHERE
             day <= 90
       ) derived
    GROUP BY
       campaign_code
    

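    This examines noticeably fewer rows, which could save some time if the initial query takes a long time to run. Or it might not.

    My final suggestion is not to use MySQL for storing and processing financial analytics, because it isn't really designed for that. You can see how hard it is to do a pivot. Switch to a different RDBMS designed for this kind of thing, or delegate the work to the scripting language of your choice.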
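To illustrate the "delegate it to a scripting language" option, here is a rough Python sketch. It assumes the rows have already been fetched with a plain query such as `SELECT campaign_code, day FROM _t_transactions WHERE day <= 90`; the function name `cumulative_counts` and the sample data are hypothetical, and the thresholds are the ones used in the question.

```python
from bisect import bisect_right
from collections import defaultdict

# Day thresholds taken from the day_N columns in the question.
THRESHOLDS = (1, 2, 3, 4, 5, 6, 7, 14, 30, 60, 90)


def cumulative_counts(transactions, thresholds=THRESHOLDS):
    """Return {campaign_code: {n: number of transactions with day <= n}}.

    `transactions` is an iterable of (campaign_code, day) pairs.
    """
    days_by_campaign = defaultdict(list)
    for code, day in transactions:
        days_by_campaign[code].append(day)

    result = {}
    for code, days in days_by_campaign.items():
        days.sort()
        # On a sorted list, bisect_right(days, n) is the count of values <= n.
        result[code] = {n: bisect_right(days, n) for n in thresholds}
    return result


# Hypothetical sample rows standing in for the fetched result set.
sample = [("A", 1), ("A", 2), ("A", 7), ("B", 3), ("B", 40), ("B", 100)]
counts = cumulative_counts(sample)
print(counts["A"][7], counts["B"][60])
```

Sorting each campaign's days once and using a binary search per threshold keeps the pivot logic out of SQL entirely; whether that is actually faster than the conditional-aggregate query depends on how many rows have to be transferred.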