Question

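I'm trying to write a query that needs a calculated column, using a subquery that receives a date reference through a variable. I'm not sure I'm "doing it right," but the query basically never finishes and just spins for minutes. Here's my query: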

select @groupdate:=date_format(order_date,'%Y-%m'), count(distinct customer_email) as num_cust,
(
  select count(distinct cev.customer_email) as num_prev
  from _pj_cust_email_view cev
  inner join _pj_cust_email_view as prev_purch on (prev_purch.order_date < @groupdate) and (cev.customer_email=prev_purch.customer_email)
  where cev.order_date > @groupdate
) as prev_cust_count
from _pj_cust_email_view
group by @groupdate;

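The subquery uses an inner join to do a self-join, which should give me only the number of people who purchased before the date in @groupdate. The EXPLAIN output is below: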

+----+----------------------+---------------------+------+---------------+-----------+---------+---------------------------+--------+---------------------------------+
| id | select_type          | table               | type | possible_keys | key       | key_len | ref                       | rows   | Extra                           |
+----+----------------------+---------------------+------+---------------+-----------+---------+---------------------------+--------+---------------------------------+
|  1 | PRIMARY              | _pj_cust_email_view | ALL  | NULL          | NULL      | NULL    | NULL                      | 140147 | Using temporary; Using filesort |
|  2 | UNCACHEABLE SUBQUERY | cev                 | ALL  | IDX_EMAIL     | NULL      | NULL    | NULL                      | 140147 | Using where                     |
|  2 | UNCACHEABLE SUBQUERY | prev_purch          | ref  | IDX_EMAIL     | IDX_EMAIL | 768     | cart_A.cev.customer_email |      1 | Using where                     |
+----+----------------------+---------------------+------+---------------+-----------+---------+---------------------------+--------+---------------------------------+

The structure of the table _pj_cust_email_view is as follows:

'_pj_cust_email_view', 'CREATE TABLE `_pj_cust_email_view` (
  `order_date` varchar(10) CHARACTER SET utf8 DEFAULT NULL,
  `customer_email` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  KEY `IDX_EMAIL` (`customer_email`),
  KEY `IDX_ORDERDATE` (`order_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1'

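Again, as I said before, I'm not sure this is the best way to accomplish this. Any criticism or guidance is appreciated!

Update

I've made a little progress: I'm now doing the above procedurally, iterating over every known month in the database and setting the variables ahead of time. I still don't like this. Here's what I have now:

Set the user-defined variables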

set @startdate:='2010-08', @enddate:='2010-09';

Get the total number of distinct emails in the given range

select count(distinct customer_email) as num_cust
from _pj_cust_email_view
where order_date between @startdate and @enddate;

Get the total number of customers in the given range who had also purchased before it

select count(distinct cev.customer_email) as num_prev
  from _pj_cust_email_view cev
  inner join _pj_cust_email_view as prev_purch on (prev_purch.order_date < @startdate) and (cev.customer_email=prev_purch.customer_email)
  where cev.order_date between @startdate and @enddate;

@startdate is set to the start of the month, and @enddate marks the end of that month's range.

I really feel this could still be done in one complete query.

Answer 1

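I don't think you need subqueries at all, and you don't need to iterate over months either.

Instead, I recommend creating a table that stores all the months. Even if you prepopulate it with 100 years of months, it would hold only 1200 rows, which is trivial.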

CREATE TABLE Months (
    start_date DATE, 
    end_date DATE, 
    PRIMARY KEY (start_date, end_date)
);
INSERT INTO Months (start_date, end_date) 
VALUES ('2011-03-01', '2011-03-31');

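Store the actual start and end dates, so you can use the DATE data type and index both columns properly.

Edit: I think I understand your requirement a bit better now, and I've cleaned up this answer. The following query may work for you: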
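Filling the table by hand gets tedious, so here is a minimal sketch of one way to populate it. It assumes MySQL 8.0 or later, since recursive CTEs are not available in earlier versions. The 2010-01-01 start date and the ten-year span are arbitrary illustrative values, not something taken from the question.

```sql
-- Sketch only: assumes MySQL 8.0+ and the Months table defined above.
-- Generates one row per month from 2010-01 through 2019-12 (120 rows).
-- INSERT IGNORE skips any month that is already present in the table.
INSERT IGNORE INTO Months (start_date, end_date)
WITH RECURSIVE seq (start_date) AS (
    SELECT CAST('2010-01-01' AS DATE)          -- first month (illustrative)
    UNION ALL
    SELECT start_date + INTERVAL 1 MONTH       -- step forward one month
    FROM seq
    WHERE start_date < '2019-12-01'            -- last month (illustrative)
)
SELECT start_date, LAST_DAY(start_date)
FROM seq;
```

The default cte_max_recursion_depth is 1000, so generating the full 100 years (1200 rows) in one statement would require raising that session variable first.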

SELECT DATE_FORMAT(m.start_date, '%Y-%m') AS month,
  COUNT(DISTINCT cev.customer_email) AS current,
  GROUP_CONCAT(DISTINCT cev.customer_email) AS current_email,
  COUNT(DISTINCT prev.customer_email) AS earlier,
  GROUP_CONCAT(DISTINCT prev.customer_email) AS earlier_email
FROM Months AS m 
LEFT OUTER JOIN _pj_cust_email_view AS cev
  ON cev.order_date BETWEEN m.start_date AND m.end_date
INNER JOIN Months AS mprev
  ON mprev.start_date <= m.start_date
LEFT OUTER JOIN _pj_cust_email_view AS prev
  ON prev.order_date BETWEEN mprev.start_date AND mprev.end_date
GROUP BY month;

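If you create the following compound index on the table:

CREATE INDEX order_email on _pj_cust_email_view (order_date, customer_email);

then the query has the best chance of being an index-only query, and it will run a lot faster.

Below is the EXPLAIN report for this query. Note the type: index for each table.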

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: m
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 6
          ref: NULL
         rows: 4
        Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: mprev
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 6
          ref: NULL
         rows: 4
        Extra: Using where; Using index; Using join buffer
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: cev
         type: index
possible_keys: order_email
          key: order_email
      key_len: 17
          ref: NULL
         rows: 10
        Extra: Using index
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: prev
         type: index
possible_keys: order_email
          key: order_email
      key_len: 17
          ref: NULL
         rows: 10
        Extra: Using index

Here's some test data:

INSERT INTO Months (start_date, end_date) VALUES
('2011-03-01', '2011-03-31'),
('2011-02-01', '2011-02-28'),
('2011-01-01', '2011-01-31'),
('2010-12-01', '2010-12-31');

INSERT INTO _pj_cust_email_view (customer_email, order_date) VALUES
('ron', '2011-03-10'),
('hermione', '2011-03-15'),
('hermione', '2011-02-15'),
('hermione', '2011-01-15'),
('hermione', '2010-12-15'),
('neville', '2011-01-10'),
('harry', '2011-03-19'),
('harry', '2011-02-10'),
('molly', '2011-03-25'),
('molly', '2011-01-10');

Here's the result for that data, including the concatenated lists of emails to make it easier to follow.

+---------+---------+--------------------------+---------+----------------------------------+
| month   | current | current_email            | earlier | earlier_email                    |
+---------+---------+--------------------------+---------+----------------------------------+
| 2010-12 |       1 | hermione                 |       1 | hermione                         | 
| 2011-01 |       3 | neville,hermione,molly   |       3 | hermione,molly,neville           | 
| 2011-02 |       2 | hermione,harry           |       4 | harry,hermione,molly,neville     | 
| 2011-03 |       4 | molly,ron,harry,hermione |       5 | molly,ron,hermione,neville,harry | 
+---------+---------+--------------------------+---------+----------------------------------+
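Note that the earlier column above counts everyone who ordered in that month or any month before it. If the goal is closer to the question's num_prev, that is, customers who ordered in the month and had also ordered before it, a variant along these lines might be a starting point. This is a sketch under the same assumptions as the query above (the Months table, and order_date values that compare correctly against DATE values); the column aliases num_cust and num_prev are borrowed from the question.

```sql
-- Sketch only: counts, per month, the distinct customers who ordered in that
-- month (num_cust) and how many of those had an order before it (num_prev).
SELECT DATE_FORMAT(m.start_date, '%Y-%m') AS month,
       COUNT(DISTINCT cev.customer_email)  AS num_cust,
       COUNT(DISTINCT prev.customer_email) AS num_prev
FROM Months AS m
LEFT OUTER JOIN _pj_cust_email_view AS cev
  ON cev.order_date BETWEEN m.start_date AND m.end_date
LEFT OUTER JOIN _pj_cust_email_view AS prev
  ON prev.customer_email = cev.customer_email   -- same customer
 AND prev.order_date < m.start_date             -- ordered before this month
GROUP BY month;
```

Because prev is joined on the customer's email, the existing IDX_EMAIL index from the question can serve that lookup.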

Answer 2

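Although Bill has a nice query that uses an additional table, this one uses SQL variables instead, so no extra table is needed. The inner query joins to your _pj_cust_email_view table and applies LIMIT 10 so that it goes back only 10 months from the current month. There's no hard-coding of dates; they're computed on the fly. If you want more or fewer months, just change the LIMIT clause.

Because the @dt := assignment is the last field in the inner query, the date basis is reassigned only after the current row's values have been read, which sets up the next row's cycle and produces the qualifying dates.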

select JustDates.FirstOfMonth,
       count( distinct EMCurr.customer_Email ) UniqThisMonth,
       count( distinct EMLast.customer_Email ) RepeatCustomers
   from 
      ( SELECT 
                 @dt FirstOfMonth,
                 last_day( @dt ) EndOfMonth,
                 @dt:= date_sub(@dt, interval 1 month) nextCycle
            FROM 
                 (select @dt := date_sub( current_date(), interval dayofmonth( current_date())-1 day )) vars,
                _pj_cust_email_view limit 10 
                ) JustDates
        join _pj_cust_email_view EMCurr
           on EMCurr.order_Date between JustDates.FirstOfMonth and JustDates.EndOfMonth
        left join _pj_cust_email_view EMLast
           on EMLast.order_Date < JustDates.FirstOfMonth
           and EMCurr.customer_Email = EMLast.customer_Email
    group by 
       1
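The query above depends on @dt being read before it is reassigned within each row. The MySQL manual describes the evaluation order of expressions involving user variables as undefined, and assigning user variables inside a SELECT is deprecated as of MySQL 8.0.13. As a hedged alternative, the same month list can be generated with a recursive CTE. This is a sketch that assumes MySQL 8.0 or later and that order_date holds 'YYYY-MM-DD' values; the counter column n is a name introduced here for illustration.

```sql
-- Sketch only: the last 10 months, generated without user variables.
WITH RECURSIVE JustDates (FirstOfMonth, EndOfMonth, n) AS (
    SELECT CAST(DATE_FORMAT(CURRENT_DATE(), '%Y-%m-01') AS DATE),
           LAST_DAY(CURRENT_DATE()),
           1
    UNION ALL
    SELECT FirstOfMonth - INTERVAL 1 MONTH,
           LAST_DAY(FirstOfMonth - INTERVAL 1 MONTH),
           n + 1
    FROM JustDates
    WHERE n < 10                                 -- number of months to return
)
SELECT JustDates.FirstOfMonth,
       COUNT(DISTINCT EMCurr.customer_email) AS UniqThisMonth,
       COUNT(DISTINCT EMLast.customer_email) AS RepeatCustomers
FROM JustDates
JOIN _pj_cust_email_view AS EMCurr
  ON EMCurr.order_date BETWEEN JustDates.FirstOfMonth AND JustDates.EndOfMonth
LEFT JOIN _pj_cust_email_view AS EMLast
  ON EMLast.order_date < JustDates.FirstOfMonth
 AND EMCurr.customer_email = EMLast.customer_email
GROUP BY JustDates.FirstOfMonth;
```

Changing the 10 in the WHERE clause plays the same role as changing the LIMIT in the original query.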

MySQL subquery with user-defined variables
