MySQL numbering by group - am I hitting a bug?

Time: 2015-08-24 05:40:53

Tags: mysql sql group-by cross-join

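I'm trying to number some records in MySQL (5.5.44-0 on Ubuntu), grouped by another column (you'll see what I mean below). I'm adapting the solution described in Running Sums for Multiple Categories in MySQL, except that I'm only numbering rows, not summing them.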

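The tables involved are huge, with almost 100 columns, so let's first simplify the demonstration by creating derived tables that contain only the important columns. Sorry for not sharing an SQL Fiddle: the problem doesn't seem to be reproducible except with a large amount of data, which I can't share.

Create the tables: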

CREATE TABLE `inquiries_test` (
  `id` int(11) NOT NULL DEFAULT '0',
  `motive` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `motive` (`motive`)
);

insert into inquiries_test select id, motive from inquiries;

CREATE TABLE `leads_test` (
  `lead_id` int(11) DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `inquiry_id` int(11) DEFAULT NULL,
  KEY `id` (`lead_id`)
);

insert into leads_test select lead_id, created_at, inquiry_id from leads;

CREATE TABLE `lead_inserts` (
  `lead_id` int(11) DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `cnt` int(11) DEFAULT NULL
);

You'll notice that the data in inquiries_test and leads_test comes from the actual production tables. This becomes important later. Now populate lead_inserts:

playground>insert into lead_inserts (cnt, created_at, lead_id) 
    -> SELECT @cnt := if(@id = l.lead_id,@cnt,0) + 1 as cnt 
    -> , l.created_at 
    -> , @id := l.lead_id as local_resouce_id
    -> FROM leads_test l join inquiries_test i on (l.inquiry_id=i.id)
    -> CROSS JOIN (select @id := 0, @cnt := 0) as InitVarsAlias 
    -> where i.motive='real' ORDER BY lead_id, created_at;
Query OK, 2172774 rows affected (14.30 sec)
Records: 2172774  Duplicates: 0  Warnings: 0

playground>select * from lead_inserts where lead_id in (117,118);
+---------+---------------------+------+
| lead_id | created_at          | cnt  |
+---------+---------------------+------+
|     117 | 2012-06-23 00:13:09 |    1 |
|     117 | 2014-09-14 04:30:37 |    2 |
|     117 | 2015-01-27 22:34:41 |    3 |
|     117 | 2015-03-19 19:33:51 |    4 |
|     118 | 2014-12-24 17:47:15 |    1 |
|     118 | 2015-01-23 21:30:09 |    2 |
|     118 | 2015-04-07 21:33:43 |    3 |
|     118 | 2015-04-10 17:00:04 |    4 |
|     118 | 2015-05-12 21:59:49 |    5 |
+---------+---------------------+------+
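The numbering trick depends on the two assignments being evaluated once per row, left to right, in the order rows come out of the ORDER BY. MySQL does not actually guarantee this (its manual states that the order of evaluation for expressions involving user variables is undefined), so treat the following as a minimal Python model under that assumption. The function name `number_rows` is hypothetical; the sample rows are the ones shown above:

```python
def number_rows(rows):
    """Model of the query's select list, evaluated row by row in arrival order:
    @cnt := if(@id = lead_id, @cnt, 0) + 1, then @id := lead_id."""
    cur_id, cnt = 0, 0  # CROSS JOIN (select @id := 0, @cnt := 0)
    out = []
    for lead_id, created_at in rows:
        # Continue the count if this row has the same lead_id as the previous row,
        # otherwise restart at 1.
        cnt = (cnt if cur_id == lead_id else 0) + 1
        cur_id = lead_id
        out.append((lead_id, created_at, cnt))
    return out


rows = [
    (117, "2012-06-23 00:13:09"),
    (117, "2014-09-14 04:30:37"),
    (117, "2015-01-27 22:34:41"),
    (117, "2015-03-19 19:33:51"),
    (118, "2014-12-24 17:47:15"),
    (118, "2015-01-23 21:30:09"),
    (118, "2015-04-07 21:33:43"),
    (118, "2015-04-10 17:00:04"),
    (118, "2015-05-12 21:59:49"),
]

# Rows sorted by (lead_id, created_at), as the ORDER BY intends.
for row in number_rows(sorted(rows)):
    print(row)
```

Fed rows that are already sorted by (lead_id, created_at), the model reproduces the 1..4 and 1..5 sequences in the table above.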

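So far so good: the value of cnt "resets" with each new lead_id. Now, since leads_test and inquiries_test are essentially leads and inquiries with the other columns removed, I'd expect the result to be the same if I modify the insert statement to use the original tables, right? But look: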

playground>truncate table lead_inserts;
Query OK, 0 rows affected (0.14 sec)

playground>insert into lead_inserts (cnt, created_at, lead_id) 
    -> SELECT @cnt := if(@id = l.lead_id,@cnt,0) + 1 as cnt 
    -> , l.created_at 
    -> , @id := l.lead_id as local_resouce_id
    -> FROM leads l join inquiries i on (l.inquiry_id=i.id)        
    -> CROSS JOIN (select @id := 0, @cnt := 0) as InitVarsAlias 
    -> where i.motive='real' ORDER BY lead_id, created_at;
Query OK, 2172774 rows affected (17.25 sec)
Records: 2172774  Duplicates: 0  Warnings: 0

playground>select * from lead_inserts where lead_id in (117,118);
+---------+---------------------+------+
| lead_id | created_at          | cnt  |
+---------+---------------------+------+
|     117 | 2012-06-23 00:13:09 |    1 |
|     117 | 2014-09-14 04:30:37 |    1 |
|     117 | 2015-01-27 22:34:41 |    1 |
|     117 | 2015-03-19 19:33:51 |    1 |
|     118 | 2014-12-24 17:47:15 |    1 |
|     118 | 2015-01-23 21:30:09 |    1 |
|     118 | 2015-04-07 21:33:43 |    1 |
|     118 | 2015-04-10 17:00:04 |    1 |
|     118 | 2015-05-12 21:59:49 |    1 |
+---------+---------------------+------+

What happened to the numbering? Other observations when using the original tables:

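  1. If I don't process all the records and specify only a few lead_ids, the numbering comes out correct.
  2. If I remove the INSERT clause and run it as a plain SELECT (with a LIMIT clause to show only 50 rows of output), the numbering comes out correct.

So, is this a bug I've hit, or am I missing something? In real life I can't use the procedure above as a workaround: I really have to use leads and inquiries, because other columns from those tables must be part of lead_inserts.

Thanks!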

1 Answer:

Answer 0 (score: 0)

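As Cha pointed out, it looks like a MySQL optimization: when the end result is just inserted into a new table, MySQL has no reason to do the ORDER BY. Why it works with the test tables and not with the production tables, when they have the same number of rows, I don't know. But this is how I forced it to sort the rows that will be inserted: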

First, make sure there is a composite index on the columns I'll be ordering by:

CREATE INDEX idx_leads_lead_id_created ON leads(lead_id, created_at);

Then force MySQL to use this index:

insert into lead_inserts (cnt, created_at, lead_id) 
-- restart the count at 1 whenever lead_id changes from the previous row
SELECT @cnt := if(@id = l.lead_id, @cnt, 0) + 1 as cnt 
, l.created_at 
-- remember the current lead_id for the next row
, @id := l.lead_id as local_resouce_id
FROM leads l FORCE INDEX FOR ORDER BY (idx_leads_lead_id_created)
JOIN inquiries i on (l.inquiry_id=i.id)        
CROSS JOIN (select @id := 0, @cnt := 0) as InitVarsAlias 
WHERE i.motive='real' 
ORDER BY l.lead_id, l.created_at;
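A small Python model helps show why row order matters here: the assignments only compare each row with the one immediately before it. Assuming, for illustration only, that rows reach the assignments in created_at order instead of (lead_id, created_at) order, the counts break as soon as two leads interleave; with millions of rows spread over many leads, most counts would tend to reset to 1, which is the pattern in the question's second output. This models the symptom, not MySQL's actual execution plan. `number_rows` and `is_numbered_per_group` are hypothetical helpers; the sample rows are taken from the question:

```python
from collections import defaultdict


def number_rows(rows):
    """Model of @cnt := if(@id = lead_id, @cnt, 0) + 1 and @id := lead_id,
    applied in whatever order the rows arrive."""
    cur_id, cnt = 0, 0
    out = []
    for lead_id, created_at in rows:
        cnt = (cnt if cur_id == lead_id else 0) + 1
        cur_id = lead_id
        out.append((lead_id, created_at, cnt))
    return out


def is_numbered_per_group(numbered):
    """Check that each lead's counts run 1..n when ordered by created_at."""
    groups = defaultdict(list)
    for lead_id, created_at, cnt in numbered:
        groups[lead_id].append((created_at, cnt))
    return all(
        [cnt for _, cnt in sorted(items)] == list(range(1, len(items) + 1))
        for items in groups.values()
    )


rows = [
    (117, "2012-06-23 00:13:09"), (117, "2014-09-14 04:30:37"),
    (117, "2015-01-27 22:34:41"), (117, "2015-03-19 19:33:51"),
    (118, "2014-12-24 17:47:15"), (118, "2015-01-23 21:30:09"),
    (118, "2015-04-07 21:33:43"), (118, "2015-04-10 17:00:04"),
    (118, "2015-05-12 21:59:49"),
]

# Rows arriving in created_at order, interleaved across leads: numbering breaks.
by_time = number_rows(sorted(rows, key=lambda r: r[1]))
print([cnt for _, _, cnt in by_time])  # [1, 2, 1, 2, 1, 2, 1, 2, 3]
print(is_numbered_per_group(by_time))  # False

# Rows arriving in (lead_id, created_at) order, which the forced index is meant to provide.
by_lead = number_rows(sorted(rows))
print([cnt for _, _, cnt in by_lead])  # [1, 2, 3, 4, 1, 2, 3, 4, 5]
print(is_numbered_per_group(by_lead))  # True
```

The same check can be run over an export of lead_inserts to confirm that a given run numbered every lead correctly, rather than spot-checking a few lead_ids.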