Double GroupBy with count and dates returns wrong dates

Posted: 2019-05-25 06:46:38

Tags: mysql sql

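I have three tables that track emails and their assigned categories: Mail holds the content of each email, Category lists the categories, and Classification links a Mail entry ID to a Category entry ID. The schema, with sample data and the query, is available on SQLFiddle: http://sqlfiddle.com/#!9/a410a6/26/0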

CREATE TABLE `Category` (
  `id` int(6) unsigned NOT NULL,
  `name` varchar(20) NOT NULL,
  PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;

CREATE TABLE `Mail` (
  `id` int(6) unsigned NOT NULL,
  `content` varchar(500) NOT NULL,
  `date` datetime NOT NULL,  
  PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;


CREATE TABLE `Classification` (
  `id` int(6) unsigned NOT NULL,
  `mail_id` int(6) unsigned NOT NULL,
  `category_id` int(6) unsigned NOT NULL,
  FOREIGN KEY (mail_id) REFERENCES Mail(id),
  FOREIGN KEY (category_id) REFERENCES Category(id),
  PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;

INSERT INTO `Category` (`id`, `name`) VALUES
  ('1', 'Important'),
  ('2', 'Urgent'),
  ('3', 'Normal');

INSERT INTO `Mail` (`id`, `content`, `date`) VALUES
  ('1', 'Important Email', '2019-01-04T13:53:52'),
  ('2', 'Urgent Email', '2019-01-19T13:53:52'),
  ('3', 'Very Urgent Email', '2019-01-24T13:53:52'),
  ('4', 'Quite Urgent Email', '2019-01-24T13:53:52'),
  ('5', 'Normal Email', '2019-01-21T13:53:52'),
  ('6', 'Regular Email', '2019-01-14T13:53:52'),
  ('7', 'Regular Email', '2019-01-23T13:53:52'),
  ('8', 'Regular Email', '2019-01-23T13:53:52'),
  ('9', 'Regular Email', '2019-01-20T13:53:52'),
  ('10', 'Very Urgent Email', '2019-01-25T13:53:52'),
  ('11', 'Very Urgent Email', '2019-01-25T13:53:52');


INSERT INTO `Classification` (`id`, `mail_id`, `category_id`) VALUES
  ('1', '1', '1'),
  ('2', '2', '2'),
  ('3', '3', '2'),
  ('4', '4', '2'),
  ('5', '5', '3'),
  ('6', '6', '3'),
  ('7', '6', '3'),
  ('8', '6', '3'),
  ('9', '6', '3'),
  ('10', '6', '2'),
  ('11', '6', '2');

I want to return the number of emails received in each category for each recorded date, i.e. my expected result is:

+----------------------+-----------+----------+
|         date         |   name    | count(*) |
+----------------------+-----------+----------+
| 2019-01-04T13:53:52Z | Important |        1 |
| 2019-01-14T13:53:52Z | Normal    |        1 |
| 2019-01-19T13:53:52Z | Urgent    |        1 |
| 2019-01-20T13:53:52Z | Normal    |        1 |
| 2019-01-21T13:53:52Z | Normal    |        1 |
| 2019-01-23T13:53:52Z | Normal    |        2 |
| 2019-01-24T13:53:52Z | Urgent    |        1 |
| 2019-01-25T13:53:52Z | Urgent    |        2 |
+----------------------+-----------+----------+

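For this I run the following query, which uses a double GROUP BY and joins the tables through conditions on the Classification table in the WHERE clause: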

SELECT Mail.date, Category.name, count(*)
FROM Mail, Classification, Category
WHERE Category.id = Classification.category_id
  AND Classification.mail_id = Mail.id
GROUP BY Mail.date, Category.name

This gives me the following result:

+----------------------+-----------+----------+
|         date         |   name    | count(*) |
+----------------------+-----------+----------+
| 2019-01-04T13:53:52Z | Important |        1 |
| 2019-01-14T13:53:52Z | Normal    |        4 |
| 2019-01-14T13:53:52Z | Urgent    |        2 |
| 2019-01-19T13:53:52Z | Urgent    |        1 |
| 2019-01-21T13:53:52Z | Normal    |        1 |
| 2019-01-24T13:53:52Z | Urgent    |        2 |
+----------------------+-----------+----------+

This is completely wrong.

I tried replacing the WHERE clause with a JOIN:

SELECT Mail.date, Category.name, count(*)
FROM (Mail, Category)
RIGHT JOIN Classification
  ON Category.id = Classification.category_id
 AND Classification.mail_id = Mail.id
GROUP BY Mail.date, Category.name

but I get exactly the same result as above.

Why do these queries return these wrong results, and what should I do to fix them?

1 answer

Answer (score: 1)

First, your query should be written with explicit JOIN syntax, like this:

SELECT m.date, c.name, count(*)
FROM Mail m JOIN
     Classification cl
     ON cl.mail_id = m.id JOIN
     Category c
     ON c.id = cl.category_id 
GROUP BY m.date, c.name ;
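The explicit JOIN form is the same inner join as the comma-and-WHERE version in the question, only easier to read, which is why rewriting the query alone does not change the numbers. The sketch below checks this, assuming an in-memory SQLite database as a stand-in for MySQL (MySQL-only column options are dropped; the rows are copied unchanged). The ORDER BY clauses are an addition so the two result lists can be compared directly.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL. The
# MySQL-only options (int(6) unsigned, DEFAULT CHARSET) are dropped;
# the rows are copied from the question unchanged.
MAILS = [
    (1, "Important Email", "2019-01-04T13:53:52"),
    (2, "Urgent Email", "2019-01-19T13:53:52"),
    (3, "Very Urgent Email", "2019-01-24T13:53:52"),
    (4, "Quite Urgent Email", "2019-01-24T13:53:52"),
    (5, "Normal Email", "2019-01-21T13:53:52"),
    (6, "Regular Email", "2019-01-14T13:53:52"),
    (7, "Regular Email", "2019-01-23T13:53:52"),
    (8, "Regular Email", "2019-01-23T13:53:52"),
    (9, "Regular Email", "2019-01-20T13:53:52"),
    (10, "Very Urgent Email", "2019-01-25T13:53:52"),
    (11, "Very Urgent Email", "2019-01-25T13:53:52"),
]
# (id, mail_id, category_id): note that mail 6 appears six times.
CLASSIFICATIONS = [
    (1, 1, 1), (2, 2, 2), (3, 3, 2), (4, 4, 2), (5, 5, 3), (6, 6, 3),
    (7, 6, 3), (8, 6, 3), (9, 6, 3), (10, 6, 2), (11, 6, 2),
]


def build_db():
    """Return an in-memory database loaded with the question's sample data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE Mail (id INTEGER PRIMARY KEY, content TEXT NOT NULL,
                           date TEXT NOT NULL);
        CREATE TABLE Classification (
            id INTEGER PRIMARY KEY,
            mail_id INTEGER NOT NULL REFERENCES Mail(id),
            category_id INTEGER NOT NULL REFERENCES Category(id));
    """)
    con.executemany("INSERT INTO Category VALUES (?, ?)",
                    [(1, "Important"), (2, "Urgent"), (3, "Normal")])
    con.executemany("INSERT INTO Mail VALUES (?, ?, ?)", MAILS)
    con.executemany("INSERT INTO Classification VALUES (?, ?, ?)",
                    CLASSIFICATIONS)
    return con


# The question's implicit (comma) join, filtered in WHERE.
IMPLICIT = """
    SELECT Mail.date, Category.name, count(*)
    FROM Mail, Classification, Category
    WHERE Category.id = Classification.category_id
      AND Classification.mail_id = Mail.id
    GROUP BY Mail.date, Category.name
    ORDER BY Mail.date, Category.name"""

# The answer's explicit JOIN version of the same query.
EXPLICIT = """
    SELECT m.date, c.name, count(*)
    FROM Mail m
    JOIN Classification cl ON cl.mail_id = m.id
    JOIN Category c ON c.id = cl.category_id
    GROUP BY m.date, c.name
    ORDER BY m.date, c.name"""

con = build_db()
implicit_rows = con.execute(IMPLICIT).fetchall()
explicit_rows = con.execute(EXPLICIT).fetchall()
for row in explicit_rows:
    print(row)
```

Both lists reproduce the six rows the question reports, including the 4 and 2 on 2019-01-14.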

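Now that we have that out of the way: your problem is that emails have multiple categories, i.e. several rows in Classification, and the join produces one row per Classification row, so those emails are multiplied in the counts. For example, mail 6 (dated 2019-01-14) has four Classification rows for Normal and two for Urgent, which is exactly where the 4 and the 2 on 2019-01-14 come from. So the results you are getting are correct for your data.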
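To see the multiplication directly, the sketch below lists the joined rows for mail 6 before any grouping. It again assumes an in-memory SQLite database standing in for MySQL, with the question's rows copied unchanged; restricting the output to mail 6 is only for illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL. The
# MySQL-only options (int(6) unsigned, DEFAULT CHARSET) are dropped;
# the rows are copied from the question unchanged.
MAILS = [
    (1, "Important Email", "2019-01-04T13:53:52"),
    (2, "Urgent Email", "2019-01-19T13:53:52"),
    (3, "Very Urgent Email", "2019-01-24T13:53:52"),
    (4, "Quite Urgent Email", "2019-01-24T13:53:52"),
    (5, "Normal Email", "2019-01-21T13:53:52"),
    (6, "Regular Email", "2019-01-14T13:53:52"),
    (7, "Regular Email", "2019-01-23T13:53:52"),
    (8, "Regular Email", "2019-01-23T13:53:52"),
    (9, "Regular Email", "2019-01-20T13:53:52"),
    (10, "Very Urgent Email", "2019-01-25T13:53:52"),
    (11, "Very Urgent Email", "2019-01-25T13:53:52"),
]
# (id, mail_id, category_id): note that mail 6 appears six times.
CLASSIFICATIONS = [
    (1, 1, 1), (2, 2, 2), (3, 3, 2), (4, 4, 2), (5, 5, 3), (6, 6, 3),
    (7, 6, 3), (8, 6, 3), (9, 6, 3), (10, 6, 2), (11, 6, 2),
]


def build_db():
    """Return an in-memory database loaded with the question's sample data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE Mail (id INTEGER PRIMARY KEY, content TEXT NOT NULL,
                           date TEXT NOT NULL);
        CREATE TABLE Classification (
            id INTEGER PRIMARY KEY,
            mail_id INTEGER NOT NULL REFERENCES Mail(id),
            category_id INTEGER NOT NULL REFERENCES Category(id));
    """)
    con.executemany("INSERT INTO Category VALUES (?, ?)",
                    [(1, "Important"), (2, "Urgent"), (3, "Normal")])
    con.executemany("INSERT INTO Mail VALUES (?, ?, ?)", MAILS)
    con.executemany("INSERT INTO Classification VALUES (?, ?, ?)",
                    CLASSIFICATIONS)
    return con


con = build_db()
# The join without GROUP BY: one output row per Classification row.
fanout = con.execute("""
    SELECT cl.id, m.id, c.name
    FROM Mail m
    JOIN Classification cl ON cl.mail_id = m.id
    JOIN Category c ON c.id = cl.category_id
    WHERE m.id = 6
    ORDER BY cl.id""").fetchall()
for row in fanout:
    print(row)
```

A single email turns into six joined rows, and GROUP BY date, name then counts four of them as Normal and two as Urgent.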

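You have exact duplicates in the Classification table (several rows with the same mail_id and category_id), so a simple solution is to count distinct emails instead of rows: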

SELECT m.date, c.name, count(distinct m.id)
FROM Mail m JOIN
     Classification cl
     ON cl.mail_id = m.id JOIN
     Category c
     ON c.id = cl.category_id 
GROUP BY m.date, c.name ;
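A minimal sketch of the count(distinct m.id) query running against the question's data, assuming an in-memory SQLite database as a stand-in for MySQL (MySQL-only column options dropped, rows unchanged, ORDER BY added for a stable result). COUNT(DISTINCT ...) collapses the repeated joined rows for mail 6 back to one email per date and category.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL. The
# MySQL-only options (int(6) unsigned, DEFAULT CHARSET) are dropped;
# the rows are copied from the question unchanged.
MAILS = [
    (1, "Important Email", "2019-01-04T13:53:52"),
    (2, "Urgent Email", "2019-01-19T13:53:52"),
    (3, "Very Urgent Email", "2019-01-24T13:53:52"),
    (4, "Quite Urgent Email", "2019-01-24T13:53:52"),
    (5, "Normal Email", "2019-01-21T13:53:52"),
    (6, "Regular Email", "2019-01-14T13:53:52"),
    (7, "Regular Email", "2019-01-23T13:53:52"),
    (8, "Regular Email", "2019-01-23T13:53:52"),
    (9, "Regular Email", "2019-01-20T13:53:52"),
    (10, "Very Urgent Email", "2019-01-25T13:53:52"),
    (11, "Very Urgent Email", "2019-01-25T13:53:52"),
]
# (id, mail_id, category_id): note that mail 6 appears six times.
CLASSIFICATIONS = [
    (1, 1, 1), (2, 2, 2), (3, 3, 2), (4, 4, 2), (5, 5, 3), (6, 6, 3),
    (7, 6, 3), (8, 6, 3), (9, 6, 3), (10, 6, 2), (11, 6, 2),
]


def build_db():
    """Return an in-memory database loaded with the question's sample data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE Mail (id INTEGER PRIMARY KEY, content TEXT NOT NULL,
                           date TEXT NOT NULL);
        CREATE TABLE Classification (
            id INTEGER PRIMARY KEY,
            mail_id INTEGER NOT NULL REFERENCES Mail(id),
            category_id INTEGER NOT NULL REFERENCES Category(id));
    """)
    con.executemany("INSERT INTO Category VALUES (?, ?)",
                    [(1, "Important"), (2, "Urgent"), (3, "Normal")])
    con.executemany("INSERT INTO Mail VALUES (?, ?, ?)", MAILS)
    con.executemany("INSERT INTO Classification VALUES (?, ?, ?)",
                    CLASSIFICATIONS)
    return con


con = build_db()
# Count each email at most once per (date, category) group.
distinct_rows = con.execute("""
    SELECT m.date, c.name, count(DISTINCT m.id)
    FROM Mail m
    JOIN Classification cl ON cl.mail_id = m.id
    JOIN Category c ON c.id = cl.category_id
    GROUP BY m.date, c.name
    ORDER BY m.date, c.name""").fetchall()
for row in distinct_rows:
    print(row)
```

Mail 6 still shows up under both Normal and Urgent on 2019-01-14, because it genuinely has Classification rows for both categories; DISTINCT only removes the repeats within each category.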

That said, the real solution is to fix your data so that it does not contain duplicates.
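One way to fix the data, sketched under assumptions: keep the lowest Classification.id of each (mail_id, category_id) pair, delete the rest, and add a unique index so the duplicates cannot come back. The sketch uses an in-memory SQLite database as a stand-in for MySQL, and the index name ux_mail_category is hypothetical. SQLite accepts a DELETE whose subquery reads the same table; MySQL rejects that form, so there you would typically wrap the subquery in a derived table or use a multi-table DELETE instead.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL. The
# MySQL-only options (int(6) unsigned, DEFAULT CHARSET) are dropped;
# the rows are copied from the question unchanged.
MAILS = [
    (1, "Important Email", "2019-01-04T13:53:52"),
    (2, "Urgent Email", "2019-01-19T13:53:52"),
    (3, "Very Urgent Email", "2019-01-24T13:53:52"),
    (4, "Quite Urgent Email", "2019-01-24T13:53:52"),
    (5, "Normal Email", "2019-01-21T13:53:52"),
    (6, "Regular Email", "2019-01-14T13:53:52"),
    (7, "Regular Email", "2019-01-23T13:53:52"),
    (8, "Regular Email", "2019-01-23T13:53:52"),
    (9, "Regular Email", "2019-01-20T13:53:52"),
    (10, "Very Urgent Email", "2019-01-25T13:53:52"),
    (11, "Very Urgent Email", "2019-01-25T13:53:52"),
]
# (id, mail_id, category_id): note that mail 6 appears six times.
CLASSIFICATIONS = [
    (1, 1, 1), (2, 2, 2), (3, 3, 2), (4, 4, 2), (5, 5, 3), (6, 6, 3),
    (7, 6, 3), (8, 6, 3), (9, 6, 3), (10, 6, 2), (11, 6, 2),
]


def build_db():
    """Return an in-memory database loaded with the question's sample data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE Mail (id INTEGER PRIMARY KEY, content TEXT NOT NULL,
                           date TEXT NOT NULL);
        CREATE TABLE Classification (
            id INTEGER PRIMARY KEY,
            mail_id INTEGER NOT NULL REFERENCES Mail(id),
            category_id INTEGER NOT NULL REFERENCES Category(id));
    """)
    con.executemany("INSERT INTO Category VALUES (?, ?)",
                    [(1, "Important"), (2, "Urgent"), (3, "Normal")])
    con.executemany("INSERT INTO Mail VALUES (?, ?, ?)", MAILS)
    con.executemany("INSERT INTO Classification VALUES (?, ?, ?)",
                    CLASSIFICATIONS)
    return con


con = build_db()
# Keep only the lowest id of each (mail_id, category_id) pair.
con.execute("""
    DELETE FROM Classification
    WHERE id NOT IN (SELECT MIN(id) FROM Classification
                     GROUP BY mail_id, category_id)""")
# Hypothetical index name; it stops new duplicate pairs from being inserted.
con.execute("CREATE UNIQUE INDEX ux_mail_category "
            "ON Classification (mail_id, category_id)")
remaining = con.execute(
    "SELECT id, mail_id, category_id FROM Classification ORDER BY id").fetchall()

try:
    con.execute("INSERT INTO Classification VALUES (12, 6, 3)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# With clean data, the plain count(*) query no longer over-counts.
counts = con.execute("""
    SELECT m.date, c.name, count(*)
    FROM Mail m
    JOIN Classification cl ON cl.mail_id = m.id
    JOIN Category c ON c.id = cl.category_id
    GROUP BY m.date, c.name
    ORDER BY m.date, c.name""").fetchall()
print(remaining)
print(counts)
```

After the cleanup, count(*) and count(DISTINCT m.id) agree, so the original query can stay as it is.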

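Here is a SQL Fiddle using your data. Your expected results have a "2" for 2019-01-23. However, no email on that date has a classification, so that date does not appear in the results.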
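The missing dates can be confirmed with an anti-join: a LEFT JOIN from Mail to Classification, keeping only the emails with no matching row. This is a sketch assuming an in-memory SQLite database as a stand-in for MySQL, with the question's rows copied unchanged; the same LEFT JOIN ... IS NULL pattern is standard SQL and also works in MySQL.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL. The
# MySQL-only options (int(6) unsigned, DEFAULT CHARSET) are dropped;
# the rows are copied from the question unchanged.
MAILS = [
    (1, "Important Email", "2019-01-04T13:53:52"),
    (2, "Urgent Email", "2019-01-19T13:53:52"),
    (3, "Very Urgent Email", "2019-01-24T13:53:52"),
    (4, "Quite Urgent Email", "2019-01-24T13:53:52"),
    (5, "Normal Email", "2019-01-21T13:53:52"),
    (6, "Regular Email", "2019-01-14T13:53:52"),
    (7, "Regular Email", "2019-01-23T13:53:52"),
    (8, "Regular Email", "2019-01-23T13:53:52"),
    (9, "Regular Email", "2019-01-20T13:53:52"),
    (10, "Very Urgent Email", "2019-01-25T13:53:52"),
    (11, "Very Urgent Email", "2019-01-25T13:53:52"),
]
# (id, mail_id, category_id): note that mail 6 appears six times.
CLASSIFICATIONS = [
    (1, 1, 1), (2, 2, 2), (3, 3, 2), (4, 4, 2), (5, 5, 3), (6, 6, 3),
    (7, 6, 3), (8, 6, 3), (9, 6, 3), (10, 6, 2), (11, 6, 2),
]


def build_db():
    """Return an in-memory database loaded with the question's sample data."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Category (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE Mail (id INTEGER PRIMARY KEY, content TEXT NOT NULL,
                           date TEXT NOT NULL);
        CREATE TABLE Classification (
            id INTEGER PRIMARY KEY,
            mail_id INTEGER NOT NULL REFERENCES Mail(id),
            category_id INTEGER NOT NULL REFERENCES Category(id));
    """)
    con.executemany("INSERT INTO Category VALUES (?, ?)",
                    [(1, "Important"), (2, "Urgent"), (3, "Normal")])
    con.executemany("INSERT INTO Mail VALUES (?, ?, ?)", MAILS)
    con.executemany("INSERT INTO Classification VALUES (?, ?, ?)",
                    CLASSIFICATIONS)
    return con


con = build_db()
# Emails with no Classification row at all: an inner join drops them.
unclassified = con.execute("""
    SELECT m.id, m.date
    FROM Mail m
    LEFT JOIN Classification cl ON cl.mail_id = m.id
    WHERE cl.id IS NULL
    ORDER BY m.id""").fetchall()
for row in unclassified:
    print(row)
```

With the sample data this returns mails 7 to 11, which covers every email dated 2019-01-20, 2019-01-23 and 2019-01-25, so none of those dates can appear in an inner-join result.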