MySQL RIGHT JOIN does not count data correctly

Time: 2019-07-18 17:18:20

Tags: mysql count right-join

I have this data:

+-------------+--------------------+-----------------------+
| employee_id | assignment_started | assignment_terminated |
+-------------+--------------------+-----------------------+
|           1 | 2018-07-01         | (NULL)                |
|           2 | 2018-09-01         | (NULL)                |
|           3 | 2018-10-13         | (NULL)                |
|           4 | 2018-10-13         | (NULL)                |
|           5 | 2018-10-15         | 2019-07-17            |
|           6 | 2018-11-01         | (NULL)                |
|           7 | 2019-01-14         | (NULL)                |
|           8 | 2019-01-24         | (NULL)                |
|           9 | 2019-07-01         | 2019-07-30            |
+-------------+--------------------+-----------------------+

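I want to count the assigned employees per month. To determine whether an employee is on assignment, I need to check whether the date I am looking at falls between assignment_started and assignment_terminated. If assignment_terminated is NULL, NOW() is used in its place.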

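I also have a date range to check. So if my date range is 2018-01-01 to 2019-07-30, I need to count the employees per month, and for months in which no employee is on assignment I should get a value of 0.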
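The check described above can be sketched outside SQL. This is a minimal Python illustration, not part of the query: the function name `is_assigned` and the fixed `today` value (the date of this post) are assumptions made for the example, and `None` stands in for a NULL assignment_terminated.

```python
from datetime import date
from typing import Optional


def is_assigned(day: date, started: date, terminated: Optional[date], today: date) -> bool:
    """Return True if `day` lies within [started, terminated], both ends inclusive.

    A missing termination date (None, i.e. SQL NULL) is replaced by `today`,
    mirroring IFNULL(assignment_terminated, NOW()).
    """
    end = terminated if terminated is not None else today
    return started <= day <= end


today = date(2019, 7, 18)  # assumed value of NOW() for this example

# Employee 1: started 2018-07-01, still assigned.
print(is_assigned(date(2018, 10, 1), date(2018, 7, 1), None, today))  # True

# Employee 5: started 2018-10-15, terminated 2019-07-17.
print(is_assigned(date(2019, 8, 1), date(2018, 10, 15), date(2019, 7, 17), today))  # False
```

Note that checking a single day per month (for example the first of the month) misses an employee who starts later in that month, which is why counting per month needs a range comparison rather than a point-in-time one.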

To generate the months of the date range, I use this code:

select DISTINCT CONCAT(YEAR(gen_date),' ',MONTHNAME(gen_date)) AS month_name FROM 
(select adddate('1970-01-01',t4*10000 + t3*1000 + t2*100 + t1*10 + t0) gen_date FROM 
(select 0 t0 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0, 
(select 0 t1 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1, 
(select 0 t2 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2, 
(select 0 t3 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3, 
(select 0 t4 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4) v 
WHERE gen_date between '2018-01-01 00:00:00' and '2019-08-31 23:59:59'

What I get from it is:

+---------------+
| month_name    |
+---------------+
| 2018 January  |
| 2018 February |
| 2018 March    |
| 2018 April    |
| ...           |
| 2019 August   |
+---------------+

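From the data above, you can see that until July 2018 I have 0 employees, for July 2018 I have 1 employee, and, for example, for October 2018 I should have 5 employees, since 5 employees were working that month.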

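To keep things simple, I use the code below to get what I need, but for some reason the counts are not correct. I have tried to figure it out, but I don't understand why I get the results that can be seen in the fiddle below.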

SELECT calendar.month_name, COUNT(employee_id) AS emp_count
FROM job_order_employees
RIGHT JOIN (select DISTINCT CONCAT(YEAR(gen_date),' ',MONTHNAME(gen_date)) AS month_name FROM 
(select adddate('1970-01-01',t4*10000 + t3*1000 + t2*100 + t1*10 + t0) gen_date FROM 
(select 0 t0 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0, 
(select 0 t1 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1, 
(select 0 t2 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2, 
(select 0 t3 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3, 
(select 0 t4 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4) v 
WHERE gen_date between '2018-01-01 00:00:00' and '2019-08-31 23:59:59') as calendar
ON STR_TO_DATE(CONCAT(calendar.month_name,'01'),'%Y %M %d') BETWEEN job_order_employees.assignment_started AND IFNULL(job_order_employees.assignment_terminated,NOW())
GROUP BY calendar.month_name
ORDER BY STR_TO_DATE(calendar.month_name,'%Y %M') 

Here is some sample data:

-- Dumping structure for table d-works-test.job_order_employees
CREATE TABLE IF NOT EXISTS `job_order_employees` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `employee_id` int(10) unsigned NOT NULL,
  `assignment_started` date NOT NULL,
  `assignment_terminated` date DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- Dumping data for table d-works-test.job_order_employees: ~14 rows (approximately)
/*!40000 ALTER TABLE `job_order_employees` DISABLE KEYS */;
INSERT INTO `job_order_employees` 
(`id`
, `employee_id`
,`assignment_started`
, `assignment_terminated`) VALUES
(1, 1,'2019-05-29', NULL),
(2, 2,'2018-09-19', NULL),
(3, 3,'2018-07-01', NULL),
(4, 4, '2018-10-13', NULL),
(5, 5, '2018-10-13', NULL),
(6, 6, '2019-02-01', NULL),
(7, 7, '2019-01-14', NULL),
(8, 8, '2018-11-01', NULL),
(9, 8, '2019-01-01', NULL),
(10, 9, '2019-02-01', NULL),
(11, 9, '2019-01-24', NULL),
(12, 9, '2018-12-31', NULL),
(13, 10, '2018-10-13', '2019-07-17'),
(14, 10, '2019-07-01', '2019-07-17');

And the same as a DB Fiddle: https://www.db-fiddle.com/f/8dUFx1DWiyypbkx9s2cYyG/1

Thanks in advance for your help!

2 answers:

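Answer 0: (score: 0)

You can simplify your logic by deferring the formatting of the month string until the very last step. You are doing a lot of work converting something that only matters for the final output format.

This also helps because you can then define an inclusive start date and an inclusive end date for each month, like this:

SELECT adddate('1970-01-01', t4 * 10000 + t3 * 1000 + t2 * 100 + t1 * 10 + t0) gen_date FROM (stuff) v

Then use it like this: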

SELECT [format rangestart here], COUNT(employee_id) AS emp_count
FROM (
   SELECT DISTINCT gen_date AS rangestart, gen_date + INTERVAL 1 MONTH AS rangeend 
   FROM v
   WHERE gen_date BETWEEN '2018-01-01 00:00:00' AND '2019-08-31 23:59:59'
) as calendar
LEFT JOIN job_order_employees AS joe
   ON IFNULL(joe.assignment_terminated,NOW()) >= calendar.rangestart
   AND joe.assignment_started <= calendar.rangeend
GROUP BY calendar.rangestart
ORDER BY calendar.rangestart 
;

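The join logic (the overlap condition) looks a little odd until you realize where it comes from. It is a simplification of "not one of the cases where there is no overlap".

NOT (ended < range_start || started > range_end) simplifies to ended >= range_start && started <= range_end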
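The simplification is an application of De Morgan's law. As a quick sanity check, the following Python sketch compares both forms by brute force over a small grid of integer "dates"; the function names are made up for the illustration.

```python
from itertools import product


def negated_no_overlap(started, ended, range_start, range_end):
    # "Not one of the cases where the two intervals fail to overlap."
    return not (ended < range_start or started > range_end)


def simplified(started, ended, range_start, range_end):
    # The same condition after applying De Morgan's law.
    return ended >= range_start and started <= range_end


# Small integers stand in for dates; every combination is compared.
all_equal = all(
    negated_no_overlap(s, e, r_start, r_end) == simplified(s, e, r_start, r_end)
    for s, e, r_start, r_end in product(range(5), repeat=4)
)
print(all_equal)  # True
```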


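Edit: the above wrongly assumed that the subquery produces one row per month; the following should work.

Calendar query (this covers about 83 years; you can add another t# table, multiplied by 1000, to get about 833 years' worth)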

SELECT '1970-01-01' + INTERVAL t0 + t1 * 10 + t2 * 100 MONTH AS start_date
    , '1970-01-01' + INTERVAL 1 + t0 + t1 * 10 + t2 * 100 MONTH AS end_date  
FROM (SELECT 0 t0 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t0
    , (SELECT 0 t1 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t1
    , (SELECT 0 t2 UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) t2
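To see what the calendar query produces, here is a rough Python equivalent of the digit-table trick. The helper `add_months` is an assumption for this sketch and only handles dates that fall on the first of a month, which is all the calendar needs.

```python
from datetime import date


def add_months(first_of_month: date, n: int) -> date:
    """Add n months to a date that is the first day of a month."""
    total = first_of_month.year * 12 + (first_of_month.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)


epoch = date(1970, 1, 1)

# Three digit tables (t0, t1, t2) combine into the offsets 0..999.
calendar = [
    (add_months(epoch, t0 + t1 * 10 + t2 * 100),
     add_months(epoch, 1 + t0 + t1 * 10 + t2 * 100))
    for t2 in range(10) for t1 in range(10) for t0 in range(10)
]

print(len(calendar))                  # 1000 months, roughly 83 years
print(calendar[0][0].isoformat())     # 1970-01-01
print(calendar[-1][0].isoformat())    # 2053-04-01
```

Each row's end_date equals the next row's start_date, so end_date is an exclusive bound.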

Final query

SELECT [format calendar.start_date here]
   , COUNT(employee_id) AS emp_count
FROM ( 
   *calendar query above goes here* 
) as calendar
LEFT JOIN job_order_employees AS joe
   ON IFNULL(joe.assignment_terminated,NOW()) >= calendar.start_date
   AND joe.assignment_started < calendar.end_date
WHERE calendar.start_date BETWEEN '2018-01-01 00:00:00' AND '2019-08-31 23:59:59'
GROUP BY calendar.start_date
ORDER BY calendar.start_date
;

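Note: I also changed an operator in the overlap comparison; since the generated end_date is exclusive, the condition should be NOT (ended < range_start || started >= range_end), which simplifies to ended >= range_start && started < range_end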
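As a cross-check of the final join condition, this Python sketch applies the same half-open overlap test to the nine rows from the table in the question. It is only an illustration of the logic, not a replacement for the query; `TODAY` is an assumed stand-in for NOW() (the date of the question) and `month_windows` is a helper invented for the example.

```python
from datetime import date

TODAY = date(2019, 7, 18)  # assumed value of NOW()

# (employee_id, assignment_started, assignment_terminated) from the question's table
assignments = [
    (1, date(2018, 7, 1), None),
    (2, date(2018, 9, 1), None),
    (3, date(2018, 10, 13), None),
    (4, date(2018, 10, 13), None),
    (5, date(2018, 10, 15), date(2019, 7, 17)),
    (6, date(2018, 11, 1), None),
    (7, date(2019, 1, 14), None),
    (8, date(2019, 1, 24), None),
    (9, date(2019, 7, 1), date(2019, 7, 30)),
]


def month_windows(first: date, last: date):
    """Yield (start_date, end_date) per month; end_date is exclusive."""
    y, m = first.year, first.month
    while (y, m) <= (last.year, last.month):
        ny, nm = (y + 1, 1) if m == 12 else (y, m + 1)
        yield date(y, m, 1), date(ny, nm, 1)
        y, m = ny, nm


counts = {}
for start, end in month_windows(date(2018, 1, 1), date(2019, 8, 1)):
    counts[f"{start:%Y-%m}"] = sum(
        1
        for _, started, terminated in assignments
        # ended >= range_start AND started < range_end
        if (terminated if terminated is not None else TODAY) >= start and started < end
    )

for month, emp_count in counts.items():
    print(month, emp_count)
```

Under these assumptions the months before July 2018 come out as 0, October 2018 as 5 and July 2019 as 9. August 2019 comes out as 0 because open-ended assignments are capped at `TODAY`, which lies before that month starts; the SQL behaves the same way with NOW().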

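Answer 1: (score: 0)

I suggest using COALESCE to substitute the current date for a missing end date. I would then build the list of months that need to be counted and join it to the list of assignments grouped by employee and month.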
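One way to read this suggestion, sketched in Python rather than SQL: replace a missing end date with the current date, then count each employee at most once per month. The rows are the 14 sample rows from the question; `TODAY`, `coalesce` and `month_windows` are names assumed for the illustration, and the half-open month window is borrowed from the other answer.

```python
from datetime import date

TODAY = date(2019, 7, 18)  # assumed current date

# (employee_id, assignment_started, assignment_terminated) from the sample INSERT
rows = [
    (1, date(2019, 5, 29), None), (2, date(2018, 9, 19), None),
    (3, date(2018, 7, 1), None), (4, date(2018, 10, 13), None),
    (5, date(2018, 10, 13), None), (6, date(2019, 2, 1), None),
    (7, date(2019, 1, 14), None), (8, date(2018, 11, 1), None),
    (8, date(2019, 1, 1), None), (9, date(2019, 2, 1), None),
    (9, date(2019, 1, 24), None), (9, date(2018, 12, 31), None),
    (10, date(2018, 10, 13), date(2019, 7, 17)),
    (10, date(2019, 7, 1), date(2019, 7, 17)),
]


def coalesce(*values):
    """Return the first value that is not None, like SQL COALESCE."""
    return next((v for v in values if v is not None), None)


def month_windows(first: date, last: date):
    """Yield (start_date, end_date) per month; end_date is exclusive."""
    y, m = first.year, first.month
    while (y, m) <= (last.year, last.month):
        ny, nm = (y + 1, 1) if m == 12 else (y, m + 1)
        yield date(y, m, 1), date(ny, nm, 1)
        y, m = ny, nm


row_counts, distinct_counts = {}, {}
for start, end in month_windows(date(2018, 1, 1), date(2019, 8, 1)):
    active = [emp for emp, started, terminated in rows
              if coalesce(terminated, TODAY) >= start and started < end]
    key = f"{start:%Y-%m}"
    row_counts[key] = len(active)            # counts every matching assignment row
    distinct_counts[key] = len(set(active))  # counts each employee once

print(row_counts["2019-07"], distinct_counts["2019-07"])  # 14 10
```

The sample data contains several assignments per employee, so counting rows and counting employees give different numbers; in SQL the latter would correspond to something like COUNT(DISTINCT employee_id).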