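I'm using a MySQL database and trying to get statistics on how many users in each department have completed a certain task within a time frame.
My problem is that some users perform the task more than once. I was able to build a query that returns the number of completed tasks and the total number of users per group, but I need to count only one "task" per user. As it stands, when a single person completes enough tasks to cover the whole department, I get results like "150% of [department] completed the task".
Here is the existing query: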
SELECT total.department, total_count, IFNULL(done, 0) as done_count, ROUND((IFNULL(done, 0) / total_count)*100, 2) as percent
FROM (SELECT department, COUNT(*) total_count FROM agents GROUP BY department) total
LEFT JOIN (SELECT a.department as department, COUNT(*) as done FROM agents a, tasks p WHERE p.task_responses_id IS NOT NULL AND (p.agent1_id = a.id OR p.agent2_id = a.id)
GROUP BY a.department) done ON done.department = total.department;
It returns a table like this (department names sanitized):
+------------+-------------+------------+---------+
| department | total_count | done_count | percent |
+------------+-------------+------------+---------+
| a          |           2 |          0 |    0.00 |
| b          |          10 |          1 |   10.00 |
| c          |           2 |          0 |    0.00 |
| d          |           1 |          0 |    0.00 |
| e          |           2 |          2 |  100.00 |
| f          |           1 |          0 |    0.00 |
| g          |           3 |          6 |  200.00 |
| h          |           4 |          0 |    0.00 |
| i          |           4 |          1 |   25.00 |
+------------+-------------+------------+---------+
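As you can see, department "g" has done_count > total_count because one person in that department completed the task several times. The tasks table I have to work with looks like this: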
+-----+----------------+-----------+-----------+-------------------+---------------------+------+
| id  | reservation_id | agent1_id | agent2_id | task_responses_id | last_contact        | dnc  |
+-----+----------------+-----------+-----------+-------------------+---------------------+------+
| 128 |        6457633 |         9 |      NULL |                24 | 2015-10-06 00:00:00 |    1 |
| 130 |        6799659 |        10 |      NULL |                25 | 2015-10-06 00:00:00 | NULL |
| 145 |        7004981 |        36 |      NULL |                28 | 2015-10-08 00:00:00 | NULL |
| 150 |        7091836 |        36 |      NULL |                29 | 2015-10-08 00:00:00 | NULL |
| 152 |        7128330 |        36 |      NULL |                30 | 2015-10-08 00:00:00 | NULL |
| 155 |        7165876 |        16 |      NULL |                35 | 2015-10-08 00:00:00 | NULL |
| 166 |        7308234 |        36 |      NULL |                31 | 2015-10-08 00:00:00 | NULL |
| 171 |        7333373 |        36 |      NULL |                33 | 2015-10-08 00:00:00 | NULL |
| 173 |        7408857 |        37 |      NULL |                34 | 2015-10-08 00:00:00 | NULL |
+-----+----------------+-----------+-----------+-------------------+---------------------+------+
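Once a row has been retrieved for a given agent ID, I don't want to pick up any further rows for that ID.
Thanks very much for your help! I'm happy to clarify anything that is unclear.
Answer 0 (score: 1)
I think this can be achieved by replacing "count(*)" on the third line of the query with "count(distinct a.id)".
That way, if the same agent ID appears more than once, it is counted only once.
So the query would look like this: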
SELECT total.department, total_count, IFNULL(done, 0) as done_count, ROUND((IFNULL(done, 0) / total_count)*100, 2) as percent
FROM (SELECT department, COUNT(*) total_count FROM agents GROUP BY department) total
LEFT JOIN (SELECT a.department as department, COUNT(distinct a.id) as done FROM agents a, tasks p WHERE p.task_responses_id IS NOT NULL AND (p.agent1_id = a.id OR p.agent2_id = a.id)
GROUP BY a.department) done ON done.department = total.department;
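To see the difference on a small data set, here is a minimal sketch in Python that uses the built-in sqlite3 module as a stand-in for MySQL. The department assignments for agents 9, 36, 37, 38 and 40 are hypothetical (the question does not show the agents table), and the `done_counts` helper exists only for this illustration. The task rows are a subset of the sample in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agents (id INTEGER PRIMARY KEY, department TEXT);
CREATE TABLE tasks (id INTEGER PRIMARY KEY, agent1_id INTEGER,
                    agent2_id INTEGER, task_responses_id INTEGER);
-- Hypothetical department assignments (not given in the question).
INSERT INTO agents VALUES (9, 'b'), (36, 'g'), (37, 'g'), (38, 'g'), (40, 'a');
-- Subset of the question's task rows; agent 36 appears five times.
INSERT INTO tasks VALUES
  (128, 9, NULL, 24), (145, 36, NULL, 28), (150, 36, NULL, 29),
  (152, 36, NULL, 30), (166, 36, NULL, 31), (171, 36, NULL, 33),
  (173, 37, NULL, 34);
""")

def done_counts(count_expr):
    """Run the 'done' subquery with the given aggregate expression."""
    sql = f"""
        SELECT a.department, {count_expr} AS done
        FROM agents a, tasks p
        WHERE p.task_responses_id IS NOT NULL
          AND (p.agent1_id = a.id OR p.agent2_id = a.id)
        GROUP BY a.department
        ORDER BY a.department
    """
    return dict(conn.execute(sql).fetchall())

per_row = done_counts("COUNT(*)")                # one count per matching task row
per_agent = done_counts("COUNT(DISTINCT a.id)")  # one count per agent
print(per_row)    # {'b': 1, 'g': 6}
print(per_agent)  # {'b': 1, 'g': 2}
```

With this assumed data, COUNT(*) reports 6 for department g because agent 36 matches five task rows, while COUNT(DISTINCT a.id) reports 2, one for each agent who completed at least one task.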
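Answer 1 (score: 0)
To count both the agents in each department and the number who completed a task in a single query, you could use a subquery in the select list, but that would not perform well either. Instead, I recommend the following, which is more complex but should perform better: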
SELECT d.department, count(*) as dept_count, sum(d.done) as done_count
FROM (SELECT *,
(CASE WHEN EXISTS(
SELECT * FROM tasks
WHERE (agents.id = tasks.agent1_id OR agents.id = tasks.agent2_id)
AND tasks.task_responses_id IS NOT NULL
) THEN 1 ELSE 0 END
) as done
FROM agents
) as d
GROUP BY department;
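This version uses an inner query on the agents table to add a "done" column whose value is 1 if the agent meets the criteria and 0 otherwise. The outer query counts all rows, and also sums the 1s to get done_count.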
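The flag-and-sum idea can be tried out with the sketch below, again using Python's sqlite3 module in place of MySQL. The agents rows and their departments are hypothetical, since the question does not show that table. The inner query selects `agents.*` and `SELECT 1` rather than `*`, which is a stylistic choice and not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agents (id INTEGER PRIMARY KEY, department TEXT);
CREATE TABLE tasks (id INTEGER PRIMARY KEY, agent1_id INTEGER,
                    agent2_id INTEGER, task_responses_id INTEGER);
-- Hypothetical department assignments (not given in the question).
INSERT INTO agents VALUES (9, 'b'), (36, 'g'), (37, 'g'), (38, 'g'), (40, 'a');
-- Subset of the question's task rows; agent 36 appears five times.
INSERT INTO tasks VALUES
  (128, 9, NULL, 24), (145, 36, NULL, 28), (150, 36, NULL, 29),
  (152, 36, NULL, 30), (166, 36, NULL, 31), (171, 36, NULL, 33),
  (173, 37, NULL, 34);
""")

rows = conn.execute("""
    SELECT d.department, COUNT(*) AS dept_count, SUM(d.done) AS done_count
    FROM (SELECT agents.*,
                 -- 1 if the agent has at least one answered task, else 0
                 CASE WHEN EXISTS (
                     SELECT 1 FROM tasks
                     WHERE (agents.id = tasks.agent1_id
                            OR agents.id = tasks.agent2_id)
                       AND tasks.task_responses_id IS NOT NULL
                 ) THEN 1 ELSE 0 END AS done
          FROM agents) AS d
    GROUP BY d.department
    ORDER BY d.department
""").fetchall()

for department, dept_count, done_count in rows:
    print(department, dept_count, done_count)
```

Because each agent contributes exactly one row to the inner query, done_count can never exceed dept_count, and departments with no completed tasks still appear with a done_count of 0.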
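Answer 2 (score: -2)
You should use an EXISTS subquery (this is also known as a semi-join). You want to count the users for whom some condition holds. I don't have the full schema, but something like this should do what you want: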
SELECT department, count(*) AS done_count
FROM agents
WHERE EXISTS(
SELECT * FROM tasks
WHERE (agents.id = tasks.agent1_id OR agents.id = tasks.agent2_id)
AND tasks.task_responses_id IS NOT NULL
)
GROUP BY department;
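This query does almost exactly what you asked for in your first paragraph. By avoiding LEFT JOIN and the DISTINCT operator, you give the DBMS a chance to build a sensible query plan that runs no longer than it needs to.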
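One thing to keep in mind is that the semi-join query returns only departments with at least one qualifying agent, so it has to be combined with the per-department totals to get a percentage. A minimal sketch, assuming Python's sqlite3 module in place of MySQL and hypothetical agents rows (the question does not show that table); merging in application code is just one option and is not something the answer prescribes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agents (id INTEGER PRIMARY KEY, department TEXT);
CREATE TABLE tasks (id INTEGER PRIMARY KEY, agent1_id INTEGER,
                    agent2_id INTEGER, task_responses_id INTEGER);
-- Hypothetical department assignments (not given in the question).
INSERT INTO agents VALUES (9, 'b'), (36, 'g'), (37, 'g'), (38, 'g'), (40, 'a');
-- Subset of the question's task rows; agent 36 appears five times.
INSERT INTO tasks VALUES
  (128, 9, NULL, 24), (145, 36, NULL, 28), (150, 36, NULL, 29),
  (152, 36, NULL, 30), (166, 36, NULL, 31), (171, 36, NULL, 33),
  (173, 37, NULL, 34);
""")

# The semi-join from the answer: agents with at least one answered task.
done = dict(conn.execute("""
    SELECT department, COUNT(*) AS done_count
    FROM agents
    WHERE EXISTS (
        SELECT 1 FROM tasks
        WHERE (agents.id = tasks.agent1_id OR agents.id = tasks.agent2_id)
          AND tasks.task_responses_id IS NOT NULL
    )
    GROUP BY department
""").fetchall())

# Head count per department, including departments with nothing done.
totals = dict(conn.execute(
    "SELECT department, COUNT(*) FROM agents GROUP BY department"
).fetchall())

# Departments missing from `done` default to 0 completed agents.
percent = {dept: round(100 * done.get(dept, 0) / total, 2)
           for dept, total in totals.items()}
print(done)     # department 'a' is absent here
print(percent)
```

With this assumed data, department a is missing from `done` but still shows up in `percent` at 0.0, and department g comes out at 66.67 rather than 200.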