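I have a query that currently works, but it is inefficient. I'm basically trying to group labor and sales data by hour.
I'd like to be able to do this with one query per day.
I'm using PostgreSQL.
I have a bunch of time-punch records with an employee_id, a job_id, and a location_id, but if an employee has clocked in and not yet clocked out, I have to check the clock_out_time field and set it to now() so that the hours are calculated correctly.
Planning time: 0.509 ms
Execution time: 0.498 ms
I'm only working with 30-50 records at the moment, so this won't scale.
What can I do to improve this?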
SELECT
date_trunc('hour', tp.clock_in_time) AS hour,
SUM(
(
EXTRACT (DAY FROM (CASE WHEN EXTRACT(YEAR FROM tp.clock_out_time) = -1 THEN now() ELSE tp.clock_out_time END - tp.clock_in_time))*24*60*60+
EXTRACT (HOUR FROM (CASE WHEN EXTRACT(YEAR FROM tp.clock_out_time) = -1 THEN now() ELSE tp.clock_out_time END - tp.clock_in_time))*60*60+
EXTRACT (MINUTE FROM (CASE WHEN EXTRACT(YEAR FROM tp.clock_out_time) = -1 THEN now() ELSE tp.clock_out_time END - tp.clock_in_time))*60+
EXTRACT (SECOND FROM (CASE WHEN EXTRACT(YEAR FROM tp.clock_out_time) = -1 THEN now() ELSE tp.clock_out_time END - tp.clock_in_time))
) / 60 / 60.00 * (job.rate / 100.00)
) AS labor_costs,
(
SELECT
SUM(total) / 100.00
FROM
ticket
WHERE
open=false
AND
DATE_TRUNC('day', opened_at) = date_trunc('day', '2018-12-22T11:15:05-05:00'::date)
AND
DATE_TRUNC('day', closed_at) = date_trunc('day', '2018-12-22T11:15:05-05:00'::date)
GROUP BY date_trunc('hour', opened_at)
ORDER BY date_trunc('hour', opened_at)
) AS hourly_sales
FROM
employee_time_punch as tp
INNER JOIN
employee
ON
employee.id = tp.employee_id
INNER JOIN
employee_job as job
ON
job.id = tp.job_id
WHERE
DATE_TRUNC('day', tp.clock_in_time) = DATE_TRUNC('day', '2006-01-02T11:15:05-05:00'::date)
AND
DATE_TRUNC('day', CASE WHEN EXTRACT(YEAR FROM tp.clock_out_time) = -1 THEN now() ELSE tp.clock_out_time END) = DATE_TRUNC('day', '2006-01-02T11:15:05-05:00'::date)
GROUP BY 1
ORDER BY 1;
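Answer 0: (score: 1)
DATE_TRUNC('day', tp.clock_in_time) = DATE_TRUNC('day', '2006-01-02T11:15:05-05:00'::date)
This single filter condition is hurting your query. It suffers from the "expression on the left side of the equality" syndrome: because the column is wrapped in a function, a plain index on clock_in_time cannot be used. At this point PostgreSQL is probably performing a full table scan.
If you rephrase the condition so that the bare column is compared against a range, the query can be much faster: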
WHERE tp.clock_in_time BETWEEN ...begin_of_day... AND ...end_of_day...
You can precompute these values in a CTE if you like.
And, of course, you need an index on that column, like this:
create index ix1 on employee_time_punch (clock_in_time);
With this change, PostgreSQL can perform an "index range scan" instead, which is much faster.
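As a rough sketch of the rewritten condition, assuming clock_in_time is a timestamptz column and that the day of interest is 2006-01-02 at a UTC-5 offset (both are assumptions made for illustration), a half-open range leaves the column bare so the index can be considered:

```sql
-- Hypothetical day bounds; adjust the date and offset to your data.
-- EXPLAIN prints the plan so you can see which scan type is chosen.
EXPLAIN
SELECT tp.*
FROM employee_time_punch AS tp
WHERE tp.clock_in_time >= TIMESTAMPTZ '2006-01-02 00:00:00-05'
  AND tp.clock_in_time <  TIMESTAMPTZ '2006-01-02 00:00:00-05' + INTERVAL '1 day';
```

BETWEEN is inclusive on both ends, so the half-open form keeps a row stamped exactly at midnight from being counted in two consecutive days. On a table with only a few dozen rows the planner may still prefer a sequential scan, so the benefit of the index shows up as the table grows.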
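Answer 1: (score: 0)
As @TheImpaler answered, you need to improve the way you compare dates, and you can use a CTE to precompute the analysis window.
Here are some further simplifications to the query that should help make it faster and more readable:
- The WHERE clause is expressed as a join against the dates CTE (conceptually a CROSS JOIN with that single row).
- The COALESCE function can be used to default clock_out_time to NOW(). COALESCE only replaces NULL, so this assumes an open shift is stored as NULL rather than as the year -1 value the question tests for.
- For hourly_sales, use a JOIN instead of a subquery.
- Use EXTRACT(EPOCH FROM ...) to compute the duration of an employee's shift, instead of repeating EXTRACT(HOUR/MINUTE/SECOND ...).
- Move the fixed arithmetic operations for labor_costs outside the SUM function.
Query: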
WITH dates AS (
SELECT
DATE_TRUNC('day', '2006-01-02T11:15:05-05:00'::date) AS wstart,
DATE_TRUNC('day', '2006-01-02T11:15:05-05:00'::date) + interval '1' day AS wend
)
SELECT
date_trunc('hour', tp.clock_in_time) AS hour,
SUM(
EXTRACT(EPOCH FROM COALESCE(tp.clock_out_time, NOW()) - tp.clock_in_time)
* job.rate
) / 60 / 60 / 100.00 AS labor_costs,
SUM(ticket.total)/100.00 AS hourly_sales
FROM
dates
INNER JOIN employee_time_punch AS tp
ON tp.clock_in_time BETWEEN dates.wstart AND dates.wend
AND COALESCE(tp.clock_out_time, NOW()) BETWEEN dates.wstart AND dates.wend
INNER JOIN employee
ON employee.id = tp.employee_id
INNER JOIN employee_job AS job
ON job.id = tp.job_id
INNER JOIN ticket
ON ticket.open = false
AND ticket.opened_at BETWEEN dates.wstart AND dates.wend
AND ticket.closed_at BETWEEN dates.wstart AND dates.wend
GROUP BY 1;
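One caveat with the query above: ticket is joined only on the date window, so every time-punch row pairs with every closed ticket of that day, and both sums can come out inflated. A minimal sketch of one way around this, assuming the goal is one row per hour with labor and sales side by side, is to aggregate each side per hour first and only then join. The CTE names labor and sales are illustrative, and open shifts are assumed to have a NULL clock_out_time as in the query above:

```sql
-- Sketch only: table and column names come from the question;
-- the CTE names (dates, labor, sales) are illustrative.
WITH dates AS (
    SELECT
        DATE_TRUNC('day', '2006-01-02T11:15:05-05:00'::date) AS wstart,
        DATE_TRUNC('day', '2006-01-02T11:15:05-05:00'::date) + INTERVAL '1 day' AS wend
),
labor AS (
    -- Labor cost per clock-in hour, computed before any join to ticket.
    SELECT
        date_trunc('hour', tp.clock_in_time) AS hour,
        SUM(
            EXTRACT(EPOCH FROM COALESCE(tp.clock_out_time, NOW()) - tp.clock_in_time)
            * job.rate
        ) / 60 / 60 / 100.00 AS labor_costs
    FROM dates
    INNER JOIN employee_time_punch AS tp
        ON tp.clock_in_time >= dates.wstart
        AND tp.clock_in_time < dates.wend
        AND COALESCE(tp.clock_out_time, NOW()) >= dates.wstart
        AND COALESCE(tp.clock_out_time, NOW()) < dates.wend
    INNER JOIN employee_job AS job
        ON job.id = tp.job_id
    GROUP BY 1
),
sales AS (
    -- Closed-ticket sales per opening hour.
    SELECT
        date_trunc('hour', t.opened_at) AS hour,
        SUM(t.total) / 100.00 AS hourly_sales
    FROM dates
    INNER JOIN ticket AS t
        ON t.open = false
        AND t.opened_at >= dates.wstart
        AND t.opened_at < dates.wend
        AND t.closed_at >= dates.wstart
        AND t.closed_at < dates.wend
    GROUP BY 1
)
SELECT
    COALESCE(labor.hour, sales.hour) AS hour,
    labor.labor_costs,
    sales.hourly_sales
FROM labor
FULL JOIN sales
    ON sales.hour = labor.hour
ORDER BY 1;
```

The FULL JOIN keeps hours that have labor but no sales, and the reverse; the missing side is NULL for those hours. Because each CTE is grouped before the join, adding more tickets no longer multiplies the labor figures.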
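To optimize further, you can create indexes on all of the date columns involved (one composite index per table would probably work well):
- In employee_time_punch: clock_in_time and clock_out_time
- In ticket: opened_at and closed_at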
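The two composite indexes described above might look like the following sketch. The index names are made up for illustration, and the column order assumes the range filter on the first column (clock_in_time, opened_at) is the one doing most of the narrowing:

```sql
-- Illustrative index names; the leading column is the one used in the range filter.
CREATE INDEX ix_time_punch_in_out
    ON employee_time_punch (clock_in_time, clock_out_time);

CREATE INDEX ix_ticket_opened_closed
    ON ticket (opened_at, closed_at);
```

A condition written as COALESCE(clock_out_time, NOW()) cannot be matched against the clock_out_time part of the index directly, so for that table the range on clock_in_time is what limits the scan.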