Is there a better way to calculate hourly labor and sales?

Time: 2018-12-22 23:12:04

Tags: sql postgresql

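I have a query that currently works, but it is inefficient. I am basically trying to group labor and sales data by hour.

I would like to be able to do this with one query per day.

I am using PostgreSQL.

I have a bunch of time punch records with employee_id, job_id and location_id, but if an employee has clocked in and not yet clocked out, I have to check the clock_out_time field and set it to now() so that the hours are calculated correctly.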

  

Planning time: 0.509 ms

     

Execution time: 0.498 ms

I am working with 30-50 records, so this will not scale.

What can I do to improve this?

SELECT
  date_trunc('hour', tp.clock_in_time) AS hour,
  SUM(
    (
      EXTRACT (DAY FROM (CASE WHEN EXTRACT(YEAR FROM tp.clock_out_time) = -1 THEN now() ELSE tp.clock_out_time END - tp.clock_in_time))*24*60*60+
      EXTRACT (HOUR FROM (CASE WHEN EXTRACT(YEAR FROM tp.clock_out_time) = -1 THEN now() ELSE tp.clock_out_time END - tp.clock_in_time))*60*60+
      EXTRACT (MINUTE FROM (CASE WHEN EXTRACT(YEAR FROM tp.clock_out_time) = -1 THEN now() ELSE tp.clock_out_time END - tp.clock_in_time))*60+
      EXTRACT (SECOND FROM (CASE WHEN EXTRACT(YEAR FROM tp.clock_out_time) = -1 THEN now() ELSE tp.clock_out_time END - tp.clock_in_time))
    ) / 60 / 60.00 * (job.rate / 100.00)
  ) AS labor_costs,
  (
  SELECT 
    SUM(total) / 100.00
    FROM 
        ticket
    WHERE 
        open=false 
    AND 
        DATE_TRUNC('day', opened_at) = date_trunc('day', '2018-12-22T11:15:05-05:00'::date) 
    AND
      DATE_TRUNC('day', closed_at) = date_trunc('day', '2018-12-22T11:15:05-05:00'::date) 
    GROUP BY date_trunc('hour', opened_at) 
    ORDER BY date_trunc('hour', opened_at)
    ) AS hourly_sales
FROM 
  employee_time_punch as tp
INNER JOIN
  employee
ON 
  employee.id = tp.employee_id
INNER JOIN
  employee_job as job
ON
  job.id = tp.job_id
WHERE
  DATE_TRUNC('day', tp.clock_in_time) = DATE_TRUNC('day', '2006-01-02T11:15:05-05:00'::date)
AND
    DATE_TRUNC('day', CASE WHEN EXTRACT(YEAR FROM tp.clock_out_time) = -1 THEN now() ELSE tp.clock_out_time END) = DATE_TRUNC('day', '2006-01-02T11:15:05-05:00'::date)
GROUP BY 1
ORDER BY 1;

2 Answers:

Answer 0: (score: 1)

  

DATE_TRUNC('day', tp.clock_in_time) = DATE_TRUNC('day', '2006-01-02T11:15:05-05:00'::date)

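This single filter condition is hurting your query. It suffers from the "expression on the left side of the equality" syndrome: because the column is wrapped in a function call, an ordinary index on clock_in_time cannot be used. As it stands, PostgreSQL is probably performing a full table scan.

If you rephrase the condition, the query can be made faster: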

WHERE tp.clock_in_time BETWEEN ...begin_of_day... AND ...end_of_day...

You can precompute these values in a CTE if you like.

And, of course, you need an index on that column, like this:

create index ix1 on employee_time_punch (clock_in_time);

With this change PostgreSQL will perform an "index range scan" instead, which is much faster.
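
As a minimal sketch of the rewritten filter, the range below compares the bare column against two precomputed bounds. The date literal is the one from the question; the half-open form (>= start, < next day) is my own choice rather than the BETWEEN shown above, so that midnight of the following day is not included. Prefixing the statement with EXPLAIN lets you check which plan PostgreSQL actually picks; on a table of only 30-50 rows the planner may still prefer a sequential scan.

```sql
-- Sketch only: assumes the index ix1 on employee_time_punch (clock_in_time) exists.
EXPLAIN
SELECT *
FROM employee_time_punch AS tp
WHERE tp.clock_in_time >= DATE '2006-01-02'        -- start of the day
  AND tp.clock_in_time <  DATE '2006-01-02' + 1;   -- start of the next day
```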

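Answer 1: (score: 0)

As @TheImpaler answered, the way you compare dates must be improved, and you can use a CTE to precompute the analysis window.

Here are some other simplifications of the query that should help make it faster and more readable:

  • Express the WHERE clause as a CROSS JOIN on the analysis window
  • The COALESCE function can be used to default clock_out_time to NOW() (this works when an open punch is stored as NULL)
  • To compute hourly_sales, use a JOIN instead of a subquery
  • Use a single EXTRACT(EPOCH FROM ...) to compute the duration of the employee's shift, instead of repeating EXTRACT(HOUR/MINUTE/SECOND ...)
  • For labor_costs, move the constant arithmetic outside of the SUM function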
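
To illustrate the EXTRACT(EPOCH FROM ...) point in isolation, here is a small self-contained sketch. The two shifts are made-up sample values, not data from the question, and the column names clock_in and clock_out are placeholders.

```sql
-- EXTRACT(EPOCH FROM interval) returns the whole interval in seconds,
-- so a single call replaces the DAY/HOUR/MINUTE/SECOND arithmetic.
SELECT
    clock_in,
    clock_out,
    EXTRACT(EPOCH FROM (clock_out - clock_in)) / 3600.0 AS hours_worked
FROM (
    VALUES
        (TIMESTAMP '2018-12-22 09:00:00', TIMESTAMP '2018-12-22 17:30:00'),  -- 8.5 hours
        (TIMESTAMP '2018-12-21 22:00:00', TIMESTAMP '2018-12-22 06:15:00')   -- 8.25 hours, crosses midnight
) AS shift(clock_in, clock_out);
```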

Query:

WITH dates AS ( 
    SELECT 
        DATE_TRUNC('day', '2006-01-02T11:15:05-05:00'::date) AS wstart, 
        DATE_TRUNC('day', '2006-01-02T11:15:05-05:00'::date) + interval '1' day AS wend
)
SELECT
  date_trunc('hour', tp.clock_in_time) AS hour,
  SUM(
      EXTRACT(EPOCH FROM COALESCE(tp.clock_out_time, NOW()) - tp.clock_in_time) 
      * job.rate
   ) / 60 / 60 / 100.00 AS labor_costs,
  SUM(ticket.total)/100.00 AS hourly_sales
FROM 
    dates
    INNER JOIN employee_time_punch AS tp
        ON  tp.clock_in_time BETWEEN dates.wstart AND dates.wend
        AND COALESCE(tp.clock_out_time, NOW()) BETWEEN dates.wstart AND dates.wend
    INNER JOIN employee
        ON  employee.id = tp.employee_id
    INNER JOIN employee_job AS job
        ON  job.id = tp.job_id
    INNER JOIN ticket
        ON  ticket.open = false 
        AND ticket.opened_at BETWEEN dates.wstart AND dates.wend
        AND ticket.closed_at BETWEEN dates.wstart AND dates.wend
GROUP BY 1;
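
One thing to watch with a query of this shape: ticket is joined without any condition linking it to a time punch, so every punch row is paired with every ticket in the window, and both sums can be inflated. A possible variant, offered only as a sketch, aggregates each side per hour first and then joins the two results on the hour. The CTE names labor and sales are my own, and the half-open ranges replace BETWEEN so the following midnight is excluded.

```sql
WITH dates AS (
    SELECT
        DATE_TRUNC('day', '2006-01-02T11:15:05-05:00'::date) AS wstart,
        DATE_TRUNC('day', '2006-01-02T11:15:05-05:00'::date) + interval '1' day AS wend
),
labor AS (  -- labor cost per clock-in hour
    SELECT
        date_trunc('hour', tp.clock_in_time) AS hour,
        SUM(
            EXTRACT(EPOCH FROM COALESCE(tp.clock_out_time, NOW()) - tp.clock_in_time)
            * job.rate
        ) / 60 / 60 / 100.00 AS labor_costs
    FROM dates
    INNER JOIN employee_time_punch AS tp
        ON  tp.clock_in_time >= dates.wstart
        AND tp.clock_in_time <  dates.wend
        AND COALESCE(tp.clock_out_time, NOW()) >= dates.wstart
        AND COALESCE(tp.clock_out_time, NOW()) <  dates.wend
    INNER JOIN employee
        ON  employee.id = tp.employee_id
    INNER JOIN employee_job AS job
        ON  job.id = tp.job_id
    GROUP BY 1
),
sales AS (  -- closed-ticket sales per opening hour
    SELECT
        date_trunc('hour', t.opened_at) AS hour,
        SUM(t.total) / 100.00 AS hourly_sales
    FROM dates
    INNER JOIN ticket AS t
        ON  t.open = false
        AND t.opened_at >= dates.wstart AND t.opened_at < dates.wend
        AND t.closed_at >= dates.wstart AND t.closed_at < dates.wend
    GROUP BY 1
)
SELECT
    COALESCE(labor.hour, sales.hour) AS hour,
    labor.labor_costs,
    sales.hourly_sales
FROM labor
FULL JOIN sales
    ON sales.hour = labor.hour
ORDER BY 1;
```

The FULL JOIN keeps hours that have sales but no clock-ins, and the reverse; the missing side comes back as NULL.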

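To optimize further, you can create indexes on all the date columns involved (one composite index per table would probably work well):

  • In table employee_time_punch: clock_in_time, clock_out_time
  • In table ticket: opened_at, closed_at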
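
The two composite indexes described above could be created along these lines. The index names are placeholders of mine, and whether the second column of each index actually helps depends on the final shape of the query, so it is worth confirming with EXPLAIN.

```sql
-- Index names are hypothetical; pick whatever fits your naming scheme.
CREATE INDEX ix_time_punch_in_out
    ON employee_time_punch (clock_in_time, clock_out_time);

CREATE INDEX ix_ticket_opened_closed
    ON ticket (opened_at, closed_at);
```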