我需要解决的问题:
为了计算每天(公共)假期或生病天数,请使用前三个月的平均工作时间(起始值为每天8小时)。 / p>
棘手的部分是需要考虑上个月的计算值,这意味着如果上个月有一个公共假日,该假日被分配了8.5小时的计算值,则这些计算的小时数将影响平均值上个月的每天工作时间,然后将其分配给当前月份的假期。
到目前为止,我只想出了以下内容,但尚未考虑逐行计算:
WITH
const (h_target, h_extra) AS (VALUES (8.0, 20)),
monthly_sums (c_month, d_work, d_off, h_work) AS (VALUES
('2018-12', 16, 5, 150.25),
('2019-01', 20, 3, 171.25),
('2019-02', 15, 5, 120.5)
),
calc AS (
SELECT
ms.*,
(ms.d_work + ms.d_off) AS d_total,
(ms.h_work + ms.d_off * const.h_target) AS h_total,
(avg((ms.h_work + ms.d_off * const.h_target) / (ms.d_work + ms.d_off))
OVER (ORDER BY ms.c_month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW))::numeric(10,2)
AS h_off
FROM monthly_sums AS ms
CROSS JOIN const
)
SELECT
calc.c_month,
calc.d_work,
calc.d_off,
calc.d_total,
calc.h_work,
calc.h_off,
(d_off * lag(h_off, 1, const.h_target) OVER (ORDER BY c_month)) AS h_off_sum,
(h_work + d_off * lag(h_off, 1, const.h_target) OVER (ORDER BY c_month)) AS h_sum
FROM calc CROSS JOIN const;
...给出以下结果:
c_month | d_work | d_off | d_total | h_work | h_off | h_off_sum | h_sum
---------+--------+-------+---------+--------+-------+-----------+--------
2018-12 | 16 | 5 | 21 | 150.25 | 9.06 | 40.0 | 190.25
2019-01 | 20 | 3 | 23 | 171.25 | 8.77 | 27.18 | 198.43
2019-02 | 15 | 5 | 20 | 120.5 | 8.52 | 43.85 | 164.35
(3 rows)
对于依赖先前行值(lag
)的列,这可以正确计算出第一行和第二行,但是每天平均小时数的计算显然是错误的,因为我不知道该如何填充当前行值(h_sum
)返回到新h_off
的计算中。
所需结果如下:
c_month | d_work | d_off | d_total | h_work | h_off | h_off_sum | h_sum
---------+--------+-------+---------+--------+-------+-----------+--------
2018-12 | 16 | 5 | 21 | 150.25 | 9.06 | 40.0 | 190.25
2019-01 | 20 | 3 | 23 | 171.25 | 8.84 | 27.18 | 198.43
2019-02 | 15 | 5 | 20 | 120.5 | 8.64 | 44.2 | 164.7
(3 rows)
...表示h_off
用于下个月的h_off_sum
,结果h_sum
和h_sum
的可用月份(最多三个)依次变成当前月份的h_off
(主要是三个月内的avg(h_sum / d_total)
)的计算。
因此,实际计算为:
c_month | calculation | h_off
---------+----------------------------------------------------+-------
| | 8.00 << initial
.---------------------- uses ---------------------^
2018-12 | ((190.25 / 21)) / 1 | 9.06
.------------ uses ---------------^
2019-01 | ((190.25 / 21) + (198.43 / 23)) / 2 | 8.84
.--- uses --------^
2019-02 | ((190.25 / 21) + (198.43 / 23) + (164.7 / 20)) / 3 | 8.64
P.S .:我正在使用PostgreSQL 11,因此如果有任何不同,我可以使用最新的功能。
答案 0 :(得分:0)
我根本无法使用window functions来解决列间+行间计算的问题,而且还不能不退回到recursive CTE的特殊用法以及在历史第三个月的天(Number of users involved
[Arguments] @{users}
:FOR ${user} IN @{users}
\ Log ${user}
和小时(d_total_1
)中引入特殊用途的列(因为您不能多次加入递归临时表)。
此外,我在输入数据中添加了第四行,并使用了一个附加索引列,在连接时可以引用该索引列,通常由一个子查询组成,如下所示:
h_sum_1
所以,这是我的看法:
SELECT ROW_NUMBER() OVER (ORDER BY c_month) AS row_num, * FROM monthly_sums
...给出:
WITH RECURSIVE calc AS (
SELECT
monthly_sums.row_num,
monthly_sums.c_month,
monthly_sums.d_work,
monthly_sums.d_off,
monthly_sums.h_work,
(monthly_sums.d_off * 8)::numeric(10,2) AS h_off_sum,
monthly_sums.d_work + monthly_sums.d_off AS d_total,
0.0 AS d_total_1,
(monthly_sums.h_work + monthly_sums.d_off * 8)::numeric(10,2) AS h_sum,
0.0 AS h_sum_1,
(
(monthly_sums.h_work + monthly_sums.d_off * 8)
/
(monthly_sums.d_work + monthly_sums.d_off)
)::numeric(10,2) AS h_off
FROM
(
SELECT * FROM (VALUES
(1, '2018-12', 16, 5, 150.25),
(2, '2019-01', 20, 3, 171.25),
(3, '2019-02', 15, 5, 120.5),
(4, '2019-03', 19, 2, 131.75)
) AS tmp (row_num, c_month, d_work, d_off, h_work)
) AS monthly_sums
WHERE
monthly_sums.row_num = 1
UNION ALL
SELECT
monthly_sums.row_num,
monthly_sums.c_month,
monthly_sums.d_work,
monthly_sums.d_off,
monthly_sums.h_work,
lat_off.h_off_sum::numeric(10,2),
lat_days.d_total,
calc.d_total AS d_total_1,
lat_sum.h_sum::numeric(10,2),
calc.h_sum AS h_sum_1,
lat_calc.h_off::numeric(10,2)
FROM
(
SELECT * FROM (VALUES
(1, '2018-12', 16, 5, 150.25),
(2, '2019-01', 20, 3, 171.25),
(3, '2019-02', 15, 5, 120.5),
(4, '2019-03', 19, 2, 131.75)
) AS tmp (row_num, c_month, d_work, d_off, h_work)
) AS monthly_sums
INNER JOIN calc ON (calc.row_num = monthly_sums.row_num - 1),
LATERAL (SELECT monthly_sums.d_work + monthly_sums.d_off AS d_total) AS lat_days,
LATERAL (SELECT monthly_sums.d_off * calc.h_off AS h_off_sum) AS lat_off,
LATERAL (SELECT monthly_sums.h_work + lat_off.h_off_sum AS h_sum) AS lat_sum,
LATERAL (SELECT
(calc.h_sum_1 + calc.h_sum + lat_sum.h_sum)
/
(calc.d_total_1 + calc.d_total + lat_days.d_total)
AS h_off
) AS lat_calc
WHERE
monthly_sums.row_num > 1
)
SELECT c_month, d_work, d_off, d_total, h_work, h_off, h_off_sum, h_sum FROM calc
;
(PostgreSQL的默认类型转换行为是round numeric values,因此结果与最初预期的略有不同,但实际上是正确的)
请注意,PostgreSQL通常对数据类型非常挑剔,并且每当存在可能导致精度下降的差异(例如 c_month | d_work | d_off | d_total | h_work | h_off | h_off_sum | h_sum
---------+--------+-------+---------+--------+-------+-----------+--------
2018-12 | 16 | 5 | 21 | 150.25 | 9.06 | 40.00 | 190.25
2019-01 | 20 | 3 | 23 | 171.25 | 8.83 | 27.18 | 198.43
2019-02 | 15 | 5 | 20 | 120.5 | 8.65 | 44.15 | 164.65
2019-03 | 19 | 2 | 21 | 131.75 | 8.00 | 17.30 | 149.05
(4 rows)
与numeric
)时,拒绝处理此类查询,这就是为什么我在两个地方都为列使用显式类型的原因。
难题的最后一部分通过使用LATERAL子查询解决了,这使我可以使一个计算参考上一个的结果,甚至可以独立于计算层次结构而在最终输出中的列之间移动
如果任何人都可以提出一个更简单的变体,我很乐意学习。