I have a table that records a row every time the score of a location changes.
score_history:
This is done for efficiency. It also makes it simple to retrieve the list of changes for a given location, and it serves that purpose well.
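I'm trying to output the data in a very redundant format to help load it into a rigid external system. The external system expects one row for every location on every date. The goal is to represent the last score value of each location on each date. So if the score changed 3 times on a given date, only the score closest to midnight counts as that location's score for that day. I imagine this is similar to the challenge of building a close-of-business inventory-level fact table.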
I have a handy star-schema date dimension table with one row per date, which fully covers this example period and extends into the future.
That table looks like this:
dw_dim_date:
So, if I had only these 3 records in the score_history table...
1, 2019-01-01:10:13:01, 100, 5.0
2, 2019-01-05:20:00:01, 100, 5.8
3, 2019-01-05:23:01:22, 100, 6.2
The desired output would be:
2019-01-01, 100, 5.0
2019-01-02, 100, 5.0
2019-01-03, 100, 5.0
2019-01-04, 100, 5.0
2019-01-05, 100, 6.2
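To pin down the rule described above (the last score of each day wins, and days without a change repeat the previous value), here is a small pure-Python reference sketch over the three sample rows. It is only an illustration, not part of the question: daily_last_scores is a hypothetical helper name, and it assumes the output should run from each location's first change to its last change, as the desired output does. It can be handy for checking what any of the SQL queries should return.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (id, happened_at, location_id, score).
SCORE_HISTORY = [
    (1, datetime(2019, 1, 1, 10, 13, 1), 100, 5.0),
    (2, datetime(2019, 1, 5, 20, 0, 1), 100, 5.8),
    (3, datetime(2019, 1, 5, 23, 1, 22), 100, 6.2),
]


def daily_last_scores(rows):
    """Hypothetical helper: return (day, location_id, score) for every day between
    a location's first and last change, carrying the most recent score forward."""
    # Keep only the last change per (location, day), i.e. the one closest to midnight.
    last_per_day = {}
    for _id, happened_at, location_id, score in sorted(rows, key=lambda r: r[1]):
        # Rows are visited in time order, so a later change overwrites an earlier one.
        last_per_day[(location_id, happened_at.date())] = score

    result = []
    for location_id in sorted({loc for loc, _ in last_per_day}):
        days = sorted(day for loc, day in last_per_day if loc == location_id)
        current, day = None, days[0]
        # Assumption: output stops at the location's last change day.
        while day <= days[-1]:
            # A change on this day replaces the carried-forward value.
            current = last_per_day.get((location_id, day), current)
            result.append((day, location_id, current))
            day += timedelta(days=1)
    return result


for day, location_id, score in daily_last_scores(SCORE_HISTORY):
    print(day.isoformat(), location_id, score)
```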
Requirements:
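I've been chasing my own tail with subqueries and window functions.
Since I'm hesitant to post without showing something I've tried, I'll share this train wreck, which produces output but nothing meaningful...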
SELECT dw_dim_date.date,
       (SELECT score
          FROM score_history
         WHERE score_history.happened_at::DATE < dw_dim_date.date
            OR score_history.happened_at::DATE = dw_dim_date.date
         ORDER BY score_history.id desc limit 1) as last_score
  FROM dw_dim_date
 WHERE dw_dim_date.date > '2019-06-01'
Thanks for any guidance, or for pointers to other questions worth reading.
Answer 0 (score: 5)
You can achieve this with correlated subqueries and LATERAL:
SELECT sub.date, sub.location_id, score
FROM (SELECT * FROM dw_dim_date
      CROSS JOIN (SELECT DISTINCT location_id FROM score_history) s
      WHERE date >= '2019-01-01'::date) sub
,LATERAL(SELECT score FROM score_history sc
         WHERE sc.happened_at::date <= sub.date
           AND sc.location_id = sub.location_id
         ORDER BY happened_at DESC LIMIT 1) l
,LATERAL(SELECT MIN(happened_at::date) m1, MAX(happened_at::date) m2
         FROM score_history sc
         WHERE sc.location_id = sub.location_id) lm
WHERE sub.date BETWEEN lm.m1 AND lm.m2
ORDER BY location_id, date;
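How it works:
1) s (together with sub): the cross join of all dates with each location_id
2) l: selects, for each location and date, the latest score on or before that date
3) lm: selects the min/max change date for each location, used for filtering
4) WHERE: keeps only the dates within that available range; this can be relaxed if necessary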
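SQLite has no LATERAL, but the same logic can be checked locally by turning each lateral subquery into a correlated scalar subquery. This sketch is an adaptation, not the answer's exact query. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for PostgreSQL, uses date(happened_at) in place of happened_at::date, stores timestamps as ISO text with a space separator, and fills a stand-in dw_dim_date with the 31 days of January 2019. On the question's three sample rows, the BETWEEN filter should trim the output to the desired five rows.

```python
import sqlite3
from datetime import date, timedelta

# Assumption: in-memory SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE score_history (
        id INTEGER PRIMARY KEY,
        happened_at TEXT,        -- ISO 'YYYY-MM-DD HH:MM:SS' text sorts chronologically
        location_id INTEGER,
        score REAL
    );
    INSERT INTO score_history VALUES
        (1, '2019-01-01 10:13:01', 100, 5.0),
        (2, '2019-01-05 20:00:01', 100, 5.8),
        (3, '2019-01-05 23:01:22', 100, 6.2);
    CREATE TABLE dw_dim_date (date TEXT PRIMARY KEY);
""")
# Stand-in date dimension: one row per day, 2019-01-01 through 2019-01-31.
conn.executemany(
    "INSERT INTO dw_dim_date VALUES (?)",
    [((date(2019, 1, 1) + timedelta(days=i)).isoformat(),) for i in range(31)],
)

# date(x) replaces PostgreSQL's x::date; LATERAL becomes correlated subqueries.
QUERY = """
SELECT sub.date,
       sub.location_id,
       (SELECT sc.score                            -- latest score on or before sub.date
          FROM score_history sc
         WHERE date(sc.happened_at) <= sub.date
           AND sc.location_id = sub.location_id
         ORDER BY sc.happened_at DESC
         LIMIT 1) AS score
  FROM (SELECT d.date, s.location_id               -- every date x every location
          FROM dw_dim_date d
         CROSS JOIN (SELECT DISTINCT location_id FROM score_history) s) sub
 WHERE sub.date BETWEEN                            -- first..last change day per location
       (SELECT MIN(date(happened_at)) FROM score_history WHERE location_id = sub.location_id)
   AND (SELECT MAX(date(happened_at)) FROM score_history WHERE location_id = sub.location_id)
 ORDER BY sub.location_id, sub.date
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(*row)
```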
Answer 1 (score: 2)
I think you could try something like this. The main thing I changed is wrapping things in DATE(), and I used another SO answer as the date finder.
"Download": {
"$id": {
".write": "newData.child('1').child('password').val() === 4321"
}
}
This method uses the SQL approach from here to find the past data closest to the requested date: PostgreSQL return exact or closest date to queried date
Answer 2 (score: 0)
WITH
  max_per_day_location AS (
    SELECT
      SH.happened_at::DATE as day,
      SH.location_id,
      max(SH.happened_at) as happened_at
    FROM
      score_history SH
    GROUP BY
      SH.happened_at::DATE,
      SH.location_id
  ),
  date_location AS (
    SELECT DISTINCT
      DD."date",
      SH.location_id
    FROM
      dw_dim_date DD,
      max_per_day_location SH
  ),
  value_partition AS (
    SELECT
      DD."date",
      DD.location_id,
      SH.score,
      SH.happened_at,
      MPD.happened_at as hap2,
      sum(case when score is null then 0 else 1 end) OVER
        (PARTITION BY DD.location_id ORDER BY "date", SH.happened_at desc) AS value_partition
    FROM
      date_location DD
      LEFT JOIN score_history SH
        ON DD."date" = SH.happened_at::DATE
        AND DD.location_id = SH.location_id
      LEFT join max_per_day_location MPD
        ON SH.happened_at = MPD.happened_at
        AND SH.location_id = MPD.location_id  -- compare with this location's own daily max
    WHERE NOT (MPD.happened_at IS NULL
               AND
               SH.happened_at IS NOT NULL)
    ORDER BY
      DD."date"
  ),
  final AS (
    SELECT
      "date",
      location_id,
      first_value(score) over w AS score
    FROM
      value_partition
    WINDOW w AS (PARTITION BY location_id, value_partition
                 ORDER BY happened_at rows between unbounded preceding and unbounded following)
    order by "date"
  )
SELECT DISTINCT * FROM final ORDER BY location_id, date
;
I'm sure there are less verbose ways to do this.
I have a SQLFiddle with some test data: http://sqlfiddle.com/#!17/9d122/1
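The main thing that makes this work is building the "value partition", which gives access to the previous non-null value. More details:
The date_location subquery produces exactly one row per day per location_id, since that is the base "row level" required in the output.
The max_per_day_location subquery filters out the earlier entries for location/day combinations that have more than one score, keeping only the last one of that day.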
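The value-partition idea can be tried out in SQLite too, which supports COUNT(...) OVER and FIRST_VALUE (window functions need SQLite 3.25 or later). This is a simplified adaptation rather than the answer's exact query: COUNT(score) over the rows ordered by date plays the role of the sum(case ...) running counter, and the daily maximum is joined on both location and timestamp. It assumes SQLite (through Python's sqlite3 module) as a stand-in for PostgreSQL, date() in place of ::date, ISO text timestamps, and a hypothetical seven-day dw_dim_date. Because that dimension extends past the last change, 2019-01-06 and 2019-01-07 should carry 6.2 forward.

```python
import sqlite3
from datetime import date, timedelta

# Assumption: in-memory SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE score_history (id INTEGER PRIMARY KEY, happened_at TEXT,
                                location_id INTEGER, score REAL);
    INSERT INTO score_history VALUES
        (1, '2019-01-01 10:13:01', 100, 5.0),
        (2, '2019-01-05 20:00:01', 100, 5.8),
        (3, '2019-01-05 23:01:22', 100, 6.2);
    CREATE TABLE dw_dim_date (date TEXT PRIMARY KEY);
""")
# Hypothetical stand-in date dimension: 2019-01-01 through 2019-01-07.
conn.executemany(
    "INSERT INTO dw_dim_date VALUES (?)",
    [((date(2019, 1, 1) + timedelta(days=i)).isoformat(),) for i in range(7)],
)

QUERY = """
WITH max_per_day_location AS (      -- latest change per location and day
    SELECT location_id, date(happened_at) AS day, MAX(happened_at) AS happened_at
      FROM score_history
     GROUP BY location_id, date(happened_at)
), date_location AS (               -- one row per location per day; score only on change days
    SELECT d.date, l.location_id, sh.score
      FROM dw_dim_date d
     CROSS JOIN (SELECT DISTINCT location_id FROM score_history) l
      LEFT JOIN max_per_day_location m
        ON m.location_id = l.location_id AND m.day = d.date
      LEFT JOIN score_history sh
        ON sh.location_id = m.location_id AND sh.happened_at = m.happened_at
), value_partition AS (             -- running count of non-NULL scores per location
    SELECT date, location_id, score,
           COUNT(score) OVER (PARTITION BY location_id ORDER BY date) AS grp
      FROM date_location
)
SELECT date, location_id,
       -- the first row of each group is its change day, so that score fills the gap days
       FIRST_VALUE(score) OVER (PARTITION BY location_id, grp ORDER BY date) AS score
  FROM value_partition
 WHERE grp > 0                      -- drop days before the location's first change
 ORDER BY location_id, date
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(*row)
```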
Answer 3 (score: 0)
The simplest solution is probably:
select dw_dim_date.date, location_id, score
from dw_dim_date, score_history S1
where happened_at::date <= dw_dim_date.date and
      not exists (select *
                  from score_history S2
                  where S2.happened_at::date <= dw_dim_date.date and
                        S1.happened_at < S2.happened_at and
                        S1.location_id = S2.location_id)
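This computes the Cartesian product of the dates and the score history, then, for each date and location, keeps the score for which no later score exists (up to that date). I'd suggest starting with this, since it is probably the easiest to maintain, and only moving to a more complex solution if it turns out not to be efficient enough (with appropriate indexes).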
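A runnable sketch of the NOT EXISTS approach, assuming SQLite (through Python's sqlite3 module) in place of PostgreSQL, date() in place of ::date, ISO text timestamps, and a stand-in dw_dim_date that covers only 2019-01-01 to 2019-01-05. An ORDER BY is added only to make the output deterministic. Note that the query has no upper bound of its own: with a longer date dimension it would keep repeating the last score for every later date.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE score_history (id INTEGER PRIMARY KEY, happened_at TEXT,
                                location_id INTEGER, score REAL);
    INSERT INTO score_history VALUES
        (1, '2019-01-01 10:13:01', 100, 5.0),
        (2, '2019-01-05 20:00:01', 100, 5.8),
        (3, '2019-01-05 23:01:22', 100, 6.2);
    -- Stand-in date dimension covering only the example period.
    CREATE TABLE dw_dim_date (date TEXT PRIMARY KEY);
    INSERT INTO dw_dim_date VALUES
        ('2019-01-01'), ('2019-01-02'), ('2019-01-03'), ('2019-01-04'), ('2019-01-05');
""")

# date(x) replaces PostgreSQL's x::date.
QUERY = """
SELECT d.date, s1.location_id, s1.score
  FROM dw_dim_date d, score_history s1
 WHERE date(s1.happened_at) <= d.date
   AND NOT EXISTS (SELECT 1            -- no later change for the same location up to d.date
                     FROM score_history s2
                    WHERE date(s2.happened_at) <= d.date
                      AND s1.happened_at < s2.happened_at
                      AND s1.location_id = s2.location_id)
 ORDER BY s1.location_id, d.date
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(*row)
```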
SQL fiddle: https://dbfiddle.uk/?rdbms=postgres_9.4&fiddle=3c2e4ae49cbc43f7840b942d223be119