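I have a set of state-change data tied to a bunch of items [specifically, Trello cards and their state changes]. I want to take this set of transitions [Item_id, From_state, To_state, Timestamp] and produce, for each state, a data set that looks like [State, Day, Item Count].
Currently, I build this list in Python, in a fairly CPU-intensive way, after fetching all the transitions and aggregating them. I have been looking for a faster way to do this in PSQL.
Answer 0: (score: 1)
With [Item_id, From_state, To_state, Timestamp] you have to do a lot of work to compute the snapshots, but if you had the data in this shape it would be very simple: [Item_id, state, start_timestamp, end_timestamp]
Fortunately, it is possible to convert from one format to the other: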
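For this kind of problem, the simplest approach I have found is to build (1) a list of days, (2) a list of states, and (3) the time each item spent in each state, and then join the three.
Keep in mind that this is untested, but something following the pattern below should work.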
-- CTE for step 1: one row per calendar day in the reporting window
WITH days AS (
    SELECT day::date AS d
    FROM generate_series(timestamp '2004-03-07',
                         timestamp '2004-08-16',
                         interval '1 day') AS day
)
-- CTE for step 2: every state that appears in the data
-- (UNION also picks up states seen only as to_state, e.g. a final state)
, state_list AS (
    SELECT from_state AS s FROM transition_table
    UNION
    SELECT to_state FROM transition_table
)
-- CTE for step 3: turn each transition into [item_id, state, start, end]
, time_in_state AS (
    SELECT t.item_id,
           t.to_state    AS item_state,
           t."Timestamp" AS start_timestamp,
           -- the item's next transition closes the interval; NULL = still in this state
           (SELECT min(t2."Timestamp")
            FROM transition_table t2
            WHERE t2.item_id = t.item_id
              AND t2."Timestamp" > t."Timestamp") AS end_timestamp
    FROM transition_table t
)
-- finally, the actual query is straightforward
SELECT days.d,
       state_list.s AS item_state,
       count(DISTINCT t.item_id) AS items_in_state_at_some_point_in_day
FROM days
CROSS JOIN state_list  -- every (day, state) pair, so empty days still show a 0
LEFT JOIN time_in_state t
       ON t.item_state = state_list.s
      AND days.d >= date_trunc('day', t.start_timestamp)
      AND days.d < coalesce(t.end_timestamp, now())
GROUP BY days.d, state_list.s
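Since the question is about speed, here is a minimal sketch of an alternative for step 3 that uses the lead() window function instead of a correlated subquery, so the table is read once rather than once per row. It assumes the same table and column names as the query above (transition_table, item_id, to_state, "Timestamp"), and it should give the same intervals as long as no item has two transitions with exactly the same timestamp. Whether it is actually faster depends on the data and the available indexes, so treat it as something to try, not a guaranteed improvement.

```sql
-- Sketch only: alternative to the time_in_state CTE from step 3.
-- Assumes transition_table(item_id, from_state, to_state, "Timestamp").
WITH time_in_state AS (
    SELECT item_id,
           to_state    AS item_state,
           "Timestamp" AS start_timestamp,
           -- timestamp of the same item's next transition;
           -- NULL when this is the item's most recent transition
           lead("Timestamp") OVER (PARTITION BY item_id
                                   ORDER BY "Timestamp") AS end_timestamp
    FROM transition_table
)
SELECT item_id, item_state, start_timestamp, end_timestamp
FROM time_in_state
ORDER BY item_id, start_timestamp;
```

To use it in the full query, swap this CTE body in for the time_in_state definition; the final SELECT over days and state_list does not need to change.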
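Answer 1: (score: 0)
Are you looking for a Postgres query that aggregates the data by date and state?
Depending on how the state is calculated, it should look something like this: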
SELECT
t.from_state,
t.timestamp::date as day,
COUNT(*) as item_count
FROM mytable t
GROUP BY
t.from_state,
t.timestamp::date