SQL query to get daily snapshots from state-change data

Time: 2019-01-27 23:47:59

Tags: sql postgresql

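I have a set of state-change data tied to a bunch of items (Trello cards and their status changes, to be specific). I want to take this set of transitions [Item_id, From_state, To_state, Timestamp] and produce, for each state, a set of data that looks like [State, Day, Item Count].

Currently I build this list in Python, in a fairly CPU-intensive way, after fetching all the transitions and aggregating them. I have been looking for a faster way to do this in PSQL.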

2 answers:

Answer 0: (score: 1)

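With [Item_id, From_state, To_state, Timestamp] you have to do a lot of work to compute a snapshot, but it would be very simple if the data looked like this instead: [Item_id, state, start_timestamp, end_timestamp].

Fortunately, it is possible to convert from one format to the other.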
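As a rough illustration of that conversion, here is a minimal Python sketch (Python being the language the question already uses). It assumes the transitions arrive as (item_id, from_state, to_state, timestamp) tuples; the function name `to_intervals` and the sample rows are made up for the example. Each transition opens an interval in its `to_state`, and the same item's next transition closes it; the last interval of an item stays open (`None`).

```python
from collections import defaultdict
from datetime import datetime


def to_intervals(transitions):
    """Turn (item_id, from_state, to_state, ts) rows into
    (item_id, state, start_ts, end_ts) rows.

    end_ts is None while the item is still in that state.
    """
    by_item = defaultdict(list)
    for item_id, _from_state, to_state, ts in transitions:
        by_item[item_id].append((ts, to_state))

    intervals = []
    for item_id, rows in by_item.items():
        rows.sort(key=lambda r: r[0])  # chronological order per item
        for i, (start, state) in enumerate(rows):
            # The next transition of the same item ends this interval.
            end = rows[i + 1][0] if i + 1 < len(rows) else None
            intervals.append((item_id, state, start, end))
    return intervals


# Hypothetical sample data, not taken from the question.
sample = [
    (1, "todo", "doing", datetime(2019, 1, 1, 9)),
    (1, "doing", "done", datetime(2019, 1, 3, 17)),
    (2, "todo", "doing", datetime(2019, 1, 2, 12)),
]
intervals = to_intervals(sample)
```

Note that the time an item spends in its very first `from_state`, before any transition is recorded, is not represented in this format unless item creation is itself logged as a transition.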

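For this kind of problem, the simplest approach I have found is:

  • Generate a list of days
  • Generate a list of the states of interest (you need this list because on a given day a state may contain zero cards, and you presumably want a row that says zero rather than no row at all)
  • Convert the data to the [Item_id, state, start_timestamp, end_timestamp] format
  • For each day, count how many items are in each state

With that in mind, something along the following lines should work.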

-- CTE for step 1: one row per calendar day in the reporting window
with days as (
    select day::date as d
    from generate_series(timestamp '2004-03-07'
                       , timestamp '2004-08-16'
                       , interval '1 day') as day
)
-- CTE for step 2: every state that appears on either side of a transition
-- (from_state alone would miss states that items only ever move into)
, state_list as (
    select from_state as s from transition_table
    union
    select to_state from transition_table
)
-- CTE for step 3: each transition starts an interval in to_state that ends
-- at the same item's next transition (null if the item is still there)
, time_in_state as (
    select t.item_id
         , t.to_state as item_state
         , t."Timestamp" as start_timestamp
         , (select min(t2."Timestamp")
            from transition_table t2
            where t2.item_id = t.item_id
              and t2."Timestamp" > t."Timestamp") as end_timestamp
    from transition_table t
)
-- finally, the actual query is straightforward
select days.d
     , state_list.s as item_state
     , count(distinct t.item_id) as items_in_state_at_some_point_in_day
from days
cross join state_list  -- every (day, state) pair, so empty days still get a 0 row
left join time_in_state t
       on t.item_state = state_list.s
      and days.d >= date_trunc('day', t.start_timestamp)
      and days.d < coalesce(t.end_timestamp, now())
group by days.d, state_list.s
order by days.d, state_list.s;
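
To sanity-check the query on a small sample, the same counting rule can be written out in plain Python. This is only a sketch under stated assumptions: `daily_counts` is a hypothetical helper, the intervals are made-up rows in the [Item_id, state, start_timestamp, end_timestamp] shape, and `now` is passed in explicitly so the result is reproducible. An item counts for a day if its interval started on or before that day and had not ended by midnight at the start of that day.

```python
from datetime import date, datetime, time, timedelta


def daily_counts(intervals, first_day, last_day, now):
    """Return {(day, state): number of distinct items in that state
    at some point during that day}, including zero counts."""
    # Assumption: only states that occur in the intervals are reported.
    states = sorted({state for _item, state, _start, _end in intervals})
    counts = {}
    day = first_day
    while day <= last_day:
        midnight = datetime.combine(day, time.min)
        for state in states:
            items = {
                item_id
                for item_id, s, start, end in intervals
                if s == state
                and start.date() <= day
                and midnight < (end if end is not None else now)
            }
            counts[(day, state)] = len(items)  # zero rows are kept
        day += timedelta(days=1)
    return counts


# Hypothetical sample intervals, not taken from the question.
intervals = [
    (1, "doing", datetime(2019, 1, 1, 9), datetime(2019, 1, 3, 17)),
    (1, "done", datetime(2019, 1, 3, 17), None),
    (2, "doing", datetime(2019, 1, 2, 12), None),
]
counts = daily_counts(
    intervals, date(2019, 1, 1), date(2019, 1, 4), now=datetime(2019, 1, 5)
)
```

On the day an item changes state it is counted in both the old and the new state, which matches the "at some point in day" wording of the column alias in the query.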

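Answer 1: (score: 0)

Are you looking for a Postgres query that aggregates the data by date and state?

Depending on how the state should be calculated, it should look something like: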

SELECT 
    t.from_state,
    t.timestamp::date as day,
    COUNT(*) as item_count
FROM mytable t
GROUP BY 
    t.from_state, 
    t.timestamp::date