How to compute a running total using append-only rows

Time: 2019-07-27 16:04:42

Tags: sql presto

I have a table whose rows are never mutated, only inserted; they are immutable records. It has the following fields:

  • id (int)
  • user_id (int)
  • created (datetime)
  • is_cool (boolean)
  • likes_fruits (boolean)

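Objects are associated with users, and a given user's "current" object is the one with the latest created date. For example, if I want to update is_cool for a user, I add a record with a new created timestamp and is_cool=true.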
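To make the "current object" rule concrete, here is a minimal Python sketch. The sample rows and the helper name `current_rows` are made up for illustration; they are not part of the real table.

```python
from datetime import datetime

# Hypothetical sample rows: (id, user_id, created, is_cool, likes_fruits)
ROWS = [
    (1, 1, datetime(2019, 7, 1, 9, 0), False, False),
    (2, 1, datetime(2019, 7, 2, 9, 0), True, False),   # user 1 becomes cool
    (3, 2, datetime(2019, 7, 1, 10, 0), False, True),
]


def current_rows(rows):
    """Return {user_id: row}, keeping each user's row with the latest created."""
    latest = {}
    for row in rows:
        user_id, created = row[1], row[2]
        if user_id not in latest or created > latest[user_id][2]:
            latest[user_id] = row
    return latest


current = current_rows(ROWS)
```

Nothing is ever updated in place: the older row for user 1 stays in the table, and only the row with the latest created counts as current.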

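I want to count the number of users who are is_cool at the end of each day. That is, I want the output table to have the following columns:

  • day: some date_trunc('day', created)
  • cool_users_count: the number of users who have is_cool set at the end of that day.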

What SQL query can I write to do this? FWIW I'm using Presto (or Redshift if needed).

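Note that there are other columns, such as likes_fruits, so a record with is_cool = false does not mean that is_cool was just changed to false; it may have been false for a while.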

Here is some procedural pseudocode that represents what I want to do in SQL:

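from datetime import timedelta

# rows = ...  (each row has user_id, created and is_cool attributes)
min_date = min(row.created for row in rows).date()
max_date = max(row.created for row in rows).date()

counts_by_day = {}
day = min_date
while day <= max_date:
    # Rows created up to the end of this day, oldest first
    rows_up_until_day = sorted(
        (row for row in rows if row.created.date() <= day),
        key=lambda row: row.created,
    )
    # Later rows overwrite earlier ones, leaving each user's latest row
    latest_row_by_user = {row.user_id: row for row in rows_up_until_day}
    counts_by_day[day] = sum(1 for row in latest_row_by_user.values() if row.is_cool)
    day += timedelta(days=1)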

2 Answers:

Answer 0: (score: 0)

You can do this with a query. Try summing over the boolean and grouping by date:

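  -- sum() does not accept a boolean in Presto or Redshift, so map it to 1/0 first
  select  date(created), sum(case when is_cool then 1 else 0 end)
  from  my_table  
  group by date(created)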

Or, if you need the number of users:

select t.date_created, count(*) num_user
from  (
  select  distinct date(created) date_created, user_id 
  from  my_table  
  where is_cool = TRUE 
 ) t 
 group by  t.date_created

Or, if you need the last value of is_cool for each user on each day:

-- every remaining row has is_cool = TRUE, so count(*) is the number of cool users
select date(max_created) as day, count(*) as cool_users_count
from (
    -- each user's last row of each day, kept only if it is cool
    select  t.user_id, t.max_created
    from my_table m  
    inner join  (
        select  max(created) as max_created, user_id 
        from  my_table 
        group by  user_id, date(created)
    ) t on t.max_created = m.created 
       and t.user_id = m.user_id 
    where m.is_cool = TRUE 
) t2
group by date(max_created)
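As a rough check of what the last query returns, the sketch below runs an equivalent query through Python's built-in sqlite3 module. The sample rows and the function name `last_value_counts` are invented for illustration, and SQLite stands in for Presto here, so is_cool is stored as 0/1 and created as text.

```python
import sqlite3

# Hypothetical sample data: (id, user_id, created, is_cool, likes_fruits)
SAMPLE_ROWS = [
    (1, 1, "2019-07-01 09:00:00", 1, 0),  # user 1 is cool
    (2, 2, "2019-07-01 10:00:00", 0, 0),  # user 2 is not cool
    (3, 2, "2019-07-02 08:00:00", 1, 0),  # user 2 becomes cool
    (4, 1, "2019-07-03 12:00:00", 0, 1),  # user 1 stops being cool
]

# Each user's last row of each day, counted only when that row is cool.
QUERY = """
select date(t.max_created) as day, count(*) as cool_users_count
from my_table m
join (
    select user_id, max(created) as max_created
    from my_table
    group by user_id, date(created)
) t on t.user_id = m.user_id and t.max_created = m.created
where m.is_cool = 1
group by date(t.max_created)
order by date(t.max_created)
"""


def last_value_counts(rows):
    """Load rows into an in-memory SQLite table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table my_table (id integer, user_id integer, "
        "created text, is_cool integer, likes_fruits integer)"
    )
    conn.executemany("insert into my_table values (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


counts = last_value_counts(SAMPLE_ROWS)
```

With this data, 2019-07-03 does not appear in the result and 2019-07-02 counts only user 2, even though user 1 was still cool on that day. A user is counted only on days where they have a row; earlier values are not carried forward.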

Answer 1: (score: 0)

A correlated subquery is probably the simplest solution. The following gives the is_cool value for each user on each date:

select u.user_id, d.date,
       (select t.is_cool
        from t
        where t.user_id = u.user_id and
              -- dateadd is Redshift syntax; in Presto this is date_add('day', 1, d.date)
              t.created < dateadd(day, 1, d.date)
        order by t.created desc
        limit 1
       ) as is_cool
from (select distinct date(created) as date
      from t
     ) d cross join
     (select distinct user_id
      from t
     ) u ;

Then aggregate:

select date, sum(case when is_cool then 1 else 0 end) as cool_users_count
from (select u.user_id, d.date,
             (select t.is_cool
              from t
              where t.user_id = u.user_id and
                    t.created < dateadd(day, 1, d.date)
              order by t.created desc
              limit 1
             ) as is_cool
      from (select distinct date(created) as date
            from t
           ) d cross join
           (select distinct user_id
            from t
           ) u
     ) ud
group by date;
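For anyone who wants to try the idea on a small data set, here is a minimal sketch that runs the same correlated-subquery approach through Python's built-in sqlite3 module. The sample rows and the function name `cool_users_per_day` are made up for illustration. SQLite stands in for Presto or Redshift, so the next-day boundary is written as date(d.day, '+1 day') and is_cool is stored as 0/1.

```python
import sqlite3

# Hypothetical sample data: (id, user_id, created, is_cool, likes_fruits)
SAMPLE_ROWS = [
    (1, 1, "2019-07-01 09:00:00", 1, 0),  # user 1 is cool
    (2, 2, "2019-07-01 10:00:00", 0, 0),  # user 2 is not cool
    (3, 2, "2019-07-02 08:00:00", 1, 0),  # user 2 becomes cool
    (4, 1, "2019-07-03 12:00:00", 0, 1),  # user 1 stops being cool
]

# For every (day, user) pair, look up the user's latest row created
# before the end of that day, then count the cool ones per day.
QUERY = """
select day, sum(case when is_cool then 1 else 0 end) as cool_users_count
from (
    select u.user_id, d.day,
           (select t.is_cool
            from my_table t
            where t.user_id = u.user_id
              and t.created < date(d.day, '+1 day')
            order by t.created desc
            limit 1) as is_cool
    from (select distinct date(created) as day from my_table) d
    cross join (select distinct user_id from my_table) u
) ud
group by day
order by day
"""


def cool_users_per_day(rows):
    """Load rows into an in-memory SQLite table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table my_table (id integer, user_id integer, "
        "created text, is_cool integer, likes_fruits integer)"
    )
    conn.executemany("insert into my_table values (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


daily_counts = cool_users_per_day(SAMPLE_ROWS)
```

In this sample, user 1 still counts on 2019-07-02 without having a row that day, because the subquery picks up the latest earlier row. Only days that have at least one row in the table appear in the output.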