I have a table whose rows are never mutated, only inserted; they are immutable records. It has the following fields:
id: int
user_id: int
created: datetime
is_cool: boolean
likes_fruits: boolean
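Objects are associated with users, and the "current" object for a given user is the one with the latest created date. For example, if I want to update is_cool for a user, I add a record with a new created timestamp and is_cool=true.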
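To make the "current object" rule concrete, here is a minimal Python sketch. The `Row` tuple, the `current_rows` helper, and the sample data are assumptions made up for illustration; they only mirror the fields listed above.

```python
from datetime import datetime
from typing import NamedTuple


class Row(NamedTuple):
    # Hypothetical in-memory mirror of the table's columns.
    id: int
    user_id: int
    created: datetime
    is_cool: bool
    likes_fruits: bool


def current_rows(rows):
    """Return {user_id: row}, keeping the row with the latest `created` per user."""
    latest = {}
    for row in rows:
        seen = latest.get(row.user_id)
        if seen is None or row.created > seen.created:
            latest[row.user_id] = row
    return latest


# Made-up sample data: user 1 is "updated" once, user 2 never changes.
rows = [
    Row(1, 1, datetime(2020, 1, 1, 9), False, True),
    Row(2, 2, datetime(2020, 1, 1, 10), True, False),
    Row(3, 1, datetime(2020, 1, 2, 8), True, True),  # new record: is_cool -> true
]
current = current_rows(rows)
```

With this data, user 1's current row is the one with id 3, so user 1 counts as is_cool even though the earlier record said otherwise.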
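I want to compute the number of users who are is_cool at the end of each day. That is, I want the output table to have these columns:
day: some kind of date_trunc('day', created)
cool_users_count: the number of users who had is_cool set at the end of that day.
What SQL query can I write to do this? FWIW, I'm using Presto (or Redshift if needed).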
Note that there are other columns, such as likes_fruits, so a record where is_cool is false does not mean that is_cool was just changed to false; it may have been false for a while.
Here is some procedural pseudocode representing what I want to do in SQL:
from datetime import timedelta

# rows = ...  (each row has user_id, created, is_cool)
min_day = min(row.created for row in rows).date()
max_day = max(row.created for row in rows).date()
counts_by_day = {}
day = min_day
while day <= max_day:
    # Latest row per user among the rows created on or before this day.
    latest_row_by_user = {}
    for row in sorted(rows, key=lambda r: r.created):
        if row.created.date() <= day:
            latest_row_by_user[row.user_id] = row
    counts_by_day[day] = sum(1 for row in latest_row_by_user.values() if row.is_cool)
    day += timedelta(days=1)
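For checking a SQL answer against the intended semantics, a self-contained reference with a tiny dataset can help. The sketch below is one way to do it, not the only one: the function name `cool_users_by_day`, the tuple layout, and the sample rows are assumptions. It makes a single pass over the sorted rows instead of rescanning them for every day.

```python
from datetime import date, datetime, timedelta


def cool_users_by_day(rows):
    """rows: iterable of (user_id, created, is_cool) tuples.

    Returns {day: number of users whose latest row up to the end of
    that day has is_cool set}.
    """
    ordered = sorted(rows, key=lambda r: r[1])
    if not ordered:
        return {}
    day, last_day = ordered[0][1].date(), ordered[-1][1].date()
    latest = {}  # user_id -> is_cool from that user's most recent row so far
    counts = {}
    i = 0
    while day <= last_day:
        # Fold in every row created on or before this day.
        while i < len(ordered) and ordered[i][1].date() <= day:
            user_id, _, is_cool = ordered[i]
            latest[user_id] = is_cool
            i += 1
        counts[day] = sum(latest.values())
        day += timedelta(days=1)
    return counts


# Made-up sample data.
sample = [
    (1, datetime(2020, 1, 1, 9), True),
    (2, datetime(2020, 1, 1, 10), False),
    (2, datetime(2020, 1, 2, 12), True),   # user 2 becomes cool on Jan 2
    (1, datetime(2020, 1, 4, 15), False),  # user 1 stops being cool on Jan 4
    (1, datetime(2020, 1, 4, 18), False),  # e.g. a likes_fruits-only update
]
counts = cool_users_by_day(sample)
```

Here the counts for Jan 1 through Jan 4 are 1, 2, 2, 1. Jan 3 has no rows at all, yet both users are still cool at the end of that day, which is the carry-forward behavior a correct query has to reproduce.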
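Answer 0 (score: 0)
You can do this with a query. Try summing over the boolean and grouping by date:
-- a boolean cannot be summed directly, so map it to 1/0 first
select date(created), sum(case when is_cool then 1 else 0 end)
from my_table
group by date(created)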
Or, if you need the number of users:
select t.date_created, count(*) num_user
from (
    select distinct date(created) date_created, user_id
    from my_table
    where is_cool = TRUE
) t
group by t.date_created
Or, if you need the last value of is_cool:
-- note: a user is counted only on days where they have at least one row
select date(t2.max_date), count(*)
from (
    select t.user_id, t.max_date
    from my_table m
    inner join (
        -- latest row per user within each day
        select max(created) max_date, user_id
        from my_table
        group by user_id, date(created)
    ) t on t.max_date = m.created
       and t.user_id = m.user_id
    where m.is_cool = TRUE
) t2
group by date(t2.max_date)
Answer 1 (score: 0)
A correlated subquery is probably the simplest solution. The following gives the is_cool value for each user on each date:
select u.user_id, d.date,
       (select t.is_cool
        from t
        where t.user_id = u.user_id and
              -- Redshift syntax; in Presto write date_add('day', 1, d.date)
              t.created < dateadd(day, 1, d.date)
        order by t.created desc
        limit 1
       ) as is_cool
from (select distinct date(created) as date
      from t
     ) d cross join
     (select distinct user_id
      from t
     ) u ;
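Then aggregate:
-- a boolean cannot be summed directly, so map it to 1/0 first
select date, sum(case when is_cool then 1 else 0 end)
from (select u.user_id, d.date,
             (select t.is_cool
              from t
              where t.user_id = u.user_id and
                    -- Redshift syntax; in Presto write date_add('day', 1, d.date)
                    t.created < dateadd(day, 1, d.date)
              order by t.created desc
              limit 1
             ) as is_cool
      from (select distinct date(created) as date
            from t
           ) d cross join
           (select distinct user_id
            from t
           ) u
     ) ud
group by date;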
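The logic of this answer can be tried out locally with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: the sample rows are made up, timestamps are stored as text, is_cool is stored as 0/1, and SQLite's date(x, '+1 day') stands in for dateadd. Engines differ in whether they accept a correlated subquery with LIMIT, so treat this as a check of the logic rather than of Presto or Redshift compatibility.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: created as ISO text, is_cool as 0/1 (SQLite has no boolean type).
conn.execute("create table t (user_id int, created text, is_cool int)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        (1, "2020-01-01 09:00:00", 1),
        (2, "2020-01-01 10:00:00", 0),
        (2, "2020-01-02 12:00:00", 1),
        (1, "2020-01-04 15:00:00", 0),
    ],
)

query = """
select day, sum(is_cool) as cool_users_count
from (select u.user_id, d.day,
             (select t.is_cool
              from t
              where t.user_id = u.user_id
                and t.created < date(d.day, '+1 day')
              order by t.created desc
              limit 1
             ) as is_cool
      from (select distinct date(created) as day from t) d
      cross join (select distinct user_id from t) u
     ) ud
group by day
order by day
"""
result = conn.execute(query).fetchall()
conn.close()
```

For this data the query returns one row per day that appears in the table: 1 cool user on 2020-01-01, 2 on 2020-01-02, and 1 on 2020-01-04. Because the list of days is derived from the table itself, 2020-01-03 (a day with no inserts) is absent from the output; a generated calendar of dates would be needed to include it.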