Using Spark 1.6.2.
Here is the data:
day | visitorID
-------------
1 | A
1 | B
2 | A
2 | C
3 | A
4 | A
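I want to count, for each day, the number of distinct visitors seen up to and including that day (a cumulative distinct count; sorry, I don't know the exact term).
This should give: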
day | visitors
--------------
1 | 2 (A+B)
2 | 3 (A+B+C)
3 | 3
4 | 3
Answer 0: (score: 2)
You should be able to do this:
select day, max(visitors) as visitors
from (select day,
             count(distinct visitorId) over (order by day) as visitors
      from t
     ) d
group by day;
Actually, I think a better approach is to count each visitor only on the first day they appear:
select startday, sum(count(*)) over (order by startday) as visitors
from (select visitorId, min(day) as startday
      from t
      group by visitorId
     ) t
group by startday
order by startday;
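To make the logic of the second query easier to follow, here is a minimal sketch in plain Python (not Spark) that applies the same three steps to the sample data from the question. The function name cumulative_distinct_visitors is hypothetical and only used for illustration.

```python
from collections import Counter
from itertools import accumulate

def cumulative_distinct_visitors(rows):
    """rows: iterable of (day, visitor_id) pairs.
    Returns a list of (startday, running_total) pairs."""
    # Step 1: first day on which each visitor appears
    # (the inner "min(day) ... group by visitorId").
    first_day = {}
    for day, visitor in rows:
        if visitor not in first_day or day < first_day[visitor]:
            first_day[visitor] = day
    # Step 2: number of new visitors per first day
    # (the "count(*) ... group by startday").
    new_per_day = Counter(first_day.values())
    days = sorted(new_per_day)
    # Step 3: running total in day order
    # (the "sum(...) over (order by startday)").
    totals = accumulate(new_per_day[d] for d in days)
    return list(zip(days, totals))

rows = [(1, "A"), (1, "B"), (2, "A"), (2, "C"), (3, "A"), (4, "A")]
print(cumulative_distinct_visitors(rows))
```

On the sample data this prints [(1, 2), (2, 3)]. Days 3 and 4 are absent because no visitor appears for the first time on those days, so they have to be filled in separately if a row is needed for every day.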
Answer 1: (score: 2)
In SQL, you can do it like this:
select t1.day, sum(max(t.cnt)) over (order by t1.day) as visitors
from tbl t1
left join (select minday, count(*) as cnt
           from (select visitorID, min(day) as minday
                 from tbl
                 group by visitorID
                ) t
           group by minday
          ) t
  on t1.day = t.minday
group by t1.day
The subquery with min gets the first day on which each visitorID appears. Another approach is:
select t1.day, sum(count(t.visitorid)) over (order by t1.day) as cnt
from tbl t1
left join (select visitorID, min(day) as minday
           from tbl
           group by visitorID
          ) t
  on t1.day = t.minday and t.visitorid = t1.visitorid
group by t1.day
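The effect of the left join from the full table is that every day in the data gets a row, even when no new visitor appears on it. A rough plain-Python sketch of that behaviour (not Spark; the function name visitors_per_day is hypothetical) might look like this:

```python
from itertools import accumulate

def visitors_per_day(rows):
    """rows: iterable of (day, visitor_id) pairs.
    Returns a list of (day, cumulative_distinct_visitors) for every day in rows."""
    rows = list(rows)
    # First day of each visitor (the "min(day) ... group by visitorID" subquery).
    first_day = {}
    for day, visitor in rows:
        first_day[visitor] = min(day, first_day.get(visitor, day))
    # Every day present in the table (the left side of the join).
    days = sorted({day for day, _ in rows})
    # New visitors per day; days with no first-time visitor count as 0,
    # which mirrors count() skipping the NULLs produced by the left join.
    new_counts = [sum(1 for d in first_day.values() if d == day) for day in days]
    # Running total in day order (the "sum(...) over (order by t1.day)").
    return list(zip(days, accumulate(new_counts)))

rows = [(1, "A"), (1, "B"), (2, "A"), (2, "C"), (3, "A"), (4, "A")]
print(visitors_per_day(rows))
```

For the sample data this prints [(1, 2), (2, 3), (3, 3), (4, 3)], which matches the output requested in the question.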
Answer 2: (score: 0)
Try this: