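How best to store user data against date/time dimensions? My use case is storing user activity per day, per hour: counts of shares, likes, friends and so on. I have a time table and a date table. The time part is easy: each row is a user_id and the columns are 1 to 24, one for each hour of the day. The problem is dates. If I give each day its own column, I will have 365 columns per year. I cannot archive the data either, because the analytics need past data too. What other strategies are there?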
Answer 0: (score: 5)
dimDate : 1 row per date
dimTime : 1 row per minute
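Up front, you have to state the "grain" of the fact table, and then stick to it.
If the grain is one day, then TimeKey always points to the key of "23:59".
If the grain is one hour, then TimeKey points to the "HH:59" entries.
If the grain is one minute, then TimeKey points to the respective "HH:MM".
If the grain is 15 minutes, then TimeKey points to the respective "HH:14", "HH:29", "HH:44", "HH:59".
And so on...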
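The grain rule above can be sketched as a small helper that picks the dimTime entry for an event. This is only an illustration in Python, not part of the original answer: the function name `time_label_for_grain` is hypothetical, and it returns the "HH:MM" label of the dimTime row, whereas a real load would look up that row's surrogate TimeKey.

```python
from datetime import datetime


def time_label_for_grain(ts: datetime, grain_minutes: int) -> str:
    """Return the 'HH:MM' label of the last minute of the bucket containing ts.

    grain_minutes is the fact-table grain expressed in minutes
    (1440 = one day, 60 = one hour, 15 = fifteen minutes, 1 = one minute).
    Hypothetical helper: the label stands in for the dimTime surrogate key.
    """
    if grain_minutes <= 0 or 1440 % grain_minutes != 0:
        raise ValueError("grain_minutes must divide 1440 evenly")
    minute_of_day = ts.hour * 60 + ts.minute
    # Last minute of the bucket this timestamp falls into.
    bucket_end = (minute_of_day // grain_minutes + 1) * grain_minutes - 1
    return f"{bucket_end // 60:02d}:{bucket_end % 60:02d}"


if __name__ == "__main__":
    event = datetime(2010, 3, 5, 3, 20)
    for grain in (1440, 60, 15, 1):
        print(grain, time_label_for_grain(event, grain))
```

Because every fact row at a given grain lands on the same set of dimTime entries, queries can filter and group by attributes of dimTime without caring how fine the original timestamps were.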
-- How many new friends did specific user gain
-- in first three months of years 2008, 2009 and 2010
-- between hours 3 and 5 in the morning
-- by day of week
-- not counting holidays ?
select
DayOfWeek
, sum(NewFriends) as FriendCount
from factUserAction as f
join dbo.dimUser as u on u.UserKey = f.UserKey
join dbo.dimDate as d on d.DateKey = f.DateKey
join dbo.dimTime as t on t.TimeKey = f.TimeKey
where CalendarYear between 2008 and 2010
and MonthNumberInYear between 1 and 3
and t.Hour between 3 and 5
and d.IsHoliday = 'no'
and UserEmail = 'john_doe@gmail.com'
group by DayOfWeek
order by DayOfWeek ;
Answer 1: (score: 1)
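You can store the date in a dimension (one row per date rather than one column per date) and then add computed fields such as day_of_year.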
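As a rough sketch of what such a date dimension might contain, the following Python generates one row per calendar date with a few computed fields. The function name `build_dim_date` and the column names are assumptions made for illustration (loosely modelled on the columns used in the query in the other answer), and the YYYYMMDD integer key is just one common convention, not something this thread prescribes.

```python
from datetime import date, timedelta
from typing import Dict, List


def build_dim_date(start: date, end: date) -> List[Dict[str, object]]:
    """Build one date-dimension row per calendar day in [start, end].

    Column names are hypothetical; adapt them to your own schema.
    """
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),  # assumed key format
            "full_date": current.isoformat(),
            "calendar_year": current.year,
            "month_number_in_year": current.month,
            "day_of_week": current.isoweekday(),  # 1 = Monday ... 7 = Sunday
            "day_of_year": current.timetuple().tm_yday,
        })
        current += timedelta(days=1)
    return rows


if __name__ == "__main__":
    dim = build_dim_date(date(2008, 1, 1), date(2008, 12, 31))
    print(len(dim), dim[0], dim[-1])
```

A year of history adds 365 or 366 rows to this table instead of 365 columns to the fact table, so past data stays queryable without schema changes.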
In the designs I have worked on, we never had a time slice finer than a day, but I don't see why one couldn't have a time dimension at datetime granularity.
user_activity_facts(
time_key references time_dimension(time_key)
,user_key references user_dimension(user_key)
,measure1
,measure2
,measure3
,primary key(time_key, user_key)
)
partition by range(time_key)(
...
)