Data warehouse for user data - design question

Asked: 2011-02-09 17:33:24

Tags: database-design data-warehouse

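What is the best way to store user data against date/time dimensions? My use case: I am trying to store user actions per day, per hour, such as the number of shares, likes, friends, etc. I have a time table and a date table. Time is easy: each row is a user_id and the columns are 1 to 24, one for each hour of the day. The problem is dates. If I give each day its own column, I will have 365 columns per year. I cannot archive the data away either, because analytics needs past data too. What other strategies are there?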

2 answers:

Answer 0 (score: 5)


dimDate : 1 row per date
dimTime : 1 row per minute

At the start you have to state the "grain" of the fact table, and then stick to it.

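If the grain is one day, then TimeKey always points to the key for "23:59".

If the grain is one hour, then TimeKey points to the "HH:59" entries.

If the grain is one minute, then TimeKey points to the respective "HH:MM".

If the grain is 15 minutes, then TimeKey points to the respective "HH:14", "HH:29", "HH:44", "HH:59".

And so on...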
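The grain rule above can be sketched as a small helper that maps a timestamp to the "HH:MM" label of the last minute in its bucket. This is only an illustration: the function name `time_key_label` and the idea of expressing the grain in minutes are assumptions, not part of the answer.

```python
from datetime import datetime


def time_key_label(ts: datetime, grain_minutes: int) -> str:
    """Return the "HH:MM" label of the last minute of the grain bucket containing ts.

    Hypothetical helper: the grain is given in minutes and must divide a day evenly.
    """
    if grain_minutes <= 0 or 1440 % grain_minutes != 0:
        raise ValueError("grain must divide a 1440-minute day evenly")
    minute_of_day = ts.hour * 60 + ts.minute
    # Last minute of the bucket that contains minute_of_day.
    bucket_end = (minute_of_day // grain_minutes + 1) * grain_minutes - 1
    return f"{bucket_end // 60:02d}:{bucket_end % 60:02d}"


ts = datetime(2011, 2, 9, 10, 7)
print(time_key_label(ts, 1440))  # day grain
print(time_key_label(ts, 60))    # hour grain
print(time_key_label(ts, 15))    # 15-minute grain
print(time_key_label(ts, 1))     # minute grain
```

The returned label would then be looked up in dimTime to obtain the actual TimeKey.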

-- How many new friends did specific user gain
-- in first three months of years 2008, 2009 and 2010
-- between hours 3 and 5 in the morning
-- by day of week
-- not counting holidays ?

select
      d.DayOfWeek
    , sum(f.NewFriends) as FriendCount
from dbo.factUserAction as f
join dbo.dimUser    as u on u.UserKey = f.UserKey
join dbo.dimDate    as d on d.DateKey = f.DateKey
join dbo.dimTime    as t on t.TimeKey = f.TimeKey
where d.CalendarYear between 2008 and 2010
  and d.MonthNumberInYear between 1 and 3
  and t.Hour between 3 and 5
  and d.IsHoliday = 'no'
  and u.UserEmail = 'john_doe@gmail.com'
group by d.DayOfWeek
order by d.DayOfWeek ;
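To try the idea end to end, here is a minimal sketch using Python's built-in sqlite3 module. The table layouts are assumptions inferred from the query above, not the answer's actual schema; in particular, using the minute of the day as TimeKey and YYYYMMDD as DateKey are illustrative choices, and the fact rows are made-up sample data. The point is that each hour is a row in the fact table, so no per-hour or per-day columns are needed.

```python
import sqlite3
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimUser (UserKey INTEGER PRIMARY KEY, UserEmail TEXT);
CREATE TABLE dimDate (DateKey INTEGER PRIMARY KEY, CalendarYear INTEGER,
                      MonthNumberInYear INTEGER, DayOfWeek TEXT, IsHoliday TEXT);
CREATE TABLE dimTime (TimeKey INTEGER PRIMARY KEY, Hour INTEGER, Minute INTEGER);
CREATE TABLE factUserAction (
    UserKey    INTEGER REFERENCES dimUser(UserKey),
    DateKey    INTEGER REFERENCES dimDate(DateKey),
    TimeKey    INTEGER REFERENCES dimTime(TimeKey),
    NewFriends INTEGER,
    PRIMARY KEY (UserKey, DateKey, TimeKey)
);
""")

# dimTime: one row per minute; TimeKey = minute of the day (assumed encoding).
conn.executemany("INSERT INTO dimTime VALUES (?, ?, ?)",
                 [(m, m // 60, m % 60) for m in range(1440)])

# dimDate: one row per date; DateKey = YYYYMMDD (assumed encoding).
for d in (date(2009, 2, 2), date(2009, 2, 3)):
    conn.execute("INSERT INTO dimDate VALUES (?, ?, ?, ?, ?)",
                 (d.year * 10000 + d.month * 100 + d.day, d.year, d.month,
                  DAY_NAMES[d.weekday()], "no"))

conn.execute("INSERT INTO dimUser VALUES (1, 'john_doe@gmail.com')")

# Hourly grain: every fact row points at the "HH:59" minute of its hour.
facts = [
    (1, 20090202, 3 * 60 + 59, 2),
    (1, 20090202, 4 * 60 + 59, 1),
    (1, 20090203, 3 * 60 + 59, 4),
    (1, 20090203, 10 * 60 + 59, 7),  # outside the 3-5 window, filtered out
]
conn.executemany("INSERT INTO factUserAction VALUES (?, ?, ?, ?)", facts)

rows = conn.execute("""
    SELECT d.DayOfWeek, SUM(f.NewFriends) AS FriendCount
    FROM factUserAction AS f
    JOIN dimUser AS u ON u.UserKey = f.UserKey
    JOIN dimDate AS d ON d.DateKey = f.DateKey
    JOIN dimTime AS t ON t.TimeKey = f.TimeKey
    WHERE d.CalendarYear BETWEEN 2008 AND 2010
      AND d.MonthNumberInYear BETWEEN 1 AND 3
      AND t.Hour BETWEEN 3 AND 5
      AND d.IsHoliday = 'no'
      AND u.UserEmail = 'john_doe@gmail.com'
    GROUP BY d.DayOfWeek
    ORDER BY d.DayOfWeek
""").fetchall()
print(rows)
```

Adding a new day means adding rows to dimDate and factUserAction, never altering the table definition, and older data stays queryable alongside new data.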

Answer 1 (score: 1)

You can store the date in a dimension and then add computed fields, such as day_of_year.

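In the designs I have worked on, we never had a time slice finer than one day, but I don't see why one couldn't have a time dimension at date-time granularity.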

user_activity_facts(
   time_key references time_dimension(time_key)
  ,user_key references user_dimension(user_key)
  ,measure1
  ,measure2
  ,measure3
  ,primary key(time_key, user_key)
)
partition by range(time_key)(
   ...
)
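As a rough illustration of a single date-time dimension at hourly granularity, the following sketch generates one dimension row per hour with a computed day_of_year field. The function name, the column names other than time_key and day_of_year, and the YYYYMMDDHH key encoding are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def hourly_time_dimension(start: datetime, end: datetime):
    """Yield one dimension row per hour in the half-open range [start, end)."""
    ts = start
    while ts < end:
        yield {
            # Assumed key encoding: YYYYMMDDHH as an integer, which sorts
            # chronologically and so suits range partitioning on time_key.
            "time_key": int(ts.strftime("%Y%m%d%H")),
            "calendar_date": ts.date().isoformat(),
            "hour": ts.hour,
            "day_of_year": ts.timetuple().tm_yday,
        }
        ts += timedelta(hours=1)


rows = list(hourly_time_dimension(datetime(2011, 2, 9), datetime(2011, 2, 10)))
print(len(rows), rows[0], rows[-1])
```

A year of hourly rows is only 8,760 or 8,784 entries, so a combined dimension stays small at this grain; at minute grain it grows to over half a million rows per year, which is one reason to keep separate date and time dimensions.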