How to populate Dim_tbls from a relational source?
Given these example tables:
tbl_sales: id_sales, fk_id_customer, fk_id_product, country, timestamp
tbl_customer: id_customer, name, adress, zip, city
tbl_product: id_product, price, product
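My goal is to put these attributes into a star schema. The problem I have is the logic behind loading the dimension tables. I mean, which data would I load into Dim_Product? All the products in tbl_product? But then how would I know how many sales were made for a specific product?
The analyses I want to do are: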
How many people bought product x.
How many sales are made from city x.
How many sales were made between Time x and y.
Example data:
tbl_sales: id_sales | fk_id_customer | fk_id_product | country | timestamp
1 | 2 | 1 | UK | 19.11.2013 10:23:22
2 | 1 | 2 | FR | 20.11.2013 06:04:22
tbl_customer: id_customer | name | adress | zip | city
1 | Frank | Street X | 211 | London
2 | Steve | Street Y | 431 | Paris
tbl_product: id_product | price | product
1 | 100,00 | Hammer
2 | 50,00 | Saw
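Answer 0 (score: 2)
Let's start with a very simple star schema model; for example, I assume you don't need to worry about handling changes to dimension attributes.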
factSales:
DateKey
CustomerKey
ProductKey
Counter (=1; this is a factless fact table)
dimDate:
DateKey
Date
Year
Quarter
Month
...
dimCustomer:
CustomerKey
Name
Address
Zip
City
dimProduct:
ProductKey
Name
Price (if it changes, you need to move it to factSales)
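To make the loading logic concrete, here is a minimal sketch in Python with SQLite, using the sample rows from the question. It loads every row of tbl_product into dimProduct and every row of tbl_customer into dimCustomer, whether or not they have sales yet; the sales themselves become one factSales row each, and that is where the per-product counts come from. Several details are assumptions for illustration only: the source ids are reused as dimension keys (a real warehouse would more likely generate surrogate keys), DateKey is an integer of the form YYYYMMDD, dimDate is filled only with dates that occur in tbl_sales, and the country column is left out because the schema above does not include it.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Source tables with the sample rows from the question.
CREATE TABLE tbl_customer (id_customer INTEGER PRIMARY KEY, name TEXT,
                           adress TEXT, zip TEXT, city TEXT);
CREATE TABLE tbl_product  (id_product INTEGER PRIMARY KEY, price REAL, product TEXT);
CREATE TABLE tbl_sales    (id_sales INTEGER PRIMARY KEY, fk_id_customer INTEGER,
                           fk_id_product INTEGER, country TEXT, timestamp TEXT);
INSERT INTO tbl_customer VALUES (1, 'Frank', 'Street X', '211', 'London'),
                                (2, 'Steve', 'Street Y', '431', 'Paris');
INSERT INTO tbl_product  VALUES (1, 100.00, 'Hammer'), (2, 50.00, 'Saw');
INSERT INTO tbl_sales    VALUES (1, 2, 1, 'UK', '19.11.2013 10:23:22'),
                                (2, 1, 2, 'FR', '20.11.2013 06:04:22');

-- Star schema as outlined in the answer.
CREATE TABLE dimDate     (DateKey INTEGER PRIMARY KEY, Date TEXT, Year INTEGER,
                          Quarter INTEGER, Month INTEGER);
CREATE TABLE dimCustomer (CustomerKey INTEGER PRIMARY KEY, Name TEXT,
                          Address TEXT, Zip TEXT, City TEXT);
CREATE TABLE dimProduct  (ProductKey INTEGER PRIMARY KEY, Name TEXT, Price REAL);
CREATE TABLE factSales   (DateKey INTEGER, CustomerKey INTEGER,
                          ProductKey INTEGER, Counter INTEGER);

-- Dimensions: copy ALL customers and products (assumption: source id = key).
INSERT INTO dimCustomer SELECT id_customer, name, adress, zip, city FROM tbl_customer;
INSERT INTO dimProduct  SELECT id_product, product, price FROM tbl_product;
""")

# Facts: one row per sale, pointing at the dimension rows by key.
sales = conn.execute(
    "SELECT fk_id_customer, fk_id_product, timestamp FROM tbl_sales").fetchall()
for customer_id, product_id, ts in sales:
    d = datetime.strptime(ts, "%d.%m.%Y %H:%M:%S").date()
    key = d.year * 10000 + d.month * 100 + d.day  # assumed YYYYMMDD key format
    # Add the date to dimDate the first time it is seen.
    conn.execute("INSERT OR IGNORE INTO dimDate VALUES (?, ?, ?, ?, ?)",
                 (key, d.isoformat(), d.year, (d.month - 1) // 3 + 1, d.month))
    conn.execute("INSERT INTO factSales VALUES (?, ?, ?, 1)",
                 (key, customer_id, product_id))
conn.commit()

# Sales per product come from the fact table joined to the dimension.
sales_per_product = conn.execute("""
    SELECT p.Name, SUM(f.Counter)
    FROM factSales f JOIN dimProduct p ON p.ProductKey = f.ProductKey
    GROUP BY p.Name ORDER BY p.Name""").fetchall()
print(sales_per_product)  # [('Hammer', 1), ('Saw', 1)]
```

In practice dimDate is often pre-filled with a full calendar range rather than built from the sales rows; the lookup from the fact load stays the same either way.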
How many people bought product x.
SELECT COUNT(DISTINCT CustomerKey)
FROM factSales
WHERE ProductKey IN ( SELECT ProductKey
                      FROM dimProduct
                      WHERE Name = 'Product X' )
How many sales are made from city x.
SELECT SUM(Counter)
FROM factSales
WHERE CustomerKey IN ( SELECT CustomerKey
                       FROM dimCustomer
                       WHERE City = 'City X' )
How many sales were made between Time x and y.
SELECT SUM(Counter)
FROM factSales
WHERE DateKey IN ( SELECT DateKey
                   FROM dimDate
                   WHERE Date BETWEEN DateX AND DateY )
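As a quick check, the three analyses can be run against the question's sample data once it sits in the star schema. The sketch below uses Python with an in-memory SQLite database. The concrete values are assumptions chosen for illustration: 'Hammer' stands in for product x, 'Paris' for city x, dimDate.Date is stored as an ISO string so that BETWEEN compares correctly, and DateKey is a YYYYMMDD integer. The first query counts distinct customers so that it returns a number rather than a list of keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimDate     (DateKey INTEGER PRIMARY KEY, Date TEXT, Year INTEGER,
                          Quarter INTEGER, Month INTEGER);
CREATE TABLE dimCustomer (CustomerKey INTEGER PRIMARY KEY, Name TEXT,
                          Address TEXT, Zip TEXT, City TEXT);
CREATE TABLE dimProduct  (ProductKey INTEGER PRIMARY KEY, Name TEXT, Price REAL);
CREATE TABLE factSales   (DateKey INTEGER, CustomerKey INTEGER,
                          ProductKey INTEGER, Counter INTEGER);

-- Sample rows from the question, already in star-schema form.
INSERT INTO dimDate     VALUES (20131119, '2013-11-19', 2013, 4, 11),
                               (20131120, '2013-11-20', 2013, 4, 11);
INSERT INTO dimCustomer VALUES (1, 'Frank', 'Street X', '211', 'London'),
                               (2, 'Steve', 'Street Y', '431', 'Paris');
INSERT INTO dimProduct  VALUES (1, 'Hammer', 100.00), (2, 'Saw', 50.00);
INSERT INTO factSales   VALUES (20131119, 2, 1, 1), (20131120, 1, 2, 1);
""")

def scalar(sql, *params):
    """Run a query that yields a single value and return that value."""
    return conn.execute(sql, params).fetchone()[0]

# How many people bought product x (here: 'Hammer').
buyers = scalar("""
    SELECT COUNT(DISTINCT CustomerKey)
    FROM factSales
    WHERE ProductKey IN (SELECT ProductKey FROM dimProduct WHERE Name = ?)""",
    "Hammer")

# How many sales are made from city x (here: 'Paris').
city_sales = scalar("""
    SELECT SUM(Counter)
    FROM factSales
    WHERE CustomerKey IN (SELECT CustomerKey FROM dimCustomer WHERE City = ?)""",
    "Paris")

# How many sales were made between two dates (inclusive on both ends).
period_sales = scalar("""
    SELECT SUM(Counter)
    FROM factSales
    WHERE DateKey IN (SELECT DateKey FROM dimDate WHERE Date BETWEEN ? AND ?)""",
    "2013-11-19", "2013-11-20")

print(buyers, city_sales, period_sales)  # 1 1 2
```

Note that SUM(Counter) returns NULL (None in Python) when no fact rows match, so a caller may want to wrap it in COALESCE(SUM(Counter), 0).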