Data warehousing and unpivoted data

Time: 2011-06-09 11:41:40

Tags: sql-server data-warehouse business-intelligence

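I am building a data warehouse for a core ERP application, for a specific client of the company I work for.

In the source database, most of the dimensional information needed by the data warehouse is stored in an unpivoted manner, because the application is a product that is customized per client request.

For the client I am currently working with, I can pivot the data and extract it. My concern is reuse: if we deploy the data warehouse for other clients, then depending on how they classify their fields, I believe the data warehouse model will not be able to adapt and further customization will be required.

Please let me know if there is any accepted mechanism for overcoming this design problem.

Below is an example of how products are classified in the source database (the same applies to most other master data classifications):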

Product Code  MasterClassification  MasterClassificationValue
------------  --------------------  -------------------------
AAA           Brand                 AA
AAA           Category              A

The same data set, pivoted:

Product Code  Brand  Category
------------  -----  --------
AAA           AA     A

Thanks in advance.

2 answers:

Answer 0 (score: 1)

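This is a classic and well-documented data problem. What you describe as "unpivoted" is known as EAV (entity-attribute-value). I suggest you Google "EAV" together with "pivot" and "reporting". You are not alone!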
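To make the EAV shape concrete, here is a minimal sketch in plain Python that folds rows like the ones in the question into one record per product. The function name `pivot_eav` and the tuple layout are assumptions made for illustration only; they do not come from the source system.

```python
from collections import defaultdict

# Rows shaped like the question's example:
# (product code, classification name, classification value).
eav_rows = [
    ("AAA", "Brand", "AA"),
    ("AAA", "Category", "A"),
]

def pivot_eav(rows):
    """Fold (entity, attribute, value) rows into one dict per entity.

    Hypothetical helper: attributes are discovered from the data, so a new
    classification shows up as a new key without any schema change.
    """
    wide = defaultdict(dict)
    for entity, attribute, value in rows:
        wide[entity][attribute] = value
    return dict(wide)

pivoted = pivot_eav(eav_rows)
print(pivoted)  # {'AAA': {'Brand': 'AA', 'Category': 'A'}}
```

Because the attribute names are read from the rows rather than hard-coded, the same routine would work for a client that classifies products differently, which is the flexibility EAV storage is meant to provide.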

Answer 1 (score: 0)

It makes sense that the dimensional data in the source system is stored unpivoted -- it's a database, so it should be normalized. How you handle it in the data warehouse is another question.

In a previous job, we debated whether and how we should carry pivoted / denormalized / "wide and shallow" data. In our implementation, every table brought with it a view (containing the ETL logic) and a procedure (to load the table). That's a lot of infrastructure, so we thought twice before adding another table. Also, the requirement for pivoted data often came from the analytics team for use in Tableau, a tool that easily consumes unpivoted / "narrow and deep" data and pivots it -- so we often debated whether pivoted data was actually required.

Eventually we decided that we would occasionally carry pivoted data but only via a reporting view. (We had naming conventions to distinguish reporting views from ETL views.) I think this is an approach you should consider, for reasons you mentioned yourself: new categories could be added, rendering your pivoted design outdated. Also, if you have multiple clients using this data, each client could be interested in a different set of categories. You could cast a customized pivoted reporting view on top of this table for each client. That sounds like a lot of work, but I think it's less work than redoing a pivoted table every time you become aware that a new category has been added. Good luck!
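As a rough sketch of the per-client reporting view idea, the Python below keeps the unpivoted table as is and generates one pivoted view per client from a list of categories. It runs against an in-memory SQLite database purely so it is self-contained; the table name `ProductClassification`, the view names, and the helper `create_reporting_view` are all assumptions, not part of any real implementation described above. The pivot uses conditional aggregation (`MAX(CASE ...)`), which is portable; SQL Server also offers a `PIVOT` operator that could serve the same purpose.

```python
import sqlite3

def quote_ident(name):
    """Quote an identifier for use in generated SQL."""
    return '"' + name.replace('"', '""') + '"'

def quote_literal(value):
    """Quote a string literal (view definitions cannot take bound parameters)."""
    return "'" + value.replace("'", "''") + "'"

def create_reporting_view(conn, view_name, categories):
    """Create a pivoted view exposing one column per requested category."""
    if not categories:
        raise ValueError("at least one category is required")
    columns = ",\n  ".join(
        f"MAX(CASE WHEN MasterClassification = {quote_literal(c)} "
        f"THEN MasterClassificationValue END) AS {quote_ident(c)}"
        for c in categories
    )
    conn.execute(
        f"CREATE VIEW {quote_ident(view_name)} AS\n"
        f"SELECT ProductCode,\n  {columns}\n"
        f"FROM ProductClassification\nGROUP BY ProductCode"
    )

conn = sqlite3.connect(":memory:")
# Hypothetical unpivoted table mirroring the question's example layout.
conn.execute(
    "CREATE TABLE ProductClassification ("
    "ProductCode TEXT, MasterClassification TEXT, MasterClassificationValue TEXT)"
)
conn.executemany(
    "INSERT INTO ProductClassification VALUES (?, ?, ?)",
    [("AAA", "Brand", "AA"), ("AAA", "Category", "A")],
)

# Each client gets a view over the same table with its own set of categories.
create_reporting_view(conn, "rpt_ProductClassification_ClientA", ["Brand", "Category"])
create_reporting_view(conn, "rpt_ProductClassification_ClientB", ["Brand"])

rows_a = conn.execute("SELECT * FROM rpt_ProductClassification_ClientA").fetchall()
rows_b = conn.execute("SELECT * FROM rpt_ProductClassification_ClientB").fetchall()
print(rows_a)  # [('AAA', 'AA', 'A')]
print(rows_b)  # [('AAA', 'AA')]
```

When a new category appears, only the affected client's view needs to be regenerated; the underlying table and its load process stay untouched.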