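I am building a data warehouse for a core ERP application at a specific client's company (which I work for).
In the source database, most of the dimensional information needed by the data warehouse is stored in an unpivoted manner, because the application is a product that is customized per client request.
For the current client, I can pivot the data as I extract it. My concern is that if we reuse the data warehouse with other clients, the model will not adapt to the way they classify their fields, and further customization will be required.
Please let me know whether there is any suitable mechanism to overcome this design issue.
Below is an example of how products are classified in the source database (the same applies to most of the other master data classifications):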
Product Code  MasterClassification  MasterClassificationValue
------------  --------------------  -------------------------
AAA           Brand                 AA
AAA           Category              A
The same data set in pivoted form:
Product Code  Brand  Category
------------  -----  --------
AAA           AA     A
Thanks in advance.
Answer 0 (score: 1)
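This is a classic and well-documented data problem. What you describe as "unpivoted" is known as EAV (entity-attribute-value). I suggest you search for "EAV" together with "reporting". You are not alone!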
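To make the EAV shape concrete, here is a minimal sketch of pivoting the three-column rows from the question in plain Python. The function name `pivot_eav` and the output key `ProductCode` are made up for illustration. The point is that the attribute names are discovered from the data instead of being hard-coded, so a new classification simply becomes a new key.

```python
from collections import defaultdict


def pivot_eav(rows):
    """Pivot (entity, attribute, value) triples into one dict per entity.

    Hypothetical helper: attribute names are collected from the data,
    and an entity that lacks an attribute gets None for it.
    """
    attributes = []  # attribute names in first-seen order
    by_entity = defaultdict(dict)
    for entity, attribute, value in rows:
        if attribute not in attributes:
            attributes.append(attribute)
        by_entity[entity][attribute] = value
    wide = [
        {"ProductCode": entity, **{a: values.get(a) for a in attributes}}
        for entity, values in by_entity.items()
    ]
    return attributes, wide


# Sample rows copied from the question's unpivoted table.
rows = [
    ("AAA", "Brand", "AA"),
    ("AAA", "Category", "A"),
]
attributes, wide = pivot_eav(rows)
print(attributes)
print(wide)
```

Whether this kind of pivot belongs in the ETL layer or in the reporting tool is a separate design decision.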
Answer 1 (score: 0)
It makes sense that the dimensional data in the source system is stored unpivoted -- it's a database, so it should be normalized. How you handle it in the data warehouse is another question.
In a previous job, we debated whether and how we should carry pivoted / denormalized / "wide and shallow" data. In our implementation, every table brought with it a view (containing the ETL logic) and a procedure (to load the table). That's a lot of infrastructure, so we thought twice before adding another table. Also, the requirement for pivoted data often came from the analytics team for use in Tableau, a tool that easily consumes unpivoted / "narrow and deep" data and pivots it -- so we often debated whether pivoted data was actually required.
Eventually we decided that we would occasionally carry pivoted data but only via a reporting view. (We had naming conventions to distinguish reporting views from ETL views.) I think this is an approach you should consider, for reasons you mentioned yourself: new categories could be added, rendering your pivoted design outdated. Also, if you have multiple clients using this data, each client could be interested in a different set of categories. You could cast a customized pivoted reporting view on top of this table for each client. That sounds like a lot of work, but I think it's less work than redoing a pivoted table every time you become aware that a new category has been added. Good luck!
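The per-client reporting view idea could be sketched roughly as follows, using Python's built-in `sqlite3` module so the example is self-contained. The table name `product_classification`, the column name `ProductCode`, the view names, and the helper `create_reporting_view` are all assumptions made for illustration, not anything from the actual source system. The narrow table stays as it is, and each client gets a generated view with one column per classification they care about.

```python
import sqlite3


def create_reporting_view(conn, view_name, classifications):
    """Create a pivoted view with one column per requested classification.

    Hypothetical helper; assumes a narrow table named
    product_classification(ProductCode, MasterClassification,
    MasterClassificationValue).
    """
    if not classifications:
        raise ValueError("at least one classification is required")

    def quote(identifier):
        # Double-quote identifiers so names containing spaces still work.
        return '"' + identifier.replace('"', '""') + '"'

    def literal(text):
        # SQLite does not allow bound parameters inside a view definition,
        # so classification names are embedded as escaped string literals.
        return "'" + text.replace("'", "''") + "'"

    columns = ",\n  ".join(
        f"MAX(CASE WHEN MasterClassification = {literal(c)} "
        f"THEN MasterClassificationValue END) AS {quote(c)}"
        for c in classifications
    )
    conn.execute(f"DROP VIEW IF EXISTS {quote(view_name)}")
    conn.execute(
        f"CREATE VIEW {quote(view_name)} AS\n"
        f"SELECT ProductCode,\n  {columns}\n"
        f"FROM product_classification\n"
        f"GROUP BY ProductCode"
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_classification ("
    "ProductCode TEXT, MasterClassification TEXT, "
    "MasterClassificationValue TEXT)"
)
conn.executemany(
    "INSERT INTO product_classification VALUES (?, ?, ?)",
    [("AAA", "Brand", "AA"), ("AAA", "Category", "A")],
)

# One view per client, each exposing only the classifications it needs.
create_reporting_view(conn, "rpt_product_client_a", ["Brand", "Category"])
create_reporting_view(conn, "rpt_product_client_b", ["Brand"])

result_a = conn.execute("SELECT * FROM rpt_product_client_a").fetchall()
result_b = conn.execute("SELECT * FROM rpt_product_client_b").fetchall()
print(result_a)
print(result_b)
```

When a new classification shows up, only the list passed to the helper changes; the underlying table and its load procedure stay untouched. In a real warehouse the same `MAX(CASE WHEN ...)` pattern would be written in that platform's SQL dialect.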