Using SQL, it takes over 4 hours every evening to pull all the data from the twelve Production database tables and views needed for our Sandbox database. There has to be a significantly more efficient and effective way to get this data into our Sandbox.
Currently, I'm creating a UID (Unique ID) by concatenating each view's primary key and system date fields.
The UID is used in two steps:
Step 1.
INSERT INTO Sandbox the Production rows whose UID does not yet exist in Sandbox (LEFT JOIN the Production Table/View.UID to the existing Sandbox Table/View.UID and keep rows WHERE the Sandbox UID IS NULL), looking back only the last 30 days based on the system date.
Step 2.
UPDATE Sandbox rows WHERE Production.UID = Sandbox.UID (using an INNER JOIN of the Production Table/View.UID to the existing Sandbox Table/View.UID).
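The two steps above might be sketched roughly like this in T-SQL. The table, column, and date-column names are placeholders, since the actual schema isn't shown in the question:

```sql
-- Step 1: insert Production rows from the last 30 days whose UID
-- is not already present in Sandbox.
-- (Sandbox.dbo.MyTable, Production.dbo.MyView, Col1/Col2/SysDate
--  are illustrative names, not the real schema.)
INSERT INTO Sandbox.dbo.MyTable (UID, Col1, Col2, SysDate)
SELECT p.UID, p.Col1, p.Col2, p.SysDate
FROM Production.dbo.MyView AS p
LEFT JOIN Sandbox.dbo.MyTable AS s
       ON s.UID = p.UID
WHERE s.UID IS NULL                              -- not yet in Sandbox
  AND p.SysDate >= DATEADD(DAY, -30, SYSDATETIME());  -- 30-day window

-- Step 2: refresh rows that already exist on both sides.
UPDATE s
SET s.Col1 = p.Col1,
    s.Col2 = p.Col2
FROM Sandbox.dbo.MyTable AS s
INNER JOIN Production.dbo.MyView AS p
        ON p.UID = s.UID;
```

Note that as written, Step 2 touches every matching row, which is one reason the question asks whether the UPDATE should also be limited to a 30-day window.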
I've cut the 4-hour time down to 2 hours, but it feels like this process I've created is missing a (big) step.
How can I cut this time down further? Should I put a 30-day filter on my UPDATE statement as well?
Answer 0: (score: 0)
Assuming you aren't moving billions of rows into the development environment, I would just create a simple ETL strategy that truncates the development environment and does a full load from Production. If you don't want the full dataset, add a filter to the ETL's source query. Just make sure that has no impact on the integrity of the data.
If your data does run into the billions of rows, you probably already have an enterprise storage solution. Many of those can snapshot the data files to another location. There are some security aspects of that approach you will also need to consider.
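A minimal sketch of that truncate-and-reload approach, using placeholder object names (the real table and view names aren't given):

```sql
-- Full reload: wipe the Sandbox copy, then pull everything from
-- Production in one set-based load. Object and column names
-- (Sandbox.dbo.MyTable, Production.dbo.MyView, SysDate) are
-- illustrative assumptions.
TRUNCATE TABLE Sandbox.dbo.MyTable;

INSERT INTO Sandbox.dbo.MyTable (UID, Col1, Col2, SysDate)
SELECT p.UID, p.Col1, p.Col2, p.SysDate
FROM Production.dbo.MyView AS p;
-- Optional: if the full dataset isn't needed, filter the source,
-- e.g. WHERE p.SysDate >= DATEADD(YEAR, -1, SYSDATETIME())
```

TRUNCATE is typically much cheaper than a row-by-row DELETE, and a single set-based INSERT...SELECT avoids the join bookkeeping of the incremental approach, at the cost of copying everything each run.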
Answer 1: (score: 0)
I found an answer in two parts. It may not be the best solution, but it seems to be working for now.
The previous process took upwards of 4 hours to complete. The new process finishes within an hour and appears to be working so far.