Best method for updating sandbox tables from production tables/views

Date: 2018-03-07 13:26:04

Tags: sql-server development-environment production-environment insert-update

Using SQL, it takes over 4 hours every evening to pull all the data from the twelve Production database tables or views needed for our Sandbox database. There has to be a significantly more efficient way to get this data into our Sandbox.

Currently, I create a UID (unique ID) by concatenating each view's primary keys and system date fields.

The UID is used in two steps:

Step 1. INSERT INTO Sandbox WHERE UID IS NULL, looking back only the last 30 days based on the system date (using a LEFT JOIN of the Production Table/View.UID to the existing Sandbox Table/View.UID).

Step 2. UPDATE Sandbox WHERE Production.UID = Sandbox.UID (using an INNER JOIN of the Production Table/View.UID to the existing Sandbox Table/View.UID).
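
For readers who want to see the two steps concretely, here is a minimal runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, so the syntax is not T-SQL, and the table and column names (production, sandbox, uid, amount, sys_date) are hypothetical placeholders rather than names from the question. SQLite's correlated-subquery UPDATE stands in for the inner-join UPDATE described in Step 2.

```python
import sqlite3


def sync_two_step(conn, cutoff_date):
    """Copy new rows into sandbox, then refresh rows that already exist.

    All table and column names are hypothetical placeholders.
    """
    # Step 1: LEFT JOIN production to sandbox on uid. A NULL sandbox uid
    # means the row is missing, so insert it. Only rows on or after the
    # cutoff date are considered.
    conn.execute(
        """
        INSERT INTO sandbox (uid, amount, sys_date)
        SELECT p.uid, p.amount, p.sys_date
        FROM production AS p
        LEFT JOIN sandbox AS s ON s.uid = p.uid
        WHERE s.uid IS NULL
          AND p.sys_date >= ?
        """,
        (cutoff_date,),
    )
    # Step 2: refresh every sandbox row that has a matching production uid.
    # There is no date filter here, mirroring the process in the question.
    conn.execute(
        """
        UPDATE sandbox
        SET amount = (SELECT p.amount FROM production AS p
                      WHERE p.uid = sandbox.uid)
        WHERE EXISTS (SELECT 1 FROM production AS p
                      WHERE p.uid = sandbox.uid)
        """
    )
    conn.commit()


def make_demo_db():
    """Build a tiny in-memory database with one stale and one missing row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE production (uid TEXT PRIMARY KEY, amount INTEGER, sys_date TEXT);
        CREATE TABLE sandbox    (uid TEXT PRIMARY KEY, amount INTEGER, sys_date TEXT);
        INSERT INTO production VALUES ('A-2018-03-01', 10, '2018-03-01');
        INSERT INTO production VALUES ('B-2018-03-05', 20, '2018-03-05');
        INSERT INTO production VALUES ('C-2017-01-01', 30, '2017-01-01');
        INSERT INTO sandbox    VALUES ('A-2018-03-01', 99, '2018-03-01');
        """
    )
    return conn
```

Calling `sync_two_step(make_demo_db(), "2018-02-05")` inserts the missing recent row, corrects the stale one, and skips the 2017 row because it falls outside the look-back window. Note that Step 2 touches every matching row regardless of age, which is one plausible reason the UPDATE dominates the run time.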

I've cut the 4-hour run down to 2 hours, but it feels like the process I've created is missing a (big) step.

How can I cut this time down? Should I put a 30 day filter on my UPDATE statement as well?

2 answers:

Answer 0: (score: 0)

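Assuming you aren't moving billions of rows into the development environment, I would just create a simple ETL strategy that truncates the development environment and does a full load from production. If you don't want the full data set, add a filter to the ETL's source queries. Just make sure this has no impact on the integrity of the data.

If your data runs into the billions of rows, you probably already have an enterprise storage solution. Many of these can snapshot the data files to another location. This approach has some security aspects that you also need to consider.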
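
As a rough illustration of the truncate-and-reload idea, the sketch below again uses Python's sqlite3 module as a stand-in for SQL Server. SQLite has no TRUNCATE statement, so `DELETE FROM` plays that role here; the table and column names, and the `since` parameter for the optional source filter, are hypothetical.

```python
import sqlite3


def full_reload(conn, since=None):
    """Empty the sandbox table and reload it from production.

    `since` is an optional ISO date string used as the source-query filter.
    All names are hypothetical placeholders.
    """
    # `with conn` wraps both statements in one transaction, so a failed
    # load rolls back and does not leave the sandbox table empty.
    with conn:
        conn.execute("DELETE FROM sandbox")  # stand-in for TRUNCATE TABLE
        if since is None:
            conn.execute(
                "INSERT INTO sandbox (uid, amount, sys_date) "
                "SELECT uid, amount, sys_date FROM production"
            )
        else:
            conn.execute(
                "INSERT INTO sandbox (uid, amount, sys_date) "
                "SELECT uid, amount, sys_date FROM production "
                "WHERE sys_date >= ?",
                (since,),
            )


def make_reload_demo_db():
    """Build an in-memory database whose sandbox holds only a stale row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE production (uid TEXT PRIMARY KEY, amount INTEGER, sys_date TEXT);
        CREATE TABLE sandbox    (uid TEXT PRIMARY KEY, amount INTEGER, sys_date TEXT);
        INSERT INTO production VALUES ('A', 10, '2018-03-01');
        INSERT INTO production VALUES ('B', 20, '2018-03-05');
        INSERT INTO production VALUES ('C', 30, '2017-01-01');
        INSERT INTO sandbox    VALUES ('Z', 99, '2016-01-01');
        """
    )
    return conn
```

Because the table is emptied first, no UID matching or join is needed at all, and rows deleted in production disappear from the sandbox as well.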

Answer 1: (score: 0)

I found a two-part answer. It may not be the best solution, but it seems to be working for now.

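  1. For the Production database tables, I can (mostly) use the primary key as the UID, and I update them using a 30-90 day filter.
  2. The views are a bit trickier, because they combine two identical tables and therefore contain duplicate primary keys. So I created my own UID by concatenating multiple primary key fields, and I update using a 30-90 day filter.
  3. The previous process took over 4 hours to complete. The new process completes within an hour and seems to be working for now.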
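
The view case in item 2 can be sketched as follows, again with Python's sqlite3 module standing in for SQL Server. Everything named here is an assumption for illustration: the two source tables (orders_a, orders_b), the view (production_view), the key fields (source_id, order_id), and the choice to apply the date filter to the sandbox row's sys_date. A `|` delimiter is placed between the key fields so that pairs such as ('1', '23') and ('12', '3') cannot produce the same UID.

```python
import sqlite3


def refresh_recent(conn, cutoff_date):
    """Update sandbox rows from the view, limited to a recent date window.

    The UID is rebuilt on the fly as source_id || '|' || order_id.
    All names are hypothetical placeholders.
    """
    with conn:
        conn.execute(
            """
            UPDATE sandbox_orders
            SET amount = (SELECT v.amount FROM production_view AS v
                          WHERE v.source_id || '|' || v.order_id
                                = sandbox_orders.uid)
            WHERE sys_date >= ?
              AND EXISTS (SELECT 1 FROM production_view AS v
                          WHERE v.source_id || '|' || v.order_id
                                = sandbox_orders.uid)
            """,
            (cutoff_date,),
        )


def make_view_demo_db():
    """Two tables share order_id 1; the view unions them under a source tag."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders_a (order_id INTEGER PRIMARY KEY, amount INTEGER, sys_date TEXT);
        CREATE TABLE orders_b (order_id INTEGER PRIMARY KEY, amount INTEGER, sys_date TEXT);
        CREATE VIEW production_view AS
            SELECT 'A' AS source_id, order_id, amount, sys_date FROM orders_a
            UNION ALL
            SELECT 'B' AS source_id, order_id, amount, sys_date FROM orders_b;
        CREATE TABLE sandbox_orders (uid TEXT PRIMARY KEY, amount INTEGER, sys_date TEXT);
        INSERT INTO orders_a VALUES (1, 10, '2018-03-01');
        INSERT INTO orders_b VALUES (1, 20, '2018-03-02');
        INSERT INTO orders_b VALUES (2, 30, '2017-01-01');
        INSERT INTO sandbox_orders VALUES ('A|1', 0, '2018-03-01');
        INSERT INTO sandbox_orders VALUES ('B|1', 0, '2018-03-02');
        INSERT INTO sandbox_orders VALUES ('B|2', 0, '2017-01-01');
        """
    )
    return conn
```

With a cutoff of `"2018-02-01"`, only the two 2018 rows are refreshed and the 2017 row is left alone, which is the effect of putting the date filter on the UPDATE as well as on the INSERT.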