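I need to update a huge table (> 1 billion records, POS data) with keys from a table, based on a join to a third table. I can break the update up by date, since the data goes back several years. Basically, I need to replace f.retail_sku_key in table edw.f_pos_daily with dedup.retail_sku_key wherever the two differ. Thanks!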
select F.POS_KEY, f.retail_sku_key , dedup.retail_sku_key dedup_key
from edw.f_pos_daily f,edw.d_retail_sku sku, edw.d_retail_sku_new dedup
where f.retail_sku_key = sku.retail_sku_key
and sku.retail_sku = dedup.retail_sku
and sku.mtd_item_number = dedup.mtd_item_number
and sku.retailer = dedup.retailer
and f.retail_sku_key <> dedup.retail_sku_key
Answer (score: 0)
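There is probably an equivalent UPDATE. Still, I prefer MERGE when a single SQL statement both identifies the rows that need updating and generates the values to update them with.
So, how about something like this? (I'm assuming f.pos_key is a unique identifier on the f_pos_daily table. If it isn't, the query can return several rows for the same pos_key value, and then the MERGE will fail.)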
MERGE INTO edw.f_pos_daily f_main
USING (
select f.pos_key -- this is for joining back to the rows that need to be updated...
, dedup.retail_sku_key dedup_key -- ...and this is the value to update them with
from edw.f_pos_daily f
, edw.d_retail_sku sku
, edw.d_retail_sku_new dedup
where f.retail_sku_key = sku.retail_sku_key
and sku.retail_sku = dedup.retail_sku
and sku.mtd_item_number = dedup.mtd_item_number
and sku.retailer = dedup.retailer
and f.retail_sku_key <> dedup.retail_sku_key
) qry
ON (f_main.pos_key = qry.pos_key)
WHEN MATCHED THEN
UPDATE SET f_main.retail_sku_key = qry.dedup_key
;
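For comparison, here is a minimal sketch of the equivalent correlated UPDATE. It runs against an in-memory SQLite database through Python's sqlite3 module, so it can be executed anywhere. SQLite is a stand-in for Oracle here; SQLite has no MERGE. The table and column names come from the question, but the tiny schema and the rows are hypothetical. The two engines differ in one important way. The MERGE above fails when a source row matches the same target row more than once. SQLite's scalar subquery instead silently takes one of the matches. For that reason, the sketch checks for ambiguous SKU mappings before it updates anything.

```python
import sqlite3


def ambiguous_sku_keys(conn):
    """Old SKU keys that match more than one dedup row. The Oracle MERGE would
    fail on these; SQLite's scalar subquery below would silently pick one."""
    rows = conn.execute("""
        SELECT sku.retail_sku_key
          FROM d_retail_sku sku
          JOIN d_retail_sku_new dedup
            ON sku.retail_sku = dedup.retail_sku
           AND sku.mtd_item_number = dedup.mtd_item_number
           AND sku.retailer = dedup.retailer
         GROUP BY sku.retail_sku_key
        HAVING COUNT(*) > 1
    """).fetchall()
    return [r[0] for r in rows]


def remap_sku_keys(conn):
    """Set each fact row's retail_sku_key to the dedup key where they differ.
    Returns the number of fact rows changed."""
    cur = conn.execute("""
        UPDATE f_pos_daily
           SET retail_sku_key = (
                 SELECT dedup.retail_sku_key
                   FROM d_retail_sku sku
                   JOIN d_retail_sku_new dedup
                     ON sku.retail_sku = dedup.retail_sku
                    AND sku.mtd_item_number = dedup.mtd_item_number
                    AND sku.retailer = dedup.retailer
                  WHERE sku.retail_sku_key = f_pos_daily.retail_sku_key)
         WHERE EXISTS (
                 SELECT 1
                   FROM d_retail_sku sku
                   JOIN d_retail_sku_new dedup
                     ON sku.retail_sku = dedup.retail_sku
                    AND sku.mtd_item_number = dedup.mtd_item_number
                    AND sku.retailer = dedup.retailer
                  WHERE sku.retail_sku_key = f_pos_daily.retail_sku_key
                    AND dedup.retail_sku_key <> f_pos_daily.retail_sku_key)
    """)
    return cur.rowcount


def demo():
    """Build tiny hypothetical tables in memory, run the update, return results."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE d_retail_sku (retail_sku_key INTEGER PRIMARY KEY,
            retail_sku TEXT, mtd_item_number TEXT, retailer TEXT);
        CREATE TABLE d_retail_sku_new (retail_sku_key INTEGER PRIMARY KEY,
            retail_sku TEXT, mtd_item_number TEXT, retailer TEXT);
        CREATE TABLE f_pos_daily (pos_key INTEGER PRIMARY KEY,
            retail_sku_key INTEGER);
        -- old keys 1 and 2 describe the same SKU; the dedup table keeps key 1
        INSERT INTO d_retail_sku VALUES
            (1, 'A', 'M1', 'R1'), (2, 'A', 'M1', 'R1'), (3, 'B', 'M2', 'R1');
        INSERT INTO d_retail_sku_new VALUES
            (1, 'A', 'M1', 'R1'), (3, 'B', 'M2', 'R1');
        INSERT INTO f_pos_daily VALUES (10, 1), (11, 2), (12, 3), (13, 2);
    """)
    assert ambiguous_sku_keys(conn) == []  # each old key maps to one dedup key
    changed = remap_sku_keys(conn)
    conn.commit()
    rows = conn.execute(
        "SELECT pos_key, retail_sku_key FROM f_pos_daily ORDER BY pos_key"
    ).fetchall()
    conn.close()
    return changed, rows
```

Calling demo() returns (2, [(10, 1), (11, 1), (12, 3), (13, 1)]). Only the two fact rows that pointed at the duplicate key 2 are changed. The EXISTS filter plays the same role as the `f.retail_sku_key <> dedup.retail_sku_key` condition in the MERGE subquery.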
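If you really do need to break this into separate updates, you can split it up in one of two ways:
1) Restrict the inner query to one partition of f_pos_daily (assuming the table is partitioned by something other than retail_sku_key), e.g. FROM edw.f_pos_daily PARTITION (p_some_partition_name),
and run the statement above once per partition.
2) Generate ranges of rows to update (again relying on pos_key being unique), each covering, say, 10% of the rows: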
SELECT MIN(pos_key) p0,
PERCENTILE_DISC(0.1) WITHIN GROUP (ORDER BY pos_key) p1,
PERCENTILE_DISC(0.2) WITHIN GROUP (ORDER BY pos_key) p2,
PERCENTILE_DISC(0.3) WITHIN GROUP (ORDER BY pos_key) p3,
PERCENTILE_DISC(0.4) WITHIN GROUP (ORDER BY pos_key) p4,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY pos_key) p5,
PERCENTILE_DISC(0.6) WITHIN GROUP (ORDER BY pos_key) p6,
PERCENTILE_DISC(0.7) WITHIN GROUP (ORDER BY pos_key) p7,
PERCENTILE_DISC(0.8) WITHIN GROUP (ORDER BY pos_key) p8,
PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY pos_key) p9,
MAX(pos_key) p10
FROM edw.f_pos_daily;
If the key values run from 0 to 1000 (over some unknown number of rows), this gives you output like:
P0 P1 P2 P3 P4 P5 P6 P7 P8 P9 P10
0 104 183 319 402 512 607 723 810 914 1000
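From here you just add one more condition to the subquery:
AND f.pos_key BETWEEN 0 AND 104
on the first run,
AND f.pos_key BETWEEN 105 AND 183
on the second run, and so on. (Each range starts one past the previous upper bound, which assumes integer keys.)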
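The boundary arithmetic can be checked offline. The sketch below uses two hypothetical helpers that work on a small, already-fetched list of distinct integer keys.

- percentile_disc mimics PERCENTILE_DISC: for distinct keys it picks the key at rank ceil(fraction × row count). It uses integer math to avoid floating-point rounding.
- key_ranges turns those boundaries into the inclusive, non-overlapping BETWEEN ranges described above. Each range starts one past the previous upper bound. Ranges that come out empty (which happens when there are fewer keys than batches) are dropped.

On the real table the boundaries would come from the SQL query itself. This only illustrates how they map to predicates.

```python
def percentile_disc(sorted_keys, num, den):
    """PERCENTILE_DISC(num/den) over distinct, ascending keys: the first key
    whose cumulative distribution i/n reaches num/den (integer math, no floats)."""
    n = len(sorted_keys)
    rank = max(-((-num * n) // den), 1)  # ceil(num * n / den), at least 1
    return sorted_keys[rank - 1]


def key_ranges(keys, batches=10):
    """Split a non-empty set of distinct integer keys into at most `batches`
    inclusive (lo, hi) ranges, one per BETWEEN predicate."""
    ks = sorted(keys)
    bounds = [percentile_disc(ks, i, batches) for i in range(1, batches)]
    bounds.append(ks[-1])  # like MAX(pos_key): closes the last range
    ranges, lo = [], ks[0]  # like MIN(pos_key): opens the first range
    for hi in bounds:
        if hi >= lo:  # skip empty ranges (fewer keys than batches)
            ranges.append((lo, hi))
            lo = hi + 1  # integer keys: next range starts just past hi
    return ranges


if __name__ == "__main__":
    for lo, hi in key_ranges(range(0, 1001, 7)):
        print(f"AND f.pos_key BETWEEN {lo} AND {hi}")
```

For example, key_ranges(range(1, 101)) yields the ten ranges (1, 10), (11, 20), ..., (91, 100). A two-key input such as [5, 7] collapses to [(5, 5), (6, 7)] instead of producing empty batches.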