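I have to run an update against a fairly large table, Invoice_Payment (80M records). The new values come from another table, Invoice_Payment_updated, which has roughly 10%-15% as many rows as Invoice_Payment.
To illustrate, please have a look at the following demo tables: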
Invoice_Payment               Invoice_Payment_updated
---------------               -----------------------
Customer_id  Invoice_no       Id  Cust_id  Invoice_no
10           10100001         1   10       20200100
11           10100002         2   11       20200101
12           10100003
13           10100004
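For anyone who wants to try the statements in this question, the demo tables can be recreated roughly as follows. This is only a sketch: the int column types are an assumption (they match the tmp staging table used later in the question), and no keys or indexes are defined because the question does not mention any.

```sql
-- Assumed schema: plain int columns, no keys or indexes.
CREATE TABLE Invoice_Payment (
    Customer_id int,
    Invoice_no  int);

CREATE TABLE Invoice_Payment_updated (
    Id         int,
    Cust_id    int,
    Invoice_no int);

-- Demo rows copied from the tables above (one INSERT per row for portability).
INSERT INTO Invoice_Payment (Customer_id, Invoice_no) VALUES (10, 10100001);
INSERT INTO Invoice_Payment (Customer_id, Invoice_no) VALUES (11, 10100002);
INSERT INTO Invoice_Payment (Customer_id, Invoice_no) VALUES (12, 10100003);
INSERT INTO Invoice_Payment (Customer_id, Invoice_no) VALUES (13, 10100004);

INSERT INTO Invoice_Payment_updated (Id, Cust_id, Invoice_no) VALUES (1, 10, 20200100);
INSERT INTO Invoice_Payment_updated (Id, Cust_id, Invoice_no) VALUES (2, 11, 20200101);
```

With this data, a correct update should change the Invoice_no of customers 10 and 11 and leave customers 12 and 13 untouched.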
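I know that MERGE is generally used to perform an UPSERT, and that it can take several times longer to execute than the equivalent UPDATE statement. On the other hand, a plain UPDATE statement with multiple subqueries can also perform poorly in some cases.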
MERGE INTO Invoice_Payment ip
USING (SELECT ipu.Cust_id, ipu.Invoice_no
       FROM Invoice_Payment_updated ipu
       INNER JOIN Invoice_Payment ip ON ip.Customer_id = ipu.Cust_id
       WHERE ipu.Cust_id = ip.Customer_id AND ipu.Invoice_no <> ip.Invoice_no) t
ON (ip.Customer_id = t.Cust_id)
WHEN MATCHED THEN
    UPDATE SET ip.Invoice_no = t.Invoice_no;
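To improve things, I could batch the update using ROWCOUNT, but that would not speed up execution; it would only reduce overall locking.
The following simple UPDATE statement produces the same result: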
UPDATE Invoice_Payment
SET Invoice_no = (SELECT ipu.Invoice_no
                  FROM Invoice_Payment_updated ipu
                  WHERE ipu.Cust_id = Invoice_Payment.Customer_id
                    AND ipu.Invoice_no <> Invoice_Payment.Invoice_no)
WHERE EXISTS (SELECT 1
              FROM Invoice_Payment_updated ipu
              WHERE ipu.Cust_id = Invoice_Payment.Customer_id
                AND ipu.Invoice_no <> Invoice_Payment.Invoice_no);
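Using SQL MERGE or UPDATE is a neat idea, but I have heard that both run into performance problems when many records (i.e. over 75M) have to be updated in a big, wide table. Recreating the whole table, in turn, generates a lot of I/O load, not to mention the space it takes up, since the subqueries mean the table is basically stored temporarily several times over.
Another way to solve this is with a temporary table: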
-- Staging table holding only the rows whose Invoice_no actually changes
CREATE TABLE tmp (
    Cust_id    int,
    Invoice_no int);

INSERT INTO tmp (Cust_id, Invoice_no)
SELECT ipu.Cust_id, ipu.Invoice_no
FROM Invoice_Payment_updated ipu
INNER JOIN Invoice_Payment ip ON ip.Customer_id = ipu.Cust_id
WHERE ipu.Invoice_no <> ip.Invoice_no;

-- Update Invoice_Payment through the join (Oracle-style updatable inline view;
-- tmp.Cust_id generally needs a unique key for the view to be updatable)
UPDATE (SELECT ip.Invoice_no AS old_invoice_no, tmp.Invoice_no AS new_invoice_no
        FROM tmp
        INNER JOIN Invoice_Payment ip ON tmp.Cust_id = ip.Customer_id)
SET old_invoice_no = new_invoice_no;
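I am trying to figure out which of these is best to use when multiple subqueries are involved.
Any thoughts are welcome, and a completely different solution to the original problem would be much appreciated.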
Answer 0 (score: 2)
UPDATE i
SET i.Invoice_no = iu.Invoice_no
FROM Invoice_Payment i
INNER JOIN Invoice_Payment_updated iu ON i.Customer_id = iu.Cust_id
WHERE i.Invoice_no <> iu.Invoice_no; -- assuming Invoice_no cannot be NULL
If the update takes too much time, add a WHILE loop and UPDATE TOP (10000) until @@ROWCOUNT = 0. Batching may improve performance.
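The batched variant could look roughly like the sketch below (T-SQL). The variable names @BatchSize and @Rows are made up for illustration, and the batch size of 10000 is just the figure mentioned above. It also assumes that Invoice_Payment_updated holds at most one row per Cust_id; if a customer had several rows with different Invoice_no values, the same target row would keep matching the filter and the loop would never finish.

```sql
-- Sketch only: assumes at most one row per Cust_id in Invoice_Payment_updated.
DECLARE @BatchSize int = 10000;  -- hypothetical name; tune the size for your system
DECLARE @Rows int = 1;           -- hypothetical name; rows touched by the last batch

WHILE @Rows > 0
BEGIN
    UPDATE TOP (@BatchSize) i
    SET i.Invoice_no = iu.Invoice_no
    FROM Invoice_Payment i
    INNER JOIN Invoice_Payment_updated iu ON i.Customer_id = iu.Cust_id
    WHERE i.Invoice_no <> iu.Invoice_no;  -- rows already updated no longer match

    SET @Rows = @@ROWCOUNT;  -- capture right away; later statements reset @@ROWCOUNT
END;
```

Outside an explicit transaction, each batch commits on its own, so locks are held for a shorter time and the transaction log has less to carry per statement. Whether the total run time improves depends on the indexes available for the join and the filter.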