Question

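My use case is to extract, transform, and load data incrementally and in real time from x Lambda functions. I want multiple Lambda functions to be able to run at the same time, while Redshift stays available for read queries.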

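Since Redshift does not enforce primary key constraints, I am using the approach from the AWS documentation, Merge examples - Example of a merge that replaces existing rows, to enforce unique rows. This works fine when only one instance of the Lambda function is running.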

-- Start a new transaction
begin transaction;

-- Delete any rows from SALES that exist in STAGESALES, because they are updates
-- The join includes a redundant predicate to collocate on the distribution key 
-- A filter on saletime enables a range-restricted scan on SALES

delete from sales
using stagesales
where sales.salesid = stagesales.salesid
and sales.listid = stagesales.listid
and sales.saletime > '2008-11-30';

-- Insert all the rows from the staging table into the target table
insert into sales
select * from stagesales;

-- End transaction and commit
end transaction;

-- Drop the staging table
drop table stagesales;

However, as soon as more than one Lambda function runs at the same time and accesses the same table, I get:

"ERROR: 1023 DETAIL: Serializable isolation violation on table in Redshift" when performing operations in a transaction concurrently with another session.

How can I modify this example so that it works in a concurrent environment?

Answer 1

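The problem you are running into is that you have multiple Lambda functions executing DML on the same table at the same time. Redshift does not allow concurrent transactions that are not serializable, that is, transactions that try to modify the same data at the same time. When that happens, Redshift aborts one or more of the transactions to guarantee that all the DML that does execute is serializable.

Because of these limitations in how Redshift works, your current design will not work once you scale to multiple Lambda functions. You will need to design a way of managing the Lambda functions so that no conflicting DML statements run concurrently on the same table. It is not clear why you are using multiple Lambda functions for this, so I can't comment on what an alternative would look like.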

Answer 2

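Have you tried locking the table in each piece of code? That would prevent other transactions from modifying the data while yours runs. Alternatively, each Lambda could have its own separate staging table, and a merge job running on a schedule would merge the data from all of them into the final table.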
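A minimal sketch combining both suggestions, assuming a Python Lambda using a DB-API 2.0 driver (for example psycopg2 or redshift_connector) with autocommit turned off. Each Lambda is assumed to load into its own staging table (the name `stagesales_lambda1` and the helpers `build_merge_statements` and `merge_with_lock` are hypothetical), and the merge then runs under an explicit `LOCK`. Keep in mind that Redshift's `LOCK` blocks both reads and writes on the table from other sessions until the transaction commits, so readers of `sales` will wait while a merge is in progress; the `saletime` range filter from the question is omitted for brevity.

```python
# Hypothetical helpers: merge one Lambda's staging table into the shared
# target table under an explicit table lock. Table and column names follow
# the question's example; table names are assumed to be trusted, internally
# generated values (they are interpolated into the SQL, not parameterized).


def build_merge_statements(staging_table, target_table="sales"):
    """Return the SQL statements for one locked merge transaction."""
    return [
        # Take the lock first so this transaction never overlaps with another
        # writer on the target table; other sessions wait until COMMIT.
        f"LOCK {target_table};",
        # Delete the rows that are being replaced (the "update" rows).
        f"DELETE FROM {target_table} USING {staging_table} "
        f"WHERE {target_table}.salesid = {staging_table}.salesid "
        f"AND {target_table}.listid = {staging_table}.listid;",
        # Insert the new versions of the rows.
        f"INSERT INTO {target_table} SELECT * FROM {staging_table};",
        # Empty the staging table inside the same transaction. DELETE is used
        # instead of TRUNCATE because TRUNCATE commits the transaction in Redshift.
        f"DELETE FROM {staging_table};",
    ]


def merge_with_lock(conn, staging_table, target_table="sales"):
    """Run the merge as one transaction on a DB-API 2.0 connection.

    Assumes autocommit is off, so the first execute() opens the transaction.
    """
    cur = conn.cursor()
    try:
        for sql in build_merge_statements(staging_table, target_table):
            cur.execute(sql)
        conn.commit()
    except Exception:
        # Discard partial work and release the lock before re-raising.
        conn.rollback()
        raise
    finally:
        cur.close()
```

With this, concurrent Lambdas calling `merge_with_lock(conn, "stagesales_lambda1")`, `merge_with_lock(conn, "stagesales_lambda2")`, and so on queue behind each other on the lock instead of overlapping. The same function could instead be run by a single scheduled merge job that loops over all staging tables.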

Answer 3

1023 is a retryable error. If it only happens occasionally, you could consider catching it in your Lambda function and resubmitting the query.
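A minimal retry sketch for a Python Lambda. It assumes the driver surfaces error 1023 as an exception whose message contains "Serializable isolation violation" (as in the error quoted in the question), and that `txn` is a hypothetical callable wrapping the question's whole BEGIN ... END block and rolling back on failure, so every retry starts a fresh transaction.

```python
import random
import time

# Assumption: the driver's exception text for Redshift error 1023 contains
# this phrase, as in the message quoted in the question.
SERIALIZABLE_VIOLATION = "Serializable isolation violation"


def is_serializable_violation(exc):
    """Return True if the exception looks like Redshift error 1023."""
    return SERIALIZABLE_VIOLATION in str(exc)


def run_with_retry(txn, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call txn() and retry it only when it fails with error 1023.

    txn is a hypothetical callable that runs the whole merge transaction
    and rolls back on failure, so each attempt starts a fresh transaction.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except Exception as exc:
            # Re-raise other errors, or give up once attempts are used up.
            if not is_serializable_violation(exc) or attempt == max_attempts:
                raise
            # Exponential backoff plus random jitter, so Lambdas that collided
            # are less likely to collide again on the retry.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
```

The backoff only helps when collisions are rare; if most runs hit 1023, retries just add more contention and serializing the merges is the better fix.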

Redshift: How to resolve serializable isolation violations (1023) caused by concurrent MERGE operations?