Question

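I'm new to Lambda, AWS, Step Functions, and Redshift, but I think I have pinpointed the problem I was asked to investigate.

The step function calls Lambda Node.js code to perform a copy from S3 into Redshift.

The relevant step definitions look like this: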

"States": {
...
            "CopyFiles": {
                "Type": "Task",
                "Resource": "ARN:activity:CopyFiles",
                "ResultPath": "...",
                "Retry": [
                    {
                        "ErrorEquals": ["Error"],
                        "MaxAttempts": 0
                    },
                    {
                        "ErrorEquals": [
                            "States.ALL"
                        ],
                        "IntervalSeconds": 60,
                        "BackoffRate": 2.0,
                        "MaxAttempts": 3
                    }
                ],
                "Catch": [
                    {
                        "ErrorEquals": [
                            "States.ALL"
                        ],
                        "ResultPath": "$.errorPath",
                        "Next": "ErrorStateHandler"
                    }
                ],
                "Next": "SuccessStep"
            },
            "SuccessStep": {
                "Type": "Task",
                "Resource": "ARN....",
                "ResultPath": null,
                "Retry": [
                    {
                        "ErrorEquals": ["Error"],
                        "MaxAttempts": 0
                    },
                    {
                        "ErrorEquals": [
                            "States.ALL"
                        ],
                        "IntervalSeconds": 60,
                        "BackoffRate": 2.0,
                        "MaxAttempts": 3
                    }
                ],
                "End": true
            },

The SQL statements (used in the CopyFiles activity) are wrapped in a transaction:

BEGIN;
CREATE TABLE "tempTable_datetimestamp_here" (LIKE real_table);
COPY tempTable_datetimestamp_here from 's3://bucket/key...' IGNOREHEADER 1 COMPUPDATE OFF STATUPDATE OFF;
DELETE FROM toTable
    USING tempTable_datetimestamp_here
    WHERE toTable.index = tempTable_datetimestamp_here.index;
INSERT INTO toTable SELECT * FROM tempTable_datetimestamp_here;

END;

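When I drop in many files (50) at the same time, all the step functions hang (they keep running until I abort them); see the screenshot. If I drop in a single file, it works fine.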

select pid, trim(starttime) as start,
duration, trim(user_name) as user,
query as querytxt
from stv_recents
where status = 'Running';

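This query no longer returns anything. However, the step functions still show as "Running".

Can someone please tell me what I need to do to get this working? Thanks, Tim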

Answer 1

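This approach (50 concurrent small commits) would work fine on an OLTP database (small, precise queries) such as Postgres or MySQL.

However, the process you outlined creates multiple competing commits in Redshift that block or conflict with each other.

Redshift is designed for OLAP (large analytical queries), and commits in Redshift are relatively expensive because they must be acknowledged by all compute nodes before returning.

I would suggest a two-stage process: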

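1. Use Lambda to create a manifest (a JSON file) listing the load files that are currently available (for example, the 50 files).
2. Use COPY with the manifest to load all available files in parallel and run the transaction once.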
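
As a rough illustration of the first stage, here is a minimal Node.js sketch of a manifest builder. The function name `buildManifest`, the bucket name, and the keys are hypothetical; in a real Lambda the key list would come from an S3 listing or from the state machine input. The `entries` / `url` / `mandatory` layout follows the manifest format described in the Redshift documentation.

```javascript
// Build a Redshift COPY manifest object from a list of S3 object keys.
// `bucket` and `keys` are hypothetical inputs used only for illustration.
function buildManifest(bucket, keys) {
  return {
    entries: keys.map((key) => ({
      url: `s3://${bucket}/${key}`,
      mandatory: true, // COPY terminates if a listed file cannot be found
    })),
  };
}

// Example: two files waiting to be loaded.
const manifest = buildManifest("my-bucket", ["incoming/a.csv", "incoming/b.csv"]);
console.log(JSON.stringify(manifest, null, 2));
```

The resulting JSON would then be written to S3 (for example with the AWS SDK) so that a single COPY can reference it.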

https://docs.aws.amazon.com/redshift/latest/dg/loading-data-files-using-manifest.html
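
To make the second stage concrete, the sketch below assembles the upsert from the question as one transaction that loads from a manifest instead of a single file. It is only an outline under assumptions: `buildUpsertSql` and its parameters are hypothetical names, the `IAM_ROLE` clause stands in for whatever authorization the original COPY used (it was elided in the question), and the table and column names (`index`) are taken from the question. The values are interpolated directly into the SQL text, so they must come from trusted code, not from user input.

```javascript
// Assemble one transaction that stages every file listed in the manifest
// and merges the staged rows into the target table.
// All parameter names here are hypothetical and for illustration only.
function buildUpsertSql({ tempTable, targetTable, manifestUrl, iamRoleArn }) {
  return [
    "BEGIN;",
    `CREATE TABLE ${tempTable} (LIKE ${targetTable});`,
    // MANIFEST tells COPY that the S3 object is a list of files, not data.
    `COPY ${tempTable} FROM '${manifestUrl}' IAM_ROLE '${iamRoleArn}' ` +
      "MANIFEST IGNOREHEADER 1 COMPUPDATE OFF STATUPDATE OFF;",
    // Delete rows that are about to be replaced, then insert the new ones.
    `DELETE FROM ${targetTable} USING ${tempTable} ` +
      `WHERE ${targetTable}.index = ${tempTable}.index;`,
    `INSERT INTO ${targetTable} SELECT * FROM ${tempTable};`,
    "END;",
  ].join("\n");
}

const sql = buildUpsertSql({
  tempTable: "tempTable_datetimestamp_here",
  targetTable: "toTable",
  manifestUrl: "s3://my-bucket/manifests/batch.manifest", // hypothetical path
  iamRoleArn: "arn:aws:iam::123456789012:role/example-role", // placeholder
});
console.log(sql);
```

With this shape, 50 files result in one COPY and one commit rather than 50 competing transactions.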

Upsert into Redshift in a transaction
