Question

I found this AWS documentation, which suggests how to perform a merge (upsert) into Redshift.

The SQL statement in the post-query is "...where stage_table.id = target_table.id". I assume id is the primary key here.

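What if you are importing unstructured data (for example, with a (string, struct, array, string) schema and no PK) and processing Spark dataframes incrementally? Even if you add an incrementing ID on each run, the join on this ID will not work, because the ID always starts at 1 again.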
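To make the key problem concrete, here is a minimal sketch of one possible workaround: deriving a key from the row's content instead of from a per-run counter, so that the same row gets the same key in every batch. This is an assumption for illustration, not something the AWS documentation prescribes. The helper `row_key` and the sample rows are hypothetical, and plain Python dicts stand in for the Spark rows.

```python
import hashlib
import json


def row_key(row: dict) -> str:
    """Return a deterministic key derived from the row's content (hypothetical helper)."""
    # sort_keys makes the serialization independent of dict key order,
    # so identical content always produces the identical string.
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two incremental batches (made-up sample data); the first row of batch_2
# repeats the row already loaded in batch_1.
batch_1 = [{"name": "a", "meta": {"x": 1}, "tags": ["t1"], "note": "first"}]
batch_2 = [
    {"name": "a", "meta": {"x": 1}, "tags": ["t1"], "note": "first"},
    {"name": "b", "meta": {"x": 2}, "tags": [], "note": "second"},
]

keys_1 = [row_key(r) for r in batch_1]
keys_2 = [row_key(r) for r in batch_2]
```

Unlike a counter that restarts at 1, such a key is stable across runs, so a join of the form stage_table.id = target_table.id could in principle match repeated rows. In Spark, a similar column could presumably be built by hashing a JSON serialization of all columns; whether that is acceptable depends on whether two rows with identical content should count as the same record.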

Redshift now also has an 'append' option, so I am wondering whether it would be better to simply use append in the SQL (documentation link here)?

Example code (assuming rules_dyf is the incremental dataframe in Glue).

# Recreate an empty staging table with the same columns as the target table.
pre_query = "drop table if exists public.stage_table; create table public.stage_table as select * from public.target_table limit 1; truncate public.stage_table"
# Move the staged rows into the target table, then drop the staging table.
post_query = "alter table public.target_table append from public.stage_table; drop table public.stage_table;"

datasink4 = glueContext.write_dynamic_frame.from_jdbc_conf(
    frame = rules_dyf,
    catalog_connection = "redshift-dw",
    connection_options = {
        "preactions": pre_query,
        "dbtable": "stage_table",
        "database": "redshift_dw",
        "postactions": post_query
    },
    redshift_tmp_dir = "s3://redshiftlogs",
    transformation_ctx = "datasink4"
)

AWS Glue incremental load into Redshift

0 Answers: