Question

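In Amazon Redshift, I am trying to do a bulk insert into a table from a temporary table. However, I only want to insert rows whose composite (primary key) values do not already exist in the table, to avoid adding duplicates.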

Below is the DDL for the tables.

• clusters_typologies table (the table I want to insert data into)

create table if not exists clusters.clusters_typologies
(
    cluster_id  BIGINT,
    typology_id BIGINT,
    semantic_id BIGINT,
    primary key (cluster_id, typology_id, semantic_id)
);

The temporary table is created with the query below, and all of its fields are then populated correctly.

CREATE TEMPORARY TABLE temporary (
  cluster_id   bigint,
  typology_name varchar(100),
  typology_id   bigint,
  semantic_name varchar(100),
  semantic_id   bigint
);

Now, when I try to insert using this query:

INSERT INTO clusters.clusters_typologies (cluster_id, typology_id,semantic_id)
    (SELECT temp.cluster_id, temp.typology_id, temp.semantic_id
     FROM temporary temp
     WHERE NOT EXISTS(SELECT 1
                      FROM clusters_typologies
                      where cluster_id = temp.cluster_id
                        and typology_id = temp.typology_id
                        and semantic_id = temp.semantic_id));

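I get this error, and I don't know how to make it work.

Invalid operation: This type of correlated subquery pattern is not supported due to internal error;

Does anyone know how to fix this, or the best way to avoid inserting duplicate rows into a table with a composite key?

Thanks.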

Answer 1

To go further, follow this guide: https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-upsert.html

Also note that some types of correlated subqueries are not allowed in Redshift, which is what causes the error. See https://docs.aws.amazon.com/redshift/latest/dg/r_correlated_subqueries.html
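The staging-table idea from that guide could be adapted to this case roughly as follows. This is only a sketch under assumptions: staging_typologies is a hypothetical name for the temporary table from the question, and the statements have not been run against the asker's data. Rows whose key already exists in the target are first deleted from the staging table, then whatever remains is inserted.

```sql
-- Sketch only: staging_typologies is a hypothetical name standing in for
-- the temporary table from the question.
BEGIN;

-- Drop staging rows whose composite key already exists in the target.
DELETE FROM staging_typologies
USING clusters.clusters_typologies tgt
WHERE staging_typologies.cluster_id  = tgt.cluster_id
  AND staging_typologies.typology_id = tgt.typology_id
  AND staging_typologies.semantic_id = tgt.semantic_id;

-- Insert what is left. DISTINCT guards against duplicate keys inside the
-- staging table itself, because Redshift does not enforce primary keys.
INSERT INTO clusters.clusters_typologies (cluster_id, typology_id, semantic_id)
SELECT DISTINCT cluster_id, typology_id, semantic_id
FROM staging_typologies;

COMMIT;
```

Running both statements in one transaction keeps other sessions from seeing the intermediate state.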

Answer 2

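After some experimenting, I figured out how to insert from the temporary table while checking against the composite primary key to avoid duplicates.

Basically, from the AWS documentation that @Jon Scott posted, I understood that the way I was referencing the outer table inside the inner select is not supported by Redshift.

I solved it with a left join, checking whether the joined columns are null.
Below is the query I am using now.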

INSERT INTO clusters.clusters_typologies (cluster_id, typology_id, semantic_id)
    (SELECT temp.cluster_id, temp.typology_id, temp.semantic_id
     FROM aaaa temp
            LEFT JOIN clusters.clusters_typologies clu_typ ON temp.cluster_id = clu_typ.cluster_id AND
                                                              temp.typology_id = clu_typ.typology_id AND
                                                              temp.semantic_id = clu_typ.semantic_id
     WHERE clu_typ.cluster_id IS NULL
       AND clu_typ.typology_id IS NULL
       AND clu_typ.semantic_id IS NULL);
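A slightly shorter variant of the same anti-join, offered as a sketch and not as a tested replacement: since all three columns appear in the join condition, an unmatched row has every clu_typ column null, so checking one of them is enough. staging_typologies is again a hypothetical name for the temporary table, and DISTINCT is an addition of mine to cover duplicate keys inside the temporary table itself.

```sql
-- Sketch only: staging_typologies is a hypothetical temporary table name.
INSERT INTO clusters.clusters_typologies (cluster_id, typology_id, semantic_id)
SELECT DISTINCT stg.cluster_id, stg.typology_id, stg.semantic_id
FROM staging_typologies stg
LEFT JOIN clusters.clusters_typologies clu_typ
       ON stg.cluster_id  = clu_typ.cluster_id
      AND stg.typology_id = clu_typ.typology_id
      AND stg.semantic_id = clu_typ.semantic_id
-- A matched row always has a non-null clu_typ.cluster_id (the equality
-- could not hold otherwise), so a null here means no existing key.
WHERE clu_typ.cluster_id IS NULL;
```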

How can I bulk insert rows only when the composite primary key does not already exist? [AWS Redshift]
