Question

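I created the following table with a primary key in Snowflake, but whenever I insert data into it, duplicate records are accepted as well. How can I restrict duplicate IDs?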

create table tab11(id int primary key not null,grade varchar(10));

insert into tab11 values(1,'A');
insert into tab11 values(1,'B');

select * from tab11;

Output: the duplicate record is inserted.

ID  GRADE
1   A
1   B

Answer 1

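Snowflake lets you identify a column as a primary key, but it does not enforce uniqueness on it. From the documentation:

Snowflake supports defining and maintaining constraints, but does not enforce them, except for NOT NULL constraints, which are always enforced.

Primary keys in Snowflake are informational only. I am not from Snowflake, but I suspect that enforcing uniqueness on primary keys does not fit well with the way Snowflake stores data behind the scenes.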
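
Since nothing is enforced, it can help to check for duplicates yourself. A minimal sketch, assuming the tab11 table from the question (the dup_count alias is only illustrative):

```sql
-- List every id that appears more than once in tab11.
select id, count(*) as dup_count
from tab11
group by id
having count(*) > 1;
```

With the two rows inserted in the question, this should report id 1 as duplicated.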

Answer 2

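Snowflake does not enforce unique constraints, so you can only mitigate the problem by:

Using a SEQUENCE to populate the column you want to be unique;
Defining the column as NOT NULL (which is actually enforced);
Using a stored procedure in which you programmatically make sure that no duplicates are introduced;
Using a stored procedure (possibly run by a scheduled task) to de-duplicate on a regular basis;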
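
As a rough idea of what the periodic de-duplication step could look like, here is a sketch against the tab11 table from the question. Which row survives for a given id is my assumption (the lowest grade), so adjust the ORDER BY to whatever rule fits your data. The same statement could be wrapped in a stored procedure or a task.

```sql
-- Rewrite tab11 so that only one row per id remains.
-- Assumption: for duplicate ids, keep the row with the lowest grade.
insert overwrite into tab11
select id, grade
from tab11
qualify row_number() over (partition by id order by grade) = 1;
```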

Answer 3

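Snowflake supports the following constraint types from the ANSI SQL standard:

UNIQUE
PRIMARY KEY
FOREIGN KEY
NOT NULL

A table can have multiple unique keys and foreign keys, but only one primary key. Every foreign key must reference a corresponding primary key or unique key whose column types match those of each column in the foreign key. The primary key referenced by a foreign key can be in a different table or in the same table as the foreign key.

Because the primary key does not prevent duplicates, you can try a merge instead: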
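
To make the constraint types concrete, here is a small sketch. The dept and emp tables and the constraint names are hypothetical, made up only for illustration. Keep in mind that of these constraints only NOT NULL is actually enforced.

```sql
-- Hypothetical parent table: one primary key plus an additional unique key.
create or replace table dept (
    dept_id   int not null,
    dept_code varchar(10),
    constraint pk_dept primary key (dept_id),
    constraint uq_dept_code unique (dept_code)
);

-- Hypothetical child table: the foreign key references the parent's primary key
-- and uses the same column type (int).
create or replace table emp (
    emp_id  int not null,
    dept_id int,
    constraint pk_emp primary key (emp_id),
    constraint fk_emp_dept foreign key (dept_id) references dept (dept_id)
);
```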

-- Target table that should end up without duplicates
DROP TABLE IF EXISTS tgt_tb;
create or replace temp table tgt_tb (id int, name string);

-- Source table containing a duplicate row (1, 'a')
DROP TABLE IF EXISTS src_tb;
create temp table src_tb (id int, name string);

insert into src_tb
select 1, 'a' union all
select 1, 'a' union all
select 2, 'b' union all
select 3, 'c';

insert into tgt_tb
select 3, 'c';

-- Stage the source rows, keeping one row per distinct hash of all columns
DROP TABLE IF EXISTS src_stg_tb;
create temp table src_stg_tb as
select *
from (
    select *,
           ROW_NUMBER() OVER (PARTITION BY hash(*) ORDER BY hash(*)) as rnm,
           hash(*) as hashkey
    from src_tb
) A
where A.rnm = 1;

-- Insert only the staged rows whose hash is not already present in the target
merge into tgt_tb TGT
using src_stg_tb SRC
on hash(TGT.id, TGT.name) = SRC.hashkey
when not matched then insert
values (SRC.id, SRC.name);

Answer 4

You may want to look at using a merge statement to handle what happens when a row with a duplicate PK arrives:

create table tab1(id int primary key not null, grade varchar(10));

insert into tab1 values(1, 'A');

-- Try merging values 1, and 'B': Nothing will be added
merge into tab1 using 
    (select * from (values (1, 'B')) x(id, grade)) tab2 
  on tab1.id = tab2.id
    when not matched then insert (id, grade)
                          values (tab2.id, tab2.grade);

select * from tab1;

-- Try merging values 2, and 'B': New row added
merge into tab1 using 
    (select * from (values (2, 'B')) x(id, grade)) tab2 
  on tab1.id = tab2.id
    when not matched then insert (id, grade)
                          values (tab2.id, tab2.grade);

select * from tab1;

-- If instead of ignoring dupes, we want to update:
merge into tab1 using 
    (select * from (values (1, 'F'), (2, 'F')) x(id, grade)) tab2 
  on tab1.id = tab2.id
    when matched then update set tab1.grade = tab2.grade
    when not matched then insert (id, grade)
                          values (tab2.id, tab2.grade);

select * from tab1;

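For more complex merges, you may want to investigate Snowflake streams (change data capture tables). In addition to the documentation, I have created a SQL script walkthrough of how to use a stream to keep a staging table and a prod table in sync: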

https://snowflake.pavlik.us/index.php/2020/01/12/snowflake-streams-made-simple

Answer 5

You can try using a SEQUENCE to meet your requirement:

https://docs.snowflake.net/manuals/user-guide/querying-sequences.html#using-sequences
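
A minimal sketch of that idea, reusing the tab11 table from the question. The sequence name tab11_id_seq is my own choice. Note that this only helps as long as every insert leaves id to the default; an explicit id can still collide, and sequence values are unique but not guaranteed to be gap-free.

```sql
-- Sequence that hands out unique values for the id column.
create or replace sequence tab11_id_seq start = 1 increment = 1;

-- id defaults to the next sequence value when it is not supplied.
create or replace table tab11 (
    id    int not null default tab11_id_seq.nextval,
    grade varchar(10),
    primary key (id)
);

-- Omit id so that the sequence assigns it.
insert into tab11 (grade) values ('A');
insert into tab11 (grade) values ('B');

select * from tab11;
```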

Answer 6

The Snowflake documentation says it doesn't enforce the constraint.

https://docs.snowflake.com/en/sql-reference/constraints-overview.html

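Rather than having the load script fail, I would try using merge. I have not used the merge statement in Snowflake yet, but for other NoSQL databases I have used a merge statement instead of insert.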

How to restrict duplicate records from being inserted into a table in Snowflake
