Vertica allows duplicates to be inserted into a table. I can see them with the 'analyze_constraints' function. How do I delete the duplicate rows from a Vertica table?
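For reference, a minimal sketch of the check mentioned above. It assumes a table named mytable with a primary key or unique constraint declared on a column called id; both names are illustrative, not from the question.

```sql
-- Report violations for every constrained column of mytable
select analyze_constraints('mytable');

-- Or restrict the check to a single column (id is a hypothetical key column)
select analyze_constraints('mytable', 'id');
```

The function only reports on constraints that are actually declared on the table, so a table without a declared key would need a plain GROUP BY ... HAVING count(*) > 1 query instead.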
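Answer 0 (score: 5)
You should try to avoid, or at least limit, using DELETE on a large number of records. The following approach should be more efficient:
Step 1: Create a new table with the same structure/projections as the one containing the duplicates: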
create table mytable_new like mytable including projections ;
Step 2: Insert the deduplicated rows into this new table:
insert /*+ direct */ into mytable_new select <column list> from (
select * , row_number() over ( partition by <pk column list> ) as rownum from <table-name>
) a where a.rownum = 1 ;
Step 3: Rename the original table (the one containing the duplicates):
alter table mytable rename to mytable_orig ;
Step 4: Rename the new table:
alter table mytable_new rename to mytable ;
That's all.
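As a concrete illustration of Step 2, here is a sketch with the placeholders filled in. The columns id, name and updated_at are hypothetical, with id standing in for the key. Adding an ORDER BY inside the window is an optional refinement, not part of the answer: without it, which of the duplicate rows gets rownum = 1 is arbitrary.

```sql
-- Hypothetical table: mytable(id, name, updated_at), with id as the intended key
insert into mytable_new
select id, name, updated_at          -- explicit column list, so rownum is not inserted
from (
    select id, name, updated_at,
           row_number() over (partition by id order by updated_at desc) as rownum
    from mytable
) a
where a.rownum = 1;                  -- keeps the most recently updated row per id

commit;
```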
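Answer 1 (score: 2)
Off the top of my head, and not a great answer, so treat it as a last resort: you can delete the duplicates and insert a single copy back.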
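Answer 2 (score: 2)
Mauro's answer is correct, but there is an error in the SQL of step 2. The complete working approach that avoids DELETE should therefore be:
Step 1: Create a new table with the same structure/projections as the one containing the duplicates: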
create table mytable_new like mytable including projections ;
Step 2: Insert the deduplicated rows into this new table:
insert /*+ direct */ into mytable_new select <column list> from (
select * , row_number() over ( partition by <pk column list> ) as rownum from mytable
) a where a.rownum = 1 ;
Step 3: Rename the original table (the one containing the duplicates):
alter table mytable rename to mytable_orig ;
Step 4: Rename the new table:
alter table mytable_new rename to mytable ;
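Before discarding the original data, it may be worth checking that the swap kept exactly one row per key. A minimal sketch, assuming a single hypothetical key column id (substitute the real <pk column list>):

```sql
-- Number of distinct keys in the original table
select count(*) from (select distinct id from mytable_orig) k;

-- Number of rows in the deduplicated table; the two counts should match
select count(*) from mytable;

-- Only once the counts agree and the original is no longer needed
drop table mytable_orig;
```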
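Answer 3 (score: 1)
You can remove duplicates from a Vertica table by creating a temporary table and generating a pseudo row_id. There are several steps involved, especially when removing duplicates from a very large and wide table. In the example below, I assume that the rows with keys k1 and k2 each have more than one duplicate.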
-- Step 1: Find the duplicates
select key, count(1) from large-table-1
where [where-conditions]
group by 1
having count(1) > 1
order by count(1) desc ;
-- Step 2: Dump the duplicates into temp table
create table test.large-table-1-dups
like large-table-1;
alter table test.large-table-1-dups -- add row_num column (pseudo row_id)
add column row_num int;
insert into test.large-table-1-dups
select *, ROW_NUMBER() OVER(PARTITION BY key)
from large-table-1
where key in ('k1', 'k2'); -- where, say, k1 has n and k2 has m exact dups
-- Step 3: Remove duplicates from the temp table
delete from test.large-table-1-dups
where row_num > 1;
-- Sanity test: should return exactly one row each for k1 and k2
select * from test.large-table-1-dups;
-- Step 4: Delete all duplicates from main table...
delete from large-table-1
where key in ('k1', 'k2');
-- Step 5: Insert data back into main table from temp dedupe data
alter table test.large-table-1-dups
drop column row_num;
insert into large-table-1
select * from test.large-table-1-dups;
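A possible follow-up once the rows are back in the main table, sketched with the same placeholder names as above (large-table-1 and key stand in for the real table and key column). Whether an explicit commit is needed depends on the client's autocommit setting, which is an assumption here.

```sql
-- Re-check the cleaned keys: this should now return no rows
select key, count(1) from large-table-1
where key in ('k1', 'k2')
group by 1
having count(1) > 1;

-- Make the delete and re-insert permanent, then drop the helper table
commit;
drop table test.large-table-1-dups;
```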
Answer 4 (score: -2)
You should take a look at the answer from the PostgreSQL wiki, which also works for Vertica:
DELETE FROM tablename
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY column1, column2, column3
                   ORDER BY id
               ) AS rnum
        FROM tablename
    ) t
    WHERE t.rnum > 1
);
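This deletes all duplicate entries except the one with the lowest id. It relies on id being unique per row: rows that are identical in every column, including id, cannot be told apart this way.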
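Since a DELETE like this is hard to undo, one option is to preview the affected rows first. The sketch below reuses the inner query unchanged and only swaps the DELETE for a SELECT; tablename, id and column1 to column3 are the same placeholders as in the statement above.

```sql
-- Rows that the DELETE would remove: every duplicate except the lowest id per group
SELECT id, column1, column2, column3
FROM (
    SELECT id, column1, column2, column3,
           ROW_NUMBER() OVER (
               PARTITION BY column1, column2, column3
               ORDER BY id
           ) AS rnum
    FROM tablename
) t
WHERE t.rnum > 1
ORDER BY column1, column2, column3, id;
```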