with de_duplicate (ad_id, id_type, lat, long) AS (
select ad_id, id_type, lat, long,
Row_Number() over(partition by ad_id,id_type, lat, long) AS duplicate_count
from tempschema.temp_test)
select * from de_duplicate;
The above runs successfully, but when I try to perform the delete:
with de_duplicate(ad_id, id_type, lat, long) AS
(
select ad_id, id_type, lat, long,
Row_Number() over(partition by ad_id,id_type, lat, long) AS duplicate_count
from tempschema.temp_test
)
delete from de_duplicate where duplicate_count > 1;
it throws the error: Amazon Invalid operation: syntax error at or near "delete" Position: 190;
I am running these queries on a Redshift cluster. Any ideas?
Answer 0 (score: 0)
Consider converting the CTE into a subquery and adding a unique_id to match against in the outer query:
DELETE FROM tempschema.temp_test
WHERE unique_id IN
    (SELECT sub.unique_id
     FROM
        (SELECT unique_id, ad_id, id_type, lat, long,
                ROW_NUMBER() OVER (PARTITION BY ad_id, id_type, lat, long) AS dup_count
         FROM tempschema.temp_test) sub
     WHERE sub.dup_count > 1)
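A side note not in the original answer: ROW_NUMBER() without an ORDER BY numbers the rows of each group in an arbitrary order, so which copy survives the delete is nondeterministic. If that matters, ordering the window by unique_id (or any other column) makes it predictable:

SELECT unique_id, ad_id, id_type, lat, long,
       ROW_NUMBER() OVER (PARTITION BY ad_id, id_type, lat, long
                          ORDER BY unique_id) AS dup_count
FROM tempschema.temp_test;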
Alternatively, consider deleting with an aggregate subquery:
DELETE FROM tempschema.temp_test
WHERE unique_id NOT IN
    (SELECT MIN(unique_id)
     FROM tempschema.temp_test
     GROUP BY ad_id, id_type, lat, long)
Of course, both assume the table has a unique_id column; if it doesn't, they can be adjusted.
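If the table has no unique_id at all, one possible adjustment (a sketch, not part of the original answer) is to build a deduplicated copy instead of deleting in place; since the duplicates here match on every column, SELECT DISTINCT is enough. The name temp_test_dedup is illustrative:

-- Sketch only: temp_test_dedup is a hypothetical table name.
CREATE TABLE tempschema.temp_test_dedup AS
SELECT DISTINCT ad_id, id_type, lat, long
FROM tempschema.temp_test;

The clean copy can then be renamed into place of the original, as the second answer below suggests.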
Answer 1 (score: 0)
I see what you are trying to do, and it is a common problem, but this approach has two issues:
1) You are trying to delete from the result of a query (de_duplicate) rather than from the source table (tempschema.temp_test). Even though you identify the duplicates inside the de_duplicate statement, that has no effect on the source table tempschema.temp_test.
2) CTEs (WITH clauses) cannot be used directly with DELETE and UPDATE; they have to be joined in as subqueries.
There are two possible approaches in your case:
1) Delete with a joined subquery, if your table has a unique ID and a duplication criterion (val in the test case below, so that id = 3 and id = 4 are duplicates):

create table test1 (id integer, val integer);
insert into test1 values (1,1),(2,2),(3,3),(4,3);

-- For each val, keep the row with the highest id and delete the rest.
delete from test1 using (
    select *
    from (
        select *, row_number() over (partition by val order by id desc) as rn
        from test1
    ) ranked
    where rn > 1
) s
where test1.id = s.id;

2) Create a cleaned temporary table and swap the tables:
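A minimal sketch of this table-swap idea, reusing the same test1 example; this is an assumption rather than code from the original answer, and the names test1_clean and test1_old are illustrative:

-- Build a cleaned copy that keeps only the highest id per val.
create table test1_clean as
select id, val
from (
    select *, row_number() over (partition by val order by id desc) as rn
    from test1
) ranked
where rn = 1;

-- Swap the cleaned table into place, then drop the old one.
alter table test1 rename to test1_old;
alter table test1_clean rename to test1;
drop table test1_old;

Keep in mind that a table created with create table as won't necessarily keep the original table's distribution style, sort keys, or column encodings, so declare them explicitly if they matter.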