with de_duplicate (ad_id, id_type, lat, long) AS (
select ad_id, id_type, lat, long,
Row_Number() over(partition by ad_id,id_type, lat, long) AS duplicate_count
from tempschema.temp_test)
select * from de_duplicate;
The above runs successfully, but when I try to perform the delete:
with de_duplicate(ad_id, id_type, lat, long) AS
(
select ad_id, id_type, lat, long,
Row_Number() over(partition by ad_id,id_type, lat, long) AS duplicate_count
from tempschema.temp_test
)
delete from de_duplicate where duplicate_count > 1;
it throws the error: Amazon Invalid operation: syntax error at or near "delete" Position: 190;
I am running these queries on a Redshift cluster. Any ideas?
Answer 0 (score: 0)
Consider converting the CTE into a subquery and adding a unique_id to match against in the outer query:
DELETE FROM tempschema.temp_test
WHERE unique_id IN
    (SELECT sub.unique_id
     FROM
        (SELECT unique_id, ad_id, id_type, lat, long,
                ROW_NUMBER() OVER (PARTITION BY ad_id, id_type, lat, long) AS dup_count
         FROM tempschema.temp_test) sub
     WHERE sub.dup_count > 1)
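A side note not in the original answer: ROW_NUMBER() without an ORDER BY numbers the rows of each group in an arbitrary order, so which copy survives the delete is nondeterministic. If that matters, ordering the window by unique_id (or any other column) makes it predictable:

SELECT unique_id, ad_id, id_type, lat, long,
       ROW_NUMBER() OVER (PARTITION BY ad_id, id_type, lat, long
                          ORDER BY unique_id) AS dup_count
FROM tempschema.temp_test;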
Alternatively, consider deleting with an aggregate subquery:
DELETE FROM tempschema.temp_test
WHERE unique_id NOT IN
    (SELECT MIN(unique_id)
     FROM tempschema.temp_test
     GROUP BY ad_id, id_type, lat, long)
Of course, both assume the table has a unique_id column; if it doesn't, they can be adjusted.
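If the table has no unique_id at all, one possible adjustment (a sketch, not part of the original answer) is to build a deduplicated copy instead of deleting in place; since the duplicates here match on every column, SELECT DISTINCT is enough. The name temp_test_dedup is illustrative:

-- Sketch only: temp_test_dedup is a hypothetical table name.
CREATE TABLE tempschema.temp_test_dedup AS
SELECT DISTINCT ad_id, id_type, lat, long
FROM tempschema.temp_test;

The clean copy can then be renamed into place of the original, as the second answer below suggests.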
Answer 1 (score: 0)
I see what you are trying to do, and it is a common problem, but this approach has two issues:
1) You are trying to delete from the result of a query (de_duplicate) rather than from the source table (tempschema.temp_test). Even though you identify the duplicates inside the de_duplicate statement, that has no effect on the source table tempschema.temp_test.
2) CTEs (WITH clauses) cannot be used directly with DELETE and UPDATE; they have to be joined in as subqueries.
There are two possible approaches in your case:
1) Delete with a joined subquery, if your table has a unique ID and a duplication criterion (val in the test case below, so that id = 3 and id = 4 are duplicates):

create table test1 (id integer, val integer);
insert into test1 values (1,1),(2,2),(3,3),(4,3);

-- For each val, keep the row with the highest id and delete the rest.
delete from test1 using (
    select *
    from (
        select *, row_number() over (partition by val order by id desc) as rn
        from test1
    ) ranked
    where rn > 1
) s
where test1.id = s.id;

2) Create a cleaned temporary table and swap the tables:
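A minimal sketch of this table-swap idea, reusing the same test1 example; this is an assumption rather than code from the original answer, and the names test1_clean and test1_old are illustrative:

-- Build a cleaned copy that keeps only the highest id per val.
create table test1_clean as
select id, val
from (
    select *, row_number() over (partition by val order by id desc) as rn
    from test1
) ranked
where rn = 1;

-- Swap the cleaned table into place, then drop the old one.
alter table test1 rename to test1_old;
alter table test1_clean rename to test1;
drop table test1_old;

Keep in mind that a table created with create table as won't necessarily keep the original table's distribution style, sort keys, or column encodings, so declare them explicitly if they matter.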