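Delete duplicate rows based on columns

Date: 2017-03-16 18:58:57

Tags: postgresql

I have a table named Aircraft with a lot of records in it. The problem is that some of them are duplicates. I already know how to select the duplicates and their counts: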

SELECT flight_id, latitude, longitude, altitude, call_sign, measurement_time, COUNT(*)
FROM Aircraft
GROUP BY flight_id, latitude, longitude, altitude, call_sign, measurement_time
HAVING COUNT(*) > 1;

This returns each duplicated combination of values together with its count.

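Now what I need to do is delete the duplicates, leaving only one of each, so that when I run the query again every count equals 1.

I know I can use the DELETE keyword, but I'm not sure how to delete from a select.

I'm sure I'm missing a simple step, but I'm a newbie and I don't want to wreck my database.

How can I do this?

3 answers:

Answer 0 (score: 4)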

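If the query below returns the correct rows (the ones to be deleted, that is, every row for which an identical row with a smaller id exists), you can change it into a DELETE statement: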

SELECT
    flight_id, latitude, longitude, altitude, call_sign, measurement_time
FROM Aircraft a
WHERE EXISTS (
    SELECT * FROM Aircraft x
    WHERE x.flight_id = a.flight_id
    AND x.latitude = a.latitude 
    AND x.longitude = a.longitude
    AND x.altitude = a.altitude
    AND x.call_sign  = a.call_sign
    AND x.measurement_time = a.measurement_time 
    AND x.id < a.id
 )
;
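
For reference, the DELETE form of that query might look like the sketch below. It assumes, as the SELECT does, that Aircraft has a unique id column. The transaction wrapper and the final ROLLBACK are a precaution added here, not part of the original answer.

```sql
BEGIN;

-- Delete every row for which an identical row with a smaller id exists,
-- so the row with the lowest id in each duplicate group survives.
-- Assumption: id is unique per row.
DELETE FROM Aircraft a
WHERE EXISTS (
    SELECT 1 FROM Aircraft x
    WHERE x.flight_id = a.flight_id
    AND x.latitude = a.latitude
    AND x.longitude = a.longitude
    AND x.altitude = a.altitude
    AND x.call_sign = a.call_sign
    AND x.measurement_time = a.measurement_time
    AND x.id < a.id
);

-- Re-run the GROUP BY ... HAVING COUNT(*) > 1 query at this point;
-- it should return no rows.

ROLLBACK;  -- replace with COMMIT once the result has been verified
```

Note that `=` never matches NULL, so rows that have a NULL in any of the compared columns are not treated as duplicates by this query. PostgreSQL's `IS NOT DISTINCT FROM` is the NULL-safe comparison if that matters for your data.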

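Answer 1 (score: 0)

If this is a one-off operation, you can create a temp table with the same schema and then copy the unique rows into it, like this: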

insert into Aircraft_temp
select distinct on (flight_id, measurement_time) Aircraft.* from Aircraft

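Then swap them, either by renaming or by truncating Aircraft and copying the temp contents back (truncate Aircraft; insert into Aircraft select * from Aircraft_temp;).

It is safer to rename Aircraft to Aircraft_old and Aircraft_temp to Aircraft, so that the original data is kept until you are sure things are correct. Or at the very least, before running the truncate, check that the number of groups from the count query above (run without its HAVING clause and grouped on the same columns) matches the number of rows in the temp table.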
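
The rename variant could be sketched as follows. Aircraft_temp and Aircraft_old are the names used above; `CREATE TABLE ... (LIKE ... INCLUDING ALL)` is just one way to get a table with the same schema, and the ORDER BY is an addition so that the DISTINCT ON columns are sorted explicitly.

```sql
BEGIN;

-- Empty copy of the table definition (columns, defaults, constraints, indexes).
CREATE TABLE Aircraft_temp (LIKE Aircraft INCLUDING ALL);

-- Keep one row per (flight_id, measurement_time).
-- Extend both column lists if a duplicate is defined by more columns.
INSERT INTO Aircraft_temp
SELECT DISTINCT ON (flight_id, measurement_time) *
FROM Aircraft
ORDER BY flight_id, measurement_time;

-- Swap the tables, keeping the original data under a new name.
ALTER TABLE Aircraft RENAME TO Aircraft_old;
ALTER TABLE Aircraft_temp RENAME TO Aircraft;

COMMIT;
```

Views or foreign keys that referenced the original table keep pointing at Aircraft_old after the rename, so this fits best when nothing else depends on the table. Drop Aircraft_old only once you are sure the new table is correct.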

Update 2: With a separate, valid primary key (assume it is called id), you can do the DELETE based on a self-join, like this:

delete from Aircraft using (
    select a1.id
    from Aircraft a1
    left join (select flight_id, measurement_time, min(id) as id from Aircraft group by 1,2) a2
    on a1.id = a2.id
    where a2.id is null
) as d
where Aircraft.id=d.id

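This finds the minimum id for each (flight_id, measurement_time) group (you could use max instead to keep the "latest"), then identifies every record in the full set whose id is not that minimum (the ones with no match in the join). The unmatched ids are the ones deleted.

Answer 2 (score: 0)

I've always used the CTE method in SQL SERVER. It lets you define which columns you want to compare. Once you have established which columns make up a duplicate, you assign each row a row number (RN) within its group in the CTE, then go back and clear out the rows whose number is greater than 1. Here is an example of a duplicate check I do.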

WITH CTE AS
(select  d.UID
    ,d.LotKey
    ,d.SerialNo
    ,d.HotWeight
    ,d.MarketValue
    ,RN = ROW_NUMBER()OVER(PARTITION BY d.HotWeight, d.serialNo, d.MarketValue order by d.SerialNo)
from LotDetail d
where d.LotKey = ('1~20161019~305')
)
DELETE FROM CTE WHERE RN <> 1

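In my example I'm looking at the LotDetail table for rows where d.HotWeight and d.SerialNo match (the PARTITION BY also includes d.MarketValue). If there is a match, the original gets RN 1 and every duplicate gets RN 2 or higher, depending on how many duplicates there are. The final DELETE statement then clears out the entries that are duplicates. This is very flexible, so you should be able to adapt it to your problem.

Here is an example tailored to your situation.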

WITH CTE AS
(select  d.Flight_ID
    ,d.Latitude
    ,d.Longitude
    ,d.Altitude
    ,d.Call_sign
    ,d.Measurement_time
    -- rows within a partition are identical on these columns, so the ordering column is arbitrary
    ,RN = ROW_NUMBER()OVER(PARTITION BY d.Flight_ID, d.Latitude, d.Longitude, d.Altitude, d.Call_Sign, d.Measurement_time order by d.Flight_ID)
from Aircraft d
where d.flight_id = ('**INSERT VALUE HERE')  -- placeholder: the flight to clean up
)
DELETE FROM CTE WHERE RN <> 1
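
Since the question is tagged postgresql: the statement above is SQL Server syntax. PostgreSQL does not allow deleting through a CTE name, and it does not accept the `RN = ...` alias form, so the same ROW_NUMBER idea has to be written differently there. A minimal sketch, assuming the table has a unique id column (as the other answers do):

```sql
-- Number the rows inside each group of identical values, then delete
-- every row that is not the first one in its group.
-- Assumption: id is a unique column on Aircraft.
DELETE FROM Aircraft
WHERE id IN (
    SELECT id
    FROM (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY flight_id, latitude, longitude,
                                altitude, call_sign, measurement_time
                   ORDER BY id
               ) AS rn
        FROM Aircraft
    ) AS ranked
    WHERE rn > 1
);
```

Running the inner SELECT on its own first shows exactly which ids would be removed.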