I am looking for a way to retrieve records that are "more or less" duplicates of the same record.
Sample data:
+----+----------+------+--------------------------+
| ID | Date     | Item | Description              |
+----+----------+------+--------------------------+
| 11 | 1/1/2018 | CPU  | CPU needs replacement    |
| 11 | 1/2/2018 | CPU  | CPU requires replacement |
| 12 | 1/1/2018 | CPU  | CPU needs replacement    |
+----+----------+------+--------------------------+
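The first two records are duplicates of each other, while the last one is not.
Logic
Rows count as near-duplicates if they have the same ID, the same item, and dates at most 2 days apart.
Output
A data set, ordered by ID, that contains the near-duplicate rows.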
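To make the rule concrete, here is a minimal Python sketch of one way to read it, run against the sample rows. The function name `near_duplicates` is made up for illustration, and the dates are assumed to be month/day/year (so the first two rows are one day apart).

```python
from datetime import date

# Sample rows from the question: (id, date, item, description).
# Assumption: dates are month/day/year.
rows = [
    (11, date(2018, 1, 1), "CPU", "CPU needs replacement"),
    (11, date(2018, 1, 2), "CPU", "CPU requires replacement"),
    (12, date(2018, 1, 1), "CPU", "CPU needs replacement"),
]


def near_duplicates(rows, max_days=2):
    """Return rows that share ID and item with another row at most max_days apart."""
    result = []
    for i, row in enumerate(rows):
        for j, other in enumerate(rows):
            if i == j:
                continue  # a row is not a duplicate of itself
            same_key = row[0] == other[0] and row[2] == other[2]
            if same_key and abs((row[1] - other[1]).days) <= max_days:
                result.append(row)
                break
    # Order by ID, then by date, as the question asks.
    return sorted(result, key=lambda r: (r[0], r[1]))


dupes = near_duplicates(rows)
for r in dupes:
    print(r[0], r[1].isoformat(), r[3])
```

With the sample data this prints the two rows for ID 11 and leaves out ID 12.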
Answer 0 (score: 1)
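First of all, you should not use an Oracle reserved keyword such as DATE as a column name, because you will have to wrap it in double quotes every time you use it.
Now, I believe you need something like the following, but it is hard to say without the expected output. You should also try to provide a better sample result set. In this case, if the same ID appears on consecutive days and the gap between those days is 2 days or less, you will get all of those rows.
To fetch only the records whose gap to a neighbouring record is 2 days or less, use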
SELECT ID,"DATE",ITEM,DESCRIPTION
FROM
(SELECT T.*,
LEAD(TRUNC("DATE"), 1) OVER ( PARTITION BY ID ORDER BY "DATE")
-
TRUNC("DATE")
AS DIF1,
TRUNC("DATE")
-
LAG(TRUNC("DATE"), 1) OVER (PARTITION BY ID ORDER BY "DATE")
AS DIF2
FROM FOCUS_SAMPLE T
) T1
WHERE T1.DIF1 <= 2 OR T1.DIF2 <=2
To get all records for an ID as soon as there is even one match, use
SELECT *
FROM FOCUS_SAMPLE
WHERE ID IN (SELECT ID
FROM (SELECT T.*,
LEAD(TRUNC("DATE"), 1)
OVER (
PARTITION BY ID
ORDER BY "DATE") - TRUNC("DATE") AS DIF
FROM FOCUS_SAMPLE T) T1
WHERE T1.DIF <= 2)
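The difference between the two queries only shows up when an ID has both close and distant rows. As a rough illustration (not Oracle code), the Python sketch below mimics LEAD and LAG per ID. The extra row for ID 11 on 2018-01-10 and the helper names `neighbour_gaps` and `within` are invented for the example.

```python
from datetime import date
from itertools import groupby

# (id, date) pairs; the 2018-01-10 row is a made-up addition to the sample.
rows = [
    (11, date(2018, 1, 1)),
    (11, date(2018, 1, 2)),
    (11, date(2018, 1, 10)),
    (12, date(2018, 1, 1)),
]


def neighbour_gaps(rows):
    """Per ID, pair each row with its gap to the next (LEAD) and previous (LAG) row.

    A gap is None at the edge of a partition, like NULL in SQL.
    """
    out = []
    for _, grp in groupby(sorted(rows), key=lambda r: r[0]):
        grp = list(grp)
        for k, row in enumerate(grp):
            nxt = (grp[k + 1][1] - row[1]).days if k + 1 < len(grp) else None
            prv = (row[1] - grp[k - 1][1]).days if k > 0 else None
            out.append((row, nxt, prv))
    return out


def within(gap, max_days=2):
    # A NULL gap never satisfies "<= 2" in SQL, so None counts as no match.
    return gap is not None and gap <= max_days


gaps = neighbour_gaps(rows)

# First query: keep only the rows that are close to a neighbour.
row_level = [row for row, nxt, prv in gaps if within(nxt) or within(prv)]

# Second query: keep every row of any ID that has at least one close pair.
matching_ids = {row[0] for row, nxt, _ in gaps if within(nxt)}
id_level = [row for row in sorted(rows) if row[0] in matching_ids]

print(row_level)  # the 2018-01-01 and 2018-01-02 rows of ID 11
print(id_level)   # all three rows of ID 11
```

In this sketch the first approach drops the 2018-01-10 row, while the second keeps it because ID 11 has a match somewhere.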
Answer 1 (score: 0)
Try something like this. Here we use rowid to remove the duplicate rows.
create table temp as
select 11 id,sysdate mdate,'CPU' item,' CPU needs replacement' description from dual union all
select 11 id,sysdate-2 mdate,'CPU' item,' CPU requires replacement' description from dual union all
select 12 id,sysdate mdate,'CPU' item,' CPU needs replacement' description from dual ;
To select:
select * from temp where id in (
select id from temp a where rowid not in (select max(rowid) from temp b where a.id=b.id and b.mdate between a.mdate-2 and a.mdate )
) order by id ;
To delete:
delete from temp a where rowid not in (select max(rowid) from temp b where a.id=b.id and b.mdate between a.mdate-2 and a.mdate );
Answer 2 (score: 0)
If you want the result "without duplicates", you can use NOT EXISTS
to filter out rows for which an earlier record exists within two days.
SELECT *
FROM "ELBAT" "T1"
WHERE NOT EXISTS (SELECT *
FROM "ELBAT" "T2"
WHERE "T2"."ID" = "T1"."ID"
AND "T2"."ITEM" = "T1"."ITEM"
AND "T2"."ROWID" <> "T1"."ROWID"
AND "T1"."DATE" - "T2"."DATE" >= 0
AND "T1"."DATE" - "T2"."DATE" <= 2);
If you want only the "duplicates", you can use EXISTS
to keep only rows for which another record exists within plus or minus two days.
SELECT *
FROM "ELBAT" "T1"
WHERE EXISTS (SELECT *
FROM "ELBAT" "T2"
WHERE "T2"."ID" = "T1"."ID"
AND "T2"."ITEM" = "T1"."ITEM"
AND "T2"."ROWID" <> "T1"."ROWID"
AND ABS("T1"."DATE" - "T2"."DATE") <= 2);
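If you have no Oracle instance at hand, the same EXISTS / NOT EXISTS idea can be tried out with Python's built-in sqlite3 module. This is only a sketch under some assumptions: the table is lowercase `elbat`, the date column is renamed `dt` and stored as ISO text, and SQLite's `julianday()` stands in for Oracle date subtraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE elbat (id INTEGER, dt TEXT, item TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO elbat VALUES (?, ?, ?, ?)",
    [
        (11, "2018-01-01", "CPU", "CPU needs replacement"),
        (11, "2018-01-02", "CPU", "CPU requires replacement"),
        (12, "2018-01-01", "CPU", "CPU needs replacement"),
    ],
)

# Drop rows that have an earlier (or same-day) record within two days.
no_dupes = conn.execute("""
    SELECT id, dt FROM elbat t1
    WHERE NOT EXISTS (SELECT 1 FROM elbat t2
                      WHERE t2.id = t1.id
                        AND t2.item = t1.item
                        AND t2.rowid <> t1.rowid
                        AND julianday(t1.dt) - julianday(t2.dt) BETWEEN 0 AND 2)
    ORDER BY id, dt
""").fetchall()

# Keep only rows that have another record within plus or minus two days.
only_dupes = conn.execute("""
    SELECT id, dt FROM elbat t1
    WHERE EXISTS (SELECT 1 FROM elbat t2
                  WHERE t2.id = t1.id
                    AND t2.item = t1.item
                    AND t2.rowid <> t1.rowid
                    AND ABS(julianday(t1.dt) - julianday(t2.dt)) <= 2)
    ORDER BY id, dt
""").fetchall()

print(no_dupes)    # [(11, '2018-01-01'), (12, '2018-01-01')]
print(only_dupes)  # [(11, '2018-01-01'), (11, '2018-01-02')]
```

One thing to watch with the NOT EXISTS variant: two rows with the same ID, item and exactly the same date each see the other as an "earlier" record (a gap of 0), so both would be dropped.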