Question

I have a table with many columns, and I need to select rows based on these two columns:

TIME   ID
-216   AZA
 215   AZA
  56   EA
 -55   EA
  66   EA
 -03   AR
  03   OUI
-999   OP
 999   OP
  04   AR
  87   AR

Expected output

 TIME   ID
  66   EA
  03   OUI
  87   AR

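I need to select the rows that have no match. Some rows share the same ID and have almost the same TIME, but with the sign reversed and a small difference. For example, the first row, with TIME -216, matches the second row, with TIME 215. I have tried to solve this in several ways, but each time I end up lost.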

Answer 1

Step one: find the rows that share an ID. Step two: filter those down to the near-duplicate rows.

Step one:

SELECT t1.TIME, t2.TIME, t1.ID
FROM mytable t1
JOIN mytable t2
  ON t1.ID = t2.ID AND t1.TIME > t2.TIME;

The second part of the join clause (t1.TIME > t2.TIME) ensures that we get only one record for each pair.

Step two:

SELECT t1.TIME, t2.TIME, t1.ID
FROM mytable t1
JOIN mytable t2
  ON t1.ID = t2.ID AND t1.TIME > t2.TIME
WHERE ABS(t1.TIME + t2.TIME) < 3;
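
To see which pairs the step-two query finds on the question's data, here is a minimal sketch that runs it in an in-memory SQLite database through Python's sqlite3 module. The function name near_inverse_pairs, the limit parameter, and the ORDER BY clause are assumptions added for illustration. The question does not say which database it uses.

```python
import sqlite3

# Sample rows from the question (values such as -03 and 04 written as -3 and 4).
ROWS = [
    (-216, "AZA"), (215, "AZA"), (56, "EA"), (-55, "EA"), (66, "EA"),
    (-3, "AR"), (3, "OUI"), (-999, "OP"), (999, "OP"), (4, "AR"), (87, "AR"),
]


def near_inverse_pairs(rows, limit=3):
    """Return (t1.TIME, t2.TIME, ID) for same-ID rows whose times nearly cancel."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (TIME INTEGER, ID TEXT)")
    conn.executemany("INSERT INTO mytable (TIME, ID) VALUES (?, ?)", rows)
    query = """
        SELECT t1.TIME, t2.TIME, t1.ID
        FROM mytable t1
        JOIN mytable t2
          ON t1.ID = t2.ID AND t1.TIME > t2.TIME
        WHERE ABS(t1.TIME + t2.TIME) < ?
        ORDER BY t1.ID, t1.TIME  -- added only to make the output order stable
    """
    pairs = conn.execute(query, (limit,)).fetchall()
    conn.close()
    return pairs


if __name__ == "__main__":
    for pair in near_inverse_pairs(ROWS):
        print(pair)
```

On the sample data this lists one pair each for AR, AZA, EA and OP. The rows 66 EA, 03 OUI and 87 AR never appear, which agrees with the expected output.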

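The step-two query will produce some duplicate results if, for example, (10, FI), (-10, FI) and (11, FI) are all in your table, because there are then two valid pairs. You may be able to filter these out as follows: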

SELECT t1.TIME, MAX(t2.TIME), t1.ID
FROM mytable t1
JOIN mytable t2
  ON t1.ID = t2.ID AND t1.TIME > t2.TIME
WHERE ABS(t1.TIME + t2.TIME) < 3
GROUP BY t1.TIME, t1.ID;

But it is not yet clear which of those results you would want to discard. Hopefully this points you in the right direction!
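
The queries above find the matching pairs, whereas the question asks for the rows that have no match. One possible way to invert the result is a NOT EXISTS anti-join. The sketch below rests on several assumptions: it uses SQLite through Python's sqlite3 module, it relies on SQLite's implicit rowid to stop a row from pairing with itself, and it takes a tolerance of 1, which is the largest difference seen in the question's sample. The function name unmatched_rows is hypothetical.

```python
import sqlite3

# Sample rows from the question (values such as -03 and 04 written as -3 and 4).
ROWS = [
    (-216, "AZA"), (215, "AZA"), (56, "EA"), (-55, "EA"), (66, "EA"),
    (-3, "AR"), (3, "OUI"), (-999, "OP"), (999, "OP"), (4, "AR"), (87, "AR"),
]


def unmatched_rows(rows, tolerance=1):
    """Return rows with no same-ID partner whose TIME nearly cancels theirs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (TIME INTEGER, ID TEXT)")
    conn.executemany("INSERT INTO mytable (TIME, ID) VALUES (?, ?)", rows)
    query = """
        SELECT t1.TIME, t1.ID
        FROM mytable t1
        WHERE NOT EXISTS (
            SELECT 1
            FROM mytable t2
            WHERE t2.ID = t1.ID
              AND t2.rowid <> t1.rowid          -- a row is not its own partner
              AND ABS(t1.TIME + t2.TIME) <= ?   -- times cancel within tolerance
        )
        ORDER BY t1.rowid  -- keep the original insertion order
    """
    result = conn.execute(query, (tolerance,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(unmatched_rows(ROWS))  # [(66, 'EA'), (3, 'OUI'), (87, 'AR')]
```

Unlike the pair queries, this one does not need the t1.TIME > t2.TIME condition, because each row is only tested for whether a partner exists. In a database without an implicit row identifier, a primary key column would have to play the role of rowid.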

Answer 2

Does this help?

create table #RawData
(
    [Time] int,
    ID varchar(3)
)

insert into #RawData ([Time],ID)
select -216,   'AZA'
union
select 215,   'AZA' 
union
select 56,   'EA' 
union
select -55,   'EA' 
union
select 66,   'EA' 
union
select -03,   'AR' 
union
select 03,   'OUI' 
union
select -999,   'OP' 
union
select 999,   'OP' 
union
select 04,   'AR' 
union
select 87,   'AR' 
union
-- this value added to illustrate that the algorithm does not ignore this value
select 156,   'EA' 

--create a copy with an ID to help out
create table #Data
(
    uniqueId uniqueidentifier,
    [Time] int,
    ID varchar(3)
)

insert into #Data(uniqueId,[Time],ID) select newid(),[Time],ID from #RawData

declare @allowedDifference int
select @allowedDifference = 1

--find duplicates with matching inverse time
select *, d1.[Time] + d2.[Time] as pairDifference
from #Data d1
inner join #Data d2
    on d1.ID = d2.ID
    and d1.uniqueId <> d2.uniqueId -- a row must not pair with itself (e.g. Time = 0)
    and abs(d1.[Time] + d2.[Time]) <= @allowedDifference

-- now find all rows that are not part of any such pair
-- (uses @allowedDifference rather than a hard-coded 3, so both queries agree)
select [Time],ID from #Data
where uniqueId not in (
    select d1.uniqueId
    from #Data d1
    inner join #Data d2
        on d1.ID = d2.ID
        and d1.uniqueId <> d2.uniqueId
        and abs(d1.[Time] + d2.[Time]) <= @allowedDifference
)
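
The script above targets SQL Server (temp tables, newid(), variables). For readers who want to try the same NOT IN idea elsewhere, here is a rough sketch of it in SQLite, run through Python's sqlite3 module. The names are assumptions made for illustration: the data table, the auto-assigned uid key that stands in for uniqueId, and the function rows_without_pair. The check that excludes self-pairs is also an addition.

```python
import sqlite3

# Rows from this answer, including the extra (156, 'EA') row it adds.
RAW_ROWS = [
    (-216, "AZA"), (215, "AZA"), (56, "EA"), (-55, "EA"), (66, "EA"),
    (-3, "AR"), (3, "OUI"), (-999, "OP"), (999, "OP"), (4, "AR"),
    (87, "AR"), (156, "EA"),
]


def rows_without_pair(rows, allowed_difference=1):
    """Return rows whose key is not in the set of keys that belong to a pair."""
    conn = sqlite3.connect(":memory:")
    # uid is an auto-assigned surrogate key, standing in for uniqueId/newid().
    conn.execute(
        "CREATE TABLE data (uid INTEGER PRIMARY KEY, time INTEGER, id TEXT)"
    )
    conn.executemany("INSERT INTO data (time, id) VALUES (?, ?)", rows)
    query = """
        SELECT time, id
        FROM data
        WHERE uid NOT IN (
            SELECT d1.uid
            FROM data d1
            JOIN data d2
              ON d1.id = d2.id
             AND d1.uid <> d2.uid  -- exclude a row pairing with itself
             AND d1.time + d2.time BETWEEN -:tol AND :tol
        )
        ORDER BY uid
    """
    result = conn.execute(query, {"tol": allowed_difference}).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(rows_without_pair(RAW_ROWS))
```

With the default tolerance of 1, the extra 156 EA row is returned alongside 66 EA, 03 OUI and 87 AR, so rows without a partner are kept even when other rows with the same ID are paired.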
