SQL query to get groups of two or more similar rows

Date: 2012-01-20 14:42:26

Tags: sql

I have a table with this schema:

(image: table schema with sample rows)

The context is people who travel on the same day and at almost the same time.

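What I need to get from it is:
Groups of people with similar dates (at most 2 hours apart, earlier or later), the same place and the same type, who appear together under those constraints two or more times.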

In the image above, John and Steve should appear in the result, because they meet all the conditions the query requires.

Thanks in advance.

1 answer:

Answer 0: (score: 1)

First, as you said, migrate the table to SQL Server 2008. Then this query can help for groups of 2 people:

select t1.PersonId as Person1, t2.PersonId as Person2
from
   yourTable as t1
   inner join
   yourTable as t2
     on 
       t2.PersonId > t1.PersonId and --to avoid t1,t2 and t2,t1
       t2.Place = t1.Place and
       t2.Type = t1.Type and
       t2.date between dateadd( hh, -2, t1.date ) and dateadd( hh, +2, t1.date)
group by
   t1.PersonId, t2.PersonId
having count(*) > 1   --more than one time as you say
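
To see the pair query at work, here is a minimal sketch that runs it against a few made-up rows. The table variable `@trips`, the sample rows, the text-valued `PersonId` and the extra `TimesTogether` column are all assumptions for illustration, since the original screenshot is not available. The multi-row `values` insert needs SQL Server 2008 or later.

```sql
-- Hypothetical sample data: these rows are invented only to exercise the query.
declare @trips table (
    PersonId varchar(20) not null,
    Place    varchar(20) not null,
    Type     varchar(20) not null,
    [date]   datetime    not null
);

insert into @trips (PersonId, Place, Type, [date]) values
    ('John',  'Airport', 'Bus',   '2012-01-10T08:00:00'),
    ('Steve', 'Airport', 'Bus',   '2012-01-10T09:30:00'),
    ('Mary',  'Airport', 'Bus',   '2012-01-10T08:45:00'),
    ('John',  'Station', 'Train', '2012-01-15T18:00:00'),
    ('Steve', 'Station', 'Train', '2012-01-15T17:15:00');

-- Same join as above, plus count(*) so the number of matches is visible.
select t1.PersonId as Person1, t2.PersonId as Person2, count(*) as TimesTogether
from @trips as t1
inner join @trips as t2
    on  t2.PersonId > t1.PersonId          -- keep each pair only once
    and t2.Place = t1.Place
    and t2.Type  = t1.Type
    and t2.[date] between dateadd(hh, -2, t1.[date]) and dateadd(hh, 2, t1.[date])
group by t1.PersonId, t2.PersonId
having count(*) > 1;
```

With these rows the only pair returned is John and Steve, with a count of 2. Mary matches each of them once at the airport, so she is filtered out by the `having` clause. Note that `count(*)` counts matching row pairs, so if one person has two rows inside the same 2-hour window, a single shared trip may be counted twice.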

Then this query can help for groups of 3 people:

select t1.PersonId as Person1, t2.PersonId as Person2, t3.PersonId as Person3
from
   yourTable as t1
   inner join
   yourTable as t2
     on 
       t2.PersonId > t1.PersonId and 
       t2.Place = t1.Place and
       t2.Type = t1.Type and
       t2.date between dateadd( hh, -2, t1.date ) and dateadd( hh, +2, t1.date)
   inner join
   yourTable as t3
     on 
       t3.PersonId > t2.PersonId and 
       t3.Place = t1.Place and
       t3.Type = t1.Type and
       t3.date between dateadd( hh, -2, t1.date ) and dateadd( hh, +2, t1.date)
group by
   t1.PersonId, t2.PersonId, t3.PersonId
having count(*) > 1   --more than one time as you say
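
One thing to be aware of in the 3-person query: t2 and t3 are each compared only with t1, so t2 and t3 can be up to 4 hours apart from each other. If all three people must be within 2 hours of one another, one possible variation is to add a t3-to-t2 date condition. The sketch below assumes that stricter reading of the requirement; the table variable `@trips` and its rows are invented for illustration.

```sql
-- Hypothetical sample data: Bob travels 1.5 h before Ann and Carl 1.5 h after her,
-- so Bob and Carl are 3 h apart on both trips.
declare @trips table (
    PersonId varchar(20) not null,
    Place    varchar(20) not null,
    Type     varchar(20) not null,
    [date]   datetime    not null
);

insert into @trips (PersonId, Place, Type, [date]) values
    ('Ann',  'Airport', 'Bus',   '2012-01-10T08:00:00'),
    ('Bob',  'Airport', 'Bus',   '2012-01-10T06:30:00'),
    ('Carl', 'Airport', 'Bus',   '2012-01-10T09:30:00'),
    ('Ann',  'Station', 'Train', '2012-01-15T18:00:00'),
    ('Bob',  'Station', 'Train', '2012-01-15T16:30:00'),
    ('Carl', 'Station', 'Train', '2012-01-15T19:30:00');

select t1.PersonId as Person1, t2.PersonId as Person2, t3.PersonId as Person3
from @trips as t1
inner join @trips as t2
    on  t2.PersonId > t1.PersonId
    and t2.Place = t1.Place
    and t2.Type  = t1.Type
    and t2.[date] between dateadd(hh, -2, t1.[date]) and dateadd(hh, 2, t1.[date])
inner join @trips as t3
    on  t3.PersonId > t2.PersonId
    and t3.Place = t1.Place
    and t3.Type  = t1.Type
    and t3.[date] between dateadd(hh, -2, t1.[date]) and dateadd(hh, 2, t1.[date])
    -- extra condition (an assumption about the requirement): t3 must also be close to t2
    and t3.[date] between dateadd(hh, -2, t2.[date]) and dateadd(hh, 2, t2.[date])
group by t1.PersonId, t2.PersonId, t3.PersonId
having count(*) > 1;
```

For these rows the stricter query returns nothing, because Bob and Carl are 3 hours apart on both trips. Without the extra condition the same data returns Ann, Bob and Carl as a group. Which behaviour is wanted depends on how "similar dates" should be read for groups larger than two.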

I have tested the first query in data, taking Post as your table, and these are the results:

Person1 Person2     
------- ------- --- 
22656   23354   584 
22656   29407   237 
22656   23283   230 
22656   69083   189 
22656   57695   178 
157882  203907  177 
26428   131527  175 
20862   131527  163 
22656   34397   159 
22656   65358   150 

(10 row(s) affected)

For a more refined analysis, I suggest you use SSAS or move to a data mining tool such as KNIME.