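I have a table with this schema:
The context is people who travel on the same day at almost the same time.
What I need from it is:
Groups of people with similar dates (at most 2 hours apart in either direction), the same place, and the same type, who appear together under those constraints two or more times.
In the image above, John and Steve should appear in the result, because they meet all the conditions the query requires.
Thanks in advance.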
Answer 0: (score: 1)
First, as you said, migrate the table to SQL Server 2008. Then this query can help for groups of 2 people:
select t1.PersonId as Person1, t2.PersonId as Person2
from
    yourTable as t1
inner join
    yourTable as t2
on
    t2.PersonId > t1.PersonId and --to avoid t1,t2 and t2,t1
    t2.Place = t1.Place and
    t2.Type = t1.Type and
    t2.date between dateadd( hh, -2, t1.date ) and dateadd( hh, +2, t1.date )
group by
    t1.PersonId, t2.PersonId
having count(*) > 1 --more than one time as you say
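To try the pair query without a SQL Server instance, here is a minimal sketch that runs the same self-join in an in-memory SQLite database from Python. Everything in it is an assumption made for illustration: the table name `trips`, the sample rows, the extra `Times` column, and the column types are hypothetical, and because SQLite has no `dateadd`, the 2-hour window is written as a difference in epoch seconds rather than with `between`.

```python
import sqlite3

# Hypothetical sample rows (PersonId, Place, Type, date); not taken from the question.
ROWS = [
    ("John",  "Airport", "Flight", "2012-01-01 10:00:00"),
    ("Steve", "Airport", "Flight", "2012-01-01 11:30:00"),
    ("Mary",  "Airport", "Flight", "2012-01-01 10:30:00"),
    ("John",  "Station", "Train",  "2012-01-05 08:00:00"),
    ("Steve", "Station", "Train",  "2012-01-05 09:00:00"),
    ("Mary",  "Station", "Train",  "2012-01-05 15:00:00"),
]

# Same join conditions as the pair query above. strftime('%s', ...) yields
# epoch seconds, so the window is |t2.date - t1.date| <= 2 hours.
PAIR_QUERY = """
select t1.PersonId as Person1, t2.PersonId as Person2, count(*) as Times
from trips as t1
inner join trips as t2
   on t2.PersonId > t1.PersonId
  and t2.Place = t1.Place
  and t2.Type = t1.Type
  and abs(strftime('%s', t2.date) - strftime('%s', t1.date)) <= 2 * 3600
group by t1.PersonId, t2.PersonId
having count(*) > 1
"""


def find_pairs(rows):
    """Load rows into an in-memory table and return pairs seen together more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table trips (PersonId text, Place text, Type text, date text)"
    )
    conn.executemany("insert into trips values (?, ?, ?, ?)", rows)
    result = conn.execute(PAIR_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(find_pairs(ROWS))  # prints [('John', 'Steve', 2)]
```

With this sample data, Mary matches John and Steve only once (at the airport), so `having count(*) > 1` filters those pairs out and only John and Steve remain.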
Then this query can help for groups of 3 people:
select t1.PersonId as Person1, t2.PersonId as Person2, t3.PersonId as Person3
from
    yourTable as t1
inner join
    yourTable as t2
on
    t2.PersonId > t1.PersonId and
    t2.Place = t1.Place and
    t2.Type = t1.Type and
    t2.date between dateadd( hh, -2, t1.date ) and dateadd( hh, +2, t1.date )
inner join
    yourTable as t3
on
    t3.PersonId > t2.PersonId and
    t3.Place = t1.Place and
    t3.Type = t1.Type and
    --t3 is checked against t1 only, so t2 and t3 can be up to 4 hours apart
    t3.date between dateadd( hh, -2, t1.date ) and dateadd( hh, +2, t1.date )
group by
    t1.PersonId, t2.PersonId, t3.PersonId
having count(*) > 1 --more than one time as you say
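I have tested the first query on data, taking Post as your table, with the following results (the unnamed third column is presumably the count(*) for each pair):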
Person1 Person2
------- ------- ---
22656   23354   584
22656   29407   237
22656   23283   230
22656   69083   189
22656   57695   178
157882  203907  177
26428   131527  175
20862   131527  163
22656   34397   159
22656   65358   150
(10 row(s) affected)
For more refined analysis, I suggest you use SSAS or move to a data mining tool such as KNIME.