Data cleansing using grouping based on time intervals - SQL 2005

Time: 2011-01-12 12:33:32

Tags: sql-server-2005 grouping aggregate with-statement

I have the following data in a table, which I want to report on without having to delete any rows.

ActiveSearchID --- SearchDate ---------------- SearchPhrase
1 ---------------- 2010-12-15 12:01:11.587 --- argos
2 ---------------- 2010-12-15 12:03:40.193 --- muji
3 ---------------- 2010-12-15 12:03:42.370 --- muji
4 ---------------- 2010-12-15 12:04:29.167 --- Office supplies
5 ---------------- 2010-12-15 12:05:11.590 --- lava
9 ---------------- 2010-12-15 12:08:38.920 --- sony vaio
10 --------------- 2010-12-15 12:08:41.170 --- sony vaio
12 --------------- 2010-12-15 12:09:09.920 --- sony vaio battery
13 --------------- 2010-12-15 12:09:17.487 --- sony vaio battery
14 --------------- 2010-12-15 12:17:10.980 --- sony vaio battery
15 --------------- 2010-12-15 12:17:12.170 --- argos

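The report I want selects the first instance of each search phrase that was searched for within a 5-minute interval. So, for example, querying the data above would give the following result: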
SearchDate ---------------- SearchPhrase
2010-12-15 12:01:11.587 --- argos
2010-12-15 12:03:40.193 --- muji
2010-12-15 12:04:29.167 --- Office supplies
2010-12-15 12:05:11.590 --- lava
2010-12-15 12:08:38.920 --- sony vaio
2010-12-15 12:09:09.920 --- sony vaio battery
2010-12-15 12:17:12.170 --- argos


I tried the following query, but I still get duplicates:

select t1.searchdate, t1.searchphrase
from activesearches t1
inner join activesearches t2
    on t1.searchphrase = t2.searchphrase
    and t1.searchdate < t2.searchdate
where datediff(s, t1.searchdate, t2.searchdate) <= 300
order by searchdate


I would like to use a "WITH SearchPhrases AS ()" type of query, but I can't get my head around it.

Thanks

1 answer:

Answer 0: (score: 0)

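I believe that, given your test data, "sony vaio battery" should have been returned twice: the search at 12:17:10.980 comes more than 5 minutes after the previous one at 12:09:17.487. I came up with two options.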

-- Populate test data
if(OBJECT_ID('tempdb..#Search') IS NOT NULL)
    DROP TABLE #Search
create table #Search (
    ActiveSearchID int primary key, 
    SearchDate datetime not null, 
    SearchPhrase nvarchar(30))

insert into #Search(ActiveSearchID, SearchDate, SearchPhrase)
select 1, '2010-12-15 12:01:11.587', 'argos'
union all select 2, '2010-12-15 12:03:40.193', 'muji'
union all select 3, '2010-12-15 12:03:42.370', 'muji'
union all select 4, '2010-12-15 12:04:29.167', 'Office supplies'
union all select 5, '2010-12-15 12:05:11.590', 'lava'
union all select 9, '2010-12-15 12:08:38.920', 'sony vaio'
union all select 10, '2010-12-15 12:08:41.170', 'sony vaio'
union all select 12, '2010-12-15 12:09:09.920', 'sony vaio battery'
union all select 13, '2010-12-15 12:09:17.487', 'sony vaio battery'
union all select 14, '2010-12-15 12:17:10.980', 'sony vaio battery'
union all select 15, '2010-12-15 12:17:12.170', 'argos'

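I think you are looking for something like this query, which keeps a row only if no other row with the same phrase falls in the 5 minutes up to and including its own SearchDate. I don't know how well it will perform: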

select * 
from #Search as S
where not exists(
select * from #Search as N
where N.SearchPhrase= S.SearchPhrase
and N.SearchDate between 
    dateadd(minute, -5, S.SearchDate) AND S.SearchDate
and N.ActiveSearchID <> S.ActiveSearchID)
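
Since the question asks for a "WITH SearchPhrases AS ()" form, here is a minimal sketch of the same sliding-window idea written as a common table expression. It assumes the #Search temp table populated above; the PrevSearchDate column name is only illustrative. It differs from the NOT EXISTS query in one respect: two rows with the same phrase and exactly the same SearchDate are tie-broken on ActiveSearchID, so one of them is kept instead of both being dropped.

```sql
-- Sketch only: CTE variant of the sliding 5-minute filter (SQL Server 2005+).
-- Assumes #Search exists as created above; PrevSearchDate is an illustrative name.
;with SearchPhrases as (
    select
        S.ActiveSearchID,
        S.SearchDate,
        S.SearchPhrase,
        -- Most recent earlier search for the same phrase
        -- (ties on SearchDate are broken by ActiveSearchID).
        (select max(P.SearchDate)
         from #Search as P
         where P.SearchPhrase = S.SearchPhrase
           and (P.SearchDate < S.SearchDate
                or (P.SearchDate = S.SearchDate
                    and P.ActiveSearchID < S.ActiveSearchID))) as PrevSearchDate
    from #Search as S
)
select ActiveSearchID, SearchDate, SearchPhrase
from SearchPhrases
-- Keep a row when there is no earlier search for the phrase,
-- or the previous one is more than 5 minutes older.
where PrevSearchDate is null
   or PrevSearchDate < dateadd(minute, -5, SearchDate)
order by SearchDate
```

On the test data this keeps ActiveSearchID 1, 2, 4, 5, 9, 12, 14 and 15, so "sony vaio battery" appears twice (rows 12 and 14).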

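Alternatively, if you can use discrete 5-minute intervals on the clock (12:00 to 12:05, 12:05 to 12:10, and so on), this might perform better. I haven't tested it with a large amount of data: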

select
    ActiveSearchID, SearchDate, SearchPhrase
from
(
    select 
        *,
        -- Number the rows within each (phrase, 5-minute clock window) group
        ROW_NUMBER() over (
            partition by SearchPhrase,
                         DATEDIFF(minute, '2000-01-01', SearchDate) / 5
            order by SearchDate, ActiveSearchID) as rn,
        DATEDIFF(minute, '2000-01-01', SearchDate) / 5 as five_minute_window
    from #Search
) as X
where
    rn = 1
order by
    ActiveSearchID
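
One caveat with fixed clock windows is that two searches only seconds apart can still land in different windows when they straddle a boundary. A minimal sketch of that effect, using two hypothetical timestamps that are not part of the question's data:

```sql
-- Sketch only: hypothetical timestamps chosen to straddle the 12:05 boundary.
select
    T.SearchDate,
    -- Integer division: whole minutes since 2000-01-01, grouped in fives.
    datediff(minute, '2000-01-01', T.SearchDate) / 5 as five_minute_window
from (
    select cast('2010-12-15 12:04:59' as datetime) as SearchDate
    union all
    select cast('2010-12-15 12:05:01' as datetime)
) as T
order by T.SearchDate
```

The two rows get consecutive five_minute_window values even though they are two seconds apart, so the ROW_NUMBER query would return both searches for the same phrase, whereas the NOT EXISTS query, which looks back 5 minutes from each row, would return only the first.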