Select rows that occur too often (based on the dates specified in the rows)

Posted: 2017-03-20 09:30:20

Tags: sql-server select sql-server-2005 count group-by

I have a table describing transactions, like this (only the relevant columns are shown):

ID    Date          Type     
1     2017/01/30     1      
2     2017/02/01     1
3     2017/02/02     1
4     2017/02/02     1
5     2017/02/01     2
6     2017/02/02     2
7     2017/02/25     3
8     2017/02/26     3
9     2017/02/24     3
10    2017/02/28     3

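I'm trying to check the dates of the records in order to select the rows that appear x times with the same Type within the number of days specified in days. So, for example, with x = 2 and days = 2 (basically 3 days: the day itself + 2 days), the result should be:

ID    Date          Type
1     2017/01/30     1
2     2017/02/01     1
3     2017/02/02     1
4     2017/02/02     1
7     2017/02/25     3
8     2017/02/26     3
9     2017/02/24     3
10    2017/02/28     3

--Basically, it's displaying records that have similar (close) dates to other records.

If x = 1 and days = 0 (same day), the result should be:

ID    Date          Type
3     2017/02/02     1
4     2017/02/02     1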

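The first example actually sums up my problem quite well, because records 1, 2 and 3 are within 3 days of each other and records 2, 3 and 4 are within 3 days of each other, but 1 and 4 are not. This could result in 2 ranges (meaning that either of the 2 ranges could exclude rows).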
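To make the overlap concrete, here is a small Python sketch (not T-SQL, and not taken from either answer) that builds one window of `days + 1` calendar days starting on every date between the earliest and latest row, using only the Type 1 rows from the table above. The names `TYPE1` and `windows` are hypothetical. Because a new window starts every day, consecutive windows overlap and one row can land in several of them.

```python
from datetime import date, timedelta

# Type 1 rows from the question's sample table: (ID, Date)
TYPE1 = [(1, date(2017, 1, 30)), (2, date(2017, 2, 1)),
         (3, date(2017, 2, 2)), (4, date(2017, 2, 2))]


def windows(rows, days):
    """Map every candidate start date to the IDs inside [start, start + days].

    One window starts on each calendar day between the earliest and the
    latest date, so consecutive windows overlap and a row can belong to
    several of them.
    """
    first = min(d for _, d in rows)
    last = max(d for _, d in rows)
    result = {}
    start = first
    while start <= last:
        end = start + timedelta(days=days)
        result[start] = [i for i, d in rows if start <= d <= end]
        start += timedelta(days=1)
    return result


if __name__ == "__main__":
    for start, ids in windows(TYPE1, days=2).items():
        print(start, ids)
```

With `days=2`, row 2 appears in three different windows, which is exactly the "one record in several groups" situation that a plain `GROUP BY` cannot express.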

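I was thinking of using Group by Type, [date range] having count(*) > x, but that leads to the problem of how to specify [date range] so that, in the case of the first example, no records are left out.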

Is it possible to specify overlapping groups and put a single record into more than one group?

Is there a better way to solve this problem?

2 answers:

Answer 0 (score: 2)

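I'm not sure this is the best approach, but it gets the job done. In short, it builds a dates table from which every possible range of @Days days can be derived, and returns all the transactions that fall within each range. The wrapping select then returns each transaction only once, to match the output in your question.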
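The same steps can be traced in a minimal Python sketch. It is a port for illustration only, assuming plain dates with no time part; `ROWS` and `frequent_rows` are hypothetical names, and the data is the sample inserted into `@t` below.

```python
from collections import Counter
from datetime import date, timedelta

# The answer's sample data: (ID, DateValue, TypeID)
ROWS = [
    (1, date(2017, 1, 30), 1), (2, date(2017, 2, 1), 1), (3, date(2017, 2, 2), 1),
    (4, date(2017, 2, 2), 1), (5, date(2017, 2, 1), 2), (6, date(2017, 2, 2), 2),
    (7, date(2017, 2, 3), 2), (8, date(2017, 2, 25), 3), (9, date(2017, 2, 26), 3),
    (10, date(2017, 2, 24), 3), (11, date(2017, 2, 28), 3),
]


def frequent_rows(rows, days, x):
    """Return the IDs of rows that share some [d, d + days] range with at
    least x rows of the same type (mirrors TransactionCount >= @x)."""
    first = min(r[1] for r in rows)
    last = max(r[1] for r in rows)
    keep = set()
    for offset in range((last - first).days + 1):  # one start date per day: the "dates table"
        start = first + timedelta(days=offset)
        end = start + timedelta(days=days)
        in_range = [r for r in rows if start <= r[1] <= end]  # inner join on the range
        counts = Counter(r[2] for r in in_range)  # count per (range start, type)
        keep.update(r[0] for r in in_range if counts[r[2]] >= x)
    return sorted(keep)  # select distinct ... order by ID
```

`frequent_rows(ROWS, days=2, x=2)` returns all eleven IDs, matching the output table below; raising `x` to 3 drops IDs 1 and 11.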

declare @t table(ID int, DateValue date, TypeID int); -- Avoid reserved words as object names
insert into @t values(1 ,'2017/01/30',1),(2 ,'2017/02/01',1),(3 ,'2017/02/02',1),(4 ,'2017/02/02',1),(5 ,'2017/02/01',2),(6 ,'2017/02/02',2),(7 ,'2017/02/03',2),(8 ,'2017/02/25',3),(9 ,'2017/02/26',3),(10,'2017/02/24',3),(11,'2017/02/28',3);

-- Declare the working parameters.
declare @Days int = 2;
declare @x int = 2;

-- Build a numbers table, then use it to build a dates table.
with n(n) as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)
    ,d(d) as (select top (select datediff(d,min(DateValue),max(DateValue)) + 1 from @t) dateadd(d,row_number() over (order by (select null))-1,(select min(DateValue) from @t)) from n n1,n n2,n n3,n n4,n n5,n n6)
select distinct ID
                ,DateValue
                ,TypeID
from(   -- Use the dates table to work out how many records fall into every possible range of dates.
    select d.d as RangeStart
            ,dateadd(d,@Days,d.d) as RangeEnd
            ,t.ID
            ,t.DateValue
            ,t.TypeID
            ,count(t.ID) over (partition by d.d, t.TypeID) as TransactionCount
    from d
        inner join @t t
            on(t.DateValue between d.d and dateadd(d,@Days,d))
) a
where TransactionCount >= @x    -- Then return just the transactions.
order by ID
        ,TypeID
        ,DateValue;

Output:

+----+------------+--------+
| ID | DateValue  | TypeID |
+----+------------+--------+
|  1 | 2017-01-30 |      1 |
|  2 | 2017-02-01 |      1 |
|  3 | 2017-02-02 |      1 |
|  4 | 2017-02-02 |      1 |
|  5 | 2017-02-01 |      2 |
|  6 | 2017-02-02 |      2 |
|  7 | 2017-02-03 |      2 |
|  8 | 2017-02-25 |      3 |
|  9 | 2017-02-26 |      3 |
| 10 | 2017-02-24 |      3 |
| 11 | 2017-02-28 |      3 |
+----+------------+--------+
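The `n` and `d` CTEs at the top of the query are a common tally-table trick: `n` has 10 rows, so the six-way cross join can supply up to 10^6 row numbers, and `top (...)` keeps only as many as there are days between the earliest and latest date. A rough Python analogue follows; `dates_table` is a hypothetical name, and `itertools.product` stands in for the cross join.

```python
from datetime import date, timedelta
from itertools import islice, product


def dates_table(first, last):
    """Mimic the tally-table CTEs: cross join six 10-row sets (10**6 rows),
    number them from 0, and keep only as many as there are days in range."""
    n = [1] * 10                          # the 10-row CTE n(n)
    cross = product(n, repeat=6)          # from n n1, n n2, ..., n n6 (lazy)
    needed = (last - first).days + 1      # top (datediff(d, min, max) + 1)
    return [first + timedelta(days=k)     # dateadd(d, row_number() - 1, min)
            for k, _ in islice(enumerate(cross), needed)]
```

For the sample data, `dates_table(date(2017, 1, 30), date(2017, 2, 28))` yields 30 consecutive dates, one candidate range start per day.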

Answer 1 (score: 0)

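Many thanks to iamdave for all the help and continuous feedback. His answer covers everything perfectly, and the comments give some insight into topics related to this problem. That said, I decided to post the final code I used (at the end of this answer), along with some conclusions I reached after working with these scripts.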

The following code is the first one I used; it is slower than the script iamdave provided.

SELECT *  from transactions c 
    INNER JOIN( SELECT Type,StartDate,EndDate FROM GenerateDateRange('2016-01-01 00:00' , '2016-01-05 05:00' , 4,'d') a  -- all this function does is create a table with date ranges
        LEFT JOIN transactions b 
        ON b.Date>=StartDate AND b.Date<=EndDate
        GROUP BY StartDate,EndDate,Type HAVING COUNT(*)>2 ) d
    ON d.Type = c.Type AND c.Date>=d.StartDate AND c.Date<=d.EndDate 

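A variation on the approach above that generates the dates with a tally table (same as in the example above), joining the dates table to transactions (on matching dates) and grouping by type and date range, used as a broad filter: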

--Variables are pretty much filters here, so I don't include them
with n(n) as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)
    ,d(d) as (select top (select datediff(d,@mindate,@maxdate) + 1) dateadd(d,row_number() over (order by (select null))-1,(select @mindate)) from n n1, n n2, n n3, n n4, n n5, n n6)
select distinct b.*
    from(
        select d.d as RangeStart,
            dateadd(d,@Days,d.d) as RangeEnd,
            t.Type
        from d
            inner join transactions t
                on t.Date > d.d AND t.Date <= dateadd(d,@Days,d) 
                group by d.d ,dateadd(d,@Days,d),Type having COUNT(*)>@x
) a inner join transactions b on a.Type = b.Type AND b.Date>=a.RangeStart AND b.Date<=a.RangeEnd
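For comparison, the group-then-join-back idea behind the first two scripts can be sketched in Python. This is an illustrative port, not the author's code: `QUESTION_ROWS` and `group_then_join` are hypothetical names, and the sketch uses inclusive bounds at both ends of every range, whereas the second script's inner query counts only rows with `Date > RangeStart`.

```python
from collections import Counter
from datetime import date, timedelta

# The question's sample data: (ID, Date, Type)
QUESTION_ROWS = [
    (1, date(2017, 1, 30), 1), (2, date(2017, 2, 1), 1), (3, date(2017, 2, 2), 1),
    (4, date(2017, 2, 2), 1), (5, date(2017, 2, 1), 2), (6, date(2017, 2, 2), 2),
    (7, date(2017, 2, 25), 3), (8, date(2017, 2, 26), 3), (9, date(2017, 2, 24), 3),
    (10, date(2017, 2, 28), 3),
]


def group_then_join(rows, days, x):
    """Group rows by (range start, type), keep groups with more than x rows,
    then join back to the rows on type and date range."""
    first = min(r[1] for r in rows)
    last = max(r[1] for r in rows)
    span = timedelta(days=days)
    groups = Counter()
    for offset in range((last - first).days + 1):
        start = first + timedelta(days=offset)
        for _id, d, typ in rows:
            if start <= d <= start + span:
                groups[(start, typ)] += 1  # GROUP BY RangeStart, Type
    qualifying = [key for key, n in groups.items() if n > x]  # HAVING COUNT(*) > @x
    # Join back: a row is returned if any qualifying range of its type contains it;
    # the set removes duplicates, like select distinct.
    return sorted({
        _id for _id, d, typ in rows
        for s, t in qualifying
        if t == typ and s <= d <= s + span
    })
```

`group_then_join(QUESTION_ROWS, days=0, x=1)` returns `[3, 4]`, the question's same-day example.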

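The best one: practically an exact copy of iamdave's script. The only notable difference is an extra join, which avoids having to select the fields twice (this doesn't noticeably affect performance).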

with n(n) as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)
    ,d(d) as (select top (select datediff(d,@mindate,@maxdate) + 1) dateadd(d,row_number() over (order by (select null))-1,(select @mindate)) from n n1, n n2, n n3, n n4, n n5, n n6)
select distinct b.*
    from(select --d.d as RangeStart, dateadd(d,@rangeDays,d.d) as RangeEnd --commented, because not used
            t.id,
            count(t.id) over(partition by d.d, t.Type) as TransactionCount
        from d
            inner join transactions t
                on(t.Date > d.d and t.Date <= dateadd(d,@rangeDays,d)) 
) a left join transactions b on a.id = b.id
where TransactionCount > @x --and b.Type is not null

Some comparisons between the 3 scripts

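I ran some tests on these queries using the same parameters. The table I'm using contains more than 200,000 rows; my date filter covers roughly 200,000 of them, and I ran each query several times to make sure the timings weren't just accidental lag.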

Parameters: Days=1; x=1 (more than 1 transaction of the same type per day)

 Script   avg execution time (seconds)
 1st      28s                    --with this performance, further comparison is not needed.
 2nd      7s
 3rd      4s

A few things to note:

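  • Execution times stay proportional across different parameter sets.
  • The second query doesn't return rows whose Type is null (because it joins on Type), which makes the third one more flexible.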
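Why the join on Type loses NULL types can be sketched in Python, with `None` standing in for SQL NULL. The rows and function names here are hypothetical. In SQL, `GROUP BY` and `PARTITION BY` put all NULLs into one group, but `a.Type = b.Type` is never true when either side is NULL. The third script joins back on `id` instead, which is never NULL.

```python
from collections import Counter

# Hypothetical rows (ID, Type) that all fall inside one date range;
# a Type of None stands in for SQL NULL.
NULL_ROWS = [(1, "A"), (2, "A"), (3, None), (4, None)]


def sql_eq(a, b):
    """SQL '=': a comparison involving NULL is never true."""
    return a is not None and b is not None and a == b


def join_on_type(rows, x):
    # GROUP BY Type HAVING COUNT(*) > x: like SQL, None forms its own group.
    counts = Counter(t for _, t in rows)
    kept_types = [t for t, n in counts.items() if n > x]
    # Join back on Type (second script): NULL = NULL never matches.
    return sorted({i for i, t in rows for k in kept_types if sql_eq(t, k)})


def join_on_id(rows, x):
    # count(...) over (partition by Type): one count per row; NULLs share a partition.
    counts = Counter(t for _, t in rows)
    kept_ids = {i for i, t in rows if counts[t] > x}
    # Join back on ID (third script): IDs are never NULL, so nothing is lost.
    return sorted(i for i, _ in rows if i in kept_ids)
```

With `x=1`, both types qualify, but `join_on_type` returns only `[1, 2]` while `join_on_id` returns all four IDs. That is the flexibility the third script offers, and why its commented-out `b.Type is not null` filter is optional.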