I have a table describing transactions, which looks like this (only the relevant columns are shown):
ID Date Type
1 2017/01/30 1
2 2017/02/01 1
3 2017/02/02 1
4 2017/02/02 1
5 2017/02/01 2
6 2017/02/02 2
7 2017/02/25 3
8 2017/02/26 3
9 2017/02/24 3
10 2017/02/28 3
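I'm trying to check the dates of the records in order to select the rows that appear x times within the number of days specified in days and that share the same Type. For example, with x = 2 and days = 2 (effectively 3 days: the day itself + 2 days), the result should be:
ID Date Type
1 2017/01/30 1
2 2017/02/01 1
3 2017/02/02 1
4 2017/02/02 1
7 2017/02/25 3
8 2017/02/26 3
9 2017/02/24 3
10 2017/02/28 3
--Basically, it's displaying records that have similar (close) dates to other records.
With x = 1 and days = 0 (same day):
ID Date Type
3 2017/02/02 1
4 2017/02/02 1
The first example actually sums up my problem quite well: records (ID) 1, 2 and 3 are within 3 days of each other, and records 2, 3 and 4 are within 3 days of each other, but 1 and 4 are not. This can produce 2 ranges (meaning that either of the 2 ranges on its own might exclude rows).
I was thinking of using Group by Type,[date range] having count(*)>x, but that raises the question of how to specify [date range] so that no records are left out in a case like the first example.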
Is it possible to specify overlapping groups, and to put a single record into more than one group?
Is there a better way to solve this?
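To make the difficulty with a plain Group by concrete, here is a minimal sketch (not part of the original question) that buckets the four Type 1 rows into fixed, non-overlapping ranges of days + 1 days. The table variable @t, the column names DateValue and TypeID, and the anchor date 2017-01-30 are assumptions chosen purely for illustration.

```sql
-- Illustration only: the four Type 1 rows from the question.
declare @t table(ID int, DateValue date, TypeID int);
insert into @t values (1,'20170130',1),(2,'20170201',1),(3,'20170202',1),(4,'20170202',1);

declare @Days int = 2;

-- Fixed buckets of @Days + 1 days, counted from an arbitrary anchor date (an assumption).
select TypeID
      ,datediff(day, '20170130', DateValue) / (@Days + 1) as BucketNo
      ,count(*) as TransactionCount
from @t
group by TypeID
        ,datediff(day, '20170130', DateValue) / (@Days + 1);
```

With this anchor, IDs 1 and 2 fall into bucket 0 and IDs 3 and 4 into bucket 1, so neither bucket holds more than 2 rows, even though IDs 1, 2 and 3 lie within 3 days of each other. Moving the anchor changes which rows are split, which is why the ranges have to overlap.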
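Answer 0 (score: 2)
I'm not sure this is the best way of doing it, but it gets the job done. In short, it builds a table of dates from which every possible range of @Days days can be derived, and returns all the transactions that fall into each range. The wrapping select then returns only the individual transaction values, to match the output in your question.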
declare @t table(ID int, DateValue date, TypeID int); -- Avoid reserved words as object names
insert into @t values(1 ,'2017/01/30',1),(2 ,'2017/02/01',1),(3 ,'2017/02/02',1),(4 ,'2017/02/02',1),(5 ,'2017/02/01',2),(6 ,'2017/02/02',2),(7 ,'2017/02/03',2),(8 ,'2017/02/25',3),(9 ,'2017/02/26',3),(10,'2017/02/24',3),(11,'2017/02/28',3);
-- Declare the working parameters.
declare @Days int = 2;
declare @x int = 2;
-- Build a numbers table, then use it to build a dates table.
with n(n) as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)
,d(d) as (select top (select datediff(d,min(DateValue),max(DateValue)) + 1 from @t) dateadd(d,row_number() over (order by (select null))-1,(select min(DateValue) from @t)) from n n1,n n2,n n3,n n4,n n5,n n6)
select distinct ID
,DateValue
,TypeID
from( -- Use the dates table to work out how many records fall into every possible range of dates.
select d.d as RangeStart
,dateadd(d,@Days,d.d) as RangeEnd
,t.ID
,t.DateValue
,t.TypeID
,count(t.ID) over (partition by d.d, t.TypeID) as TransactionCount
from d
inner join @t t
on(t.DateValue between d.d and dateadd(d,@Days,d.d))
) a
where TransactionCount >= @x -- Then return just the transactions.
order by ID
,TypeID
,DateValue;
Output:
+----+------------+--------+
| ID | DateValue | TypeID |
+----+------------+--------+
| 1 | 2017-01-30 | 1 |
| 2 | 2017-02-01 | 1 |
| 3 | 2017-02-02 | 1 |
| 4 | 2017-02-02 | 1 |
| 5 | 2017-02-01 | 2 |
| 6 | 2017-02-02 | 2 |
| 7 | 2017-02-03 | 2 |
| 8 | 2017-02-25 | 3 |
| 9 | 2017-02-26 | 3 |
| 10 | 2017-02-24 | 3 |
| 11 | 2017-02-28 | 3 |
+----+------------+--------+
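As a possible variation (a sketch, not part of the answer above), the windows can be anchored on the transaction dates themselves instead of on every calendar day. Any window of @Days + 1 days that holds at least @x rows of a type still holds them if it is shifted to start on the earliest of those rows, so only the distinct (TypeID, DateValue) pairs need to be tried as range starts. The CTE names s and w are my own; @t, @Days and @x mirror the script above. I have not compared its performance with the dates-table version.

```sql
-- Same sample data and parameters as the script above.
declare @t table(ID int, DateValue date, TypeID int);
insert into @t values (1,'20170130',1),(2,'20170201',1),(3,'20170202',1),(4,'20170202',1)
                     ,(5,'20170201',2),(6,'20170202',2),(7,'20170203',2)
                     ,(8,'20170225',3),(9,'20170226',3),(10,'20170224',3),(11,'20170228',3);

declare @Days int = 2;
declare @x int = 2;

with s as
( -- Every distinct (TypeID, DateValue) pair is a candidate range start.
    select distinct TypeID, DateValue as RangeStart
    from @t
)
,w as
( -- Keep only the ranges that hold at least @x transactions of that type.
    select s.TypeID, s.RangeStart
    from s
    inner join @t t
        on t.TypeID = s.TypeID
        and t.DateValue between s.RangeStart and dateadd(day, @Days, s.RangeStart)
    group by s.TypeID, s.RangeStart
    having count(*) >= @x
)
-- Return each transaction that falls into at least one qualifying range.
select distinct t.ID, t.DateValue, t.TypeID
from w
inner join @t t
    on t.TypeID = w.TypeID
    and t.DateValue between w.RangeStart and dateadd(day, @Days, w.RangeStart)
order by t.ID;
```

For the sample data this should return the same 11 rows as the output above, without needing a numbers table.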
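Answer 1 (score: 0)
Many thanks to iamdave for all the help and the continued feedback. His answer covers everything perfectly, and the comments give some insight into topics related to this problem. That said, I decided to post the final code I used (at the end of this answer), along with some conclusions I reached after working with the scripts.
The following code is the first version I used; it is slower than the script provided in the other answer.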
SELECT * from transactions c
INNER JOIN( SELECT Type,StartDate,EndDate FROM GenerateDateRange('2016-01-01 00:00' , '2016-01-05 05:00' , 4,'d') a -- all this function does is create a table with date ranges
LEFT JOIN transactions b
ON b.Date>=StartDate AND b.Date<=EndDate
GROUP BY StartDate,EndDate,Type HAVING COUNT(*)>2 ) d
ON d.Type = c.Type AND c.Date>=d.StartDate AND c.Date<=d.EndDate
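A variation on the approach above that generates the dates with a tally table (as in the example above). The dates table is joined to transactions on matching dates, grouped by type and date range, and the result is used as a broad filter: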
--Variables are pretty much filters here, so I don't include them
with n(n) as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)
,d(d) as (select top (select datediff(d,@mindate,@maxdate) + 1) dateadd(d,row_number() over (order by (select null))-1,(select @mindate)) from n n1, n n2, n n3, n n4, n n5, n n6)
select distinct b.*
from(
select d.d as RangeStart,
dateadd(d,@Days,d.d) as RangeEnd,
t.Type
from d
inner join transactions t
on t.Date > d.d AND t.Date <= dateadd(d,@Days,d.d)
group by d.d ,dateadd(d,@Days,d.d),Type having COUNT(*)>@x
) a inner join transactions b on a.Type = b.Type AND b.Date>=a.RangeStart AND b.Date<=a.RangeEnd
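The best one is an almost exact copy of iamdave's. The only notable difference is an extra join, needed to avoid having to select the fields twice (this does not noticeably affect performance):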
with n(n) as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)
,d(d) as (select top (select datediff(d,@mindate,@maxdate) + 1) dateadd(d,row_number() over (order by (select null))-1,(select @mindate)) from n n1, n n2, n n3, n n4, n n5, n n6)
select distinct b.*
from(select -- d.d as RangeStart, dateadd(d,@rangeDays,d.d) as RangeEnd, -- commented out because not used
t.id,
count(t.id) over(partition by d.d, t.Type) as TransactionCount
from d
inner join transactions t
on(t.Date > d.d and t.Date <= dateadd(d,@rangeDays,d.d))
) a left join transactions b on a.id = b.id
where TransactionCount > @x --and b.Type is not null
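Some comparison between the 3 scripts
I ran some tests on these queries using the same parameters. The table I'm working with contains more than 200,000 rows; the date filter leaves roughly 200,000 of them, and I ran each query several times to make sure the timings were not caused by an accidental lag.
Parameters: Days=1; x=1
(more than 1 transaction of the same type per day)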
Script avg execution time (seconds)
1st 28s --with this performance, further comparison is not needed.
2nd 7s
3rd 4s
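A few things to note:
Rows where Type is null do not appear in the results of the first two scripts (because of the join type), which makes the third script more flexible.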
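The note about null values can be illustrated with a small sketch, assuming the default ANSI_NULLS ON setting. The table variable @t and its columns are made up for the demonstration. An equality join on a nullable column never matches null to null, whereas a window partition puts all null values into the same group.

```sql
-- Illustration only: two rows with a null type and one row with type 1.
declare @t table(ID int, TypeID int);
insert into @t values (1, null), (2, null), (3, 1);

-- Equality join on the nullable column: NULL = NULL is not true, so IDs 1 and 2 drop out.
select a.ID
from @t a
inner join @t b
    on a.TypeID = b.TypeID;

-- Window partition on the same column: the two null rows are counted together.
select ID
      ,count(*) over (partition by TypeID) as RowsInPartition
from @t;
```

The first query returns only ID 3. The second returns all three rows, with a count of 2 for IDs 1 and 2. This matches the behaviour described above: joining back on the type loses the null rows, while joining back on the id after a windowed count keeps them.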