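TSQL - Run date comparison for "duplicates"/false positives from an initial query?

Time: 2016-02-08 21:01:16

Tags: sql sql-server tsql datetime subquery

I'm very new to SQL and am working on pulling data from several very large tables for analysis. The data is basically triggered events for assets on a system. The events all have a created_date (datetime) field that I care about.

I was able to put together the query below to get the data I need (YAY):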

SELECT 
         event.efkey
        ,event.e_id
        ,event.e_key
        ,l.l_name
        ,event.created_date
        ,asset.a_id
        ,asset.asset_name
  FROM event
  LEFT JOIN asset
         ON event.a_key = asset.a_key
  LEFT JOIN l
         ON event.l_key = l.l_key


  WHERE event.e_key IN (350, 352, 378)

  ORDER BY asset.a_id, event.created_date

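However, while this gives me the data for the specific events I want, I still have another problem. Assets can trigger these events repeatedly, which can result in a large number of "false positives" for what I'm looking at.

What I need to do is go through the result set of the query above and remove any events for an asset that occur within N minutes of each other (30 minutes in this example). So if the asset ID (a_id) is the same AND event.created_date is within 30 minutes of another event for that asset in the set, I want that event removed. For example:

Given the following records: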

a_id 1124 created 2016-02-01 12:30:30
a_id 1124 created 2016-02-01 12:35:31
a_id 1124 created 2016-02-01 12:40:33
a_id 1124 created 2016-02-01 12:45:42
a_id 1124 created 2016-02-02 12:30:30
a_id 1124 created 2016-02-02 13:00:30
a_id 1115 created 2016-02-01 12:30:30

I would only want these returned:

a_id 1124 created 2016-02-01 12:30:30 
a_id 1124 created 2016-02-02 12:30:30 
a_id 1124 created 2016-02-02 13:00:30 
a_id 1115 created 2016-02-01 12:30:30

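I tried referencing this and this, but I couldn't get those concepts to work for me. I know I probably need to do a SELECT * FROM (my existing query), but I can't seem to do that without ending up with a ton of "multi-part identifier could not be bound" errors (I have no experience creating temp tables, and my attempts so far have failed). I'm also not sure how to use DATEDIFF as the date filtering function.

Any help would be greatly appreciated! If you could dumb it down for a novice (or link to explanations), that would help too!

2 Answers:

Answer 0: (score: 2)

This is trickier than it first appears. The hard part is capturing the previous good row and removing the bad rows that follow it, without letting those bad rows influence whether the next row is good. Here is what I came up with. I have tried to explain what is going on with comments in the code.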

--sample data since I don't have your table structure and your original query won't work for me
declare @events table
(
  id int,
  timestamp datetime
)

--note that I changed some of your sample data to test some different scenarios
insert into @events values( 1124, '2016-02-01 12:30:30')
insert into @events values( 1124, '2016-02-01 12:35:31')
insert into @events values( 1124, '2016-02-01 12:40:33')
insert into @events values( 1124, '2016-02-01 13:05:42')
insert into @events values( 1124, '2016-02-02 12:30:30')
insert into @events values( 1124, '2016-02-02 13:00:30')
insert into @events values( 1115, '2016-02-01 12:30:30')

--using a cte here to split the result set of your query into groups
--by id (you would want to partition by whatever criteria you use
--to determine that rows are talking about the same event)
--the row_number function gets the row number for each row within that 
--id partition
--the over clause specifies how to break up the result set into groups 
--(partitions) and what order to put the rows in within that group so 
--that the numbering stays consistent
;with orderedEvents as
(
    select id, timestamp, row_number() over (partition by id order by timestamp) as rn
    from @events
    --you would replace @events here with your query
)
--using a second recursive cte here to determine which rows are "good"
--and which ones are not.  
, previousGoodTimestamps as 
(
    --this is the "seeding" part of the recursive cte where I pick the
    --first rows of each group as being a desired result.  Since they 
    --are the first in each group, I know they are good.  I also assign
    --their timestamp as the previous good timestamp since I know that 
    --this row is good.
    select id, timestamp, rn, timestamp as prev_good_timestamp, 1 as is_good
    from orderedEvents
    where rn = 1

    union all

    --this is the recursive part of the cte.  It takes the rows we have
    --already added to this result set and joins those to the "next" rows
    --(as defined by our ordering in the first cte).  Then we output
    --those rows and do some calculations to determine if this row is 
    --"good" or not.  If it is "good" we set its timestamp as the
    --previous good row timestamp so that rows that come after this one 
    --can use it to determine if they are good or not.  If a row is "bad"
    --we just forward along the last known good timestamp to the next row.
    --
    --We also determine if a row is good by checking if the last good row
    --timestamp plus 30 minutes is less than or equal to the current row's
    --timestamp.  If it is then the row is good.
    select e2.id
        , e2.timestamp
        , e2.rn
        , last_good_timestamp.timestamp
        , case
            when dateadd(mi, 30, last_good_timestamp.timestamp) <= e2.timestamp then 1
            else 0
          end
    from previousGoodTimestamps e1
    inner join orderedEvents e2 on e2.id = e1.id and e2.rn = e1.rn + 1
    --I used a cross apply here to calculate the last good row timestamp
    --once.  I could have used two identical subqueries above in the select
    --and case statements, but I would rather not duplicate the code.
    cross apply
    (
        select case 
                 when e1.is_good = 1 then e1.timestamp --if the last row is good, just use its timestamp
                 else e1.prev_good_timestamp --the last row was bad, forward on what it had for the last good timestamp
               end as timestamp
    ) last_good_timestamp
)
select *
from previousGoodTimestamps
where is_good = 1 --only take the "good" rows
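
To make the "replace @events with your query" comment concrete, here is a minimal sketch of how the question's query might be plugged into the first CTE. The table variables @event and @asset are hypothetical stand-ins that hold only the columns used here, the join to l is left out for brevity, and an inner join is assumed (every event has a matching asset) so that a_id is never NULL when partitioning. The cross apply is folded into a case expression that carries the last good timestamp forward, so this is a simplified variant of the query above rather than the same text.

```sql
-- Hypothetical stand-ins for the question's tables (assumption: only these columns matter here).
declare @event table (e_id int, e_key int, a_key int, created_date datetime);
declare @asset table (a_key int, a_id int, asset_name varchar(50));

insert into @asset values (1, 1124, 'asset A'), (2, 1115, 'asset B');
insert into @event values
  (1, 350, 1, '2016-02-01 12:30:30'),
  (2, 352, 1, '2016-02-01 12:35:31'),
  (3, 350, 1, '2016-02-01 12:40:33'),
  (4, 378, 1, '2016-02-01 12:45:42'),
  (5, 350, 1, '2016-02-02 12:30:30'),
  (6, 350, 1, '2016-02-02 13:00:30'),
  (7, 350, 2, '2016-02-01 12:30:30'),
  (8, 999, 2, '2016-02-01 12:31:00');  -- removed by the e_key filter

;with orderedEvents as
(
    -- the question's query, numbered per asset in date order
    select a.a_id, e.e_id, e.created_date
        , row_number() over (partition by a.a_id order by e.created_date) as rn
    from @event e
    inner join @asset a on e.a_key = a.a_key
    where e.e_key in (350, 352, 378)
)
, previousGoodTimestamps as
(
    -- seed: the first event of each asset is always kept
    select a_id, e_id, created_date, rn, created_date as last_good, 1 as is_good
    from orderedEvents
    where rn = 1

    union all

    -- next event per asset: kept only if it is at least 30 minutes after
    -- the last kept event; last_good moves forward only when a row is kept
    select e2.a_id, e2.e_id, e2.created_date, e2.rn
        , case when dateadd(mi, 30, e1.last_good) <= e2.created_date
               then e2.created_date else e1.last_good end
        , case when dateadd(mi, 30, e1.last_good) <= e2.created_date
               then 1 else 0 end
    from previousGoodTimestamps e1
    inner join orderedEvents e2 on e2.a_id = e1.a_id and e2.rn = e1.rn + 1
)
select a_id, e_id, created_date
from previousGoodTimestamps
where is_good = 1
order by a_id, created_date
option (maxrecursion 0);  -- lift the default limit of 100 recursion levels
```

With these sample rows it should return e_id 7 for asset 1115 and e_id 1, 5 and 6 for asset 1124. The maxrecursion hint matters on real data: the recursion goes one level deeper per event of an asset, so an asset with more than roughly a hundred events would otherwise hit SQL Server's default limit.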

Links to MSDN for some of the more complex pieces:

Answer 1: (score: 0)

-- Sample data.
declare @Samples as Table ( Id Int Identity, A_Id Int, CreatedDate DateTime );
insert into @Samples ( A_Id, CreatedDate ) values
  ( 1124, '2016-02-01 12:30:30' ),
  ( 1124, '2016-02-01 12:35:31' ),
  ( 1124, '2016-02-01 12:40:33' ),
  ( 1124, '2016-02-01 12:45:42' ),
  ( 1124, '2016-02-02 12:30:30' ),
  ( 1124, '2016-02-02 13:00:30' ),
  ( 1125, '2016-02-01 12:30:30' );
select * from @Samples;

-- Calculate the windows of 30 minutes before and after each CreatedDate and check for conflicts with other rows.
with Ranges as (
  select Id, A_Id, CreatedDate,
    DateAdd( minute, -30, S.CreatedDate ) as RangeStart, DateAdd( minute, 30, S.CreatedDate ) as RangeEnd
    from @Samples as S )
  select Id, A_Id, CreatedDate, RangeStart, RangeEnd,
    -- Check for a conflict with another row with:
    --   the same A_Id value and an earlier CreatedDate that falls inside the +/-30 minute range.
    case when exists ( select 42 from @Samples where A_Id = R.A_Id and CreatedDate < R.CreatedDate and R.RangeStart < CreatedDate and CreatedDate < R.RangeEnd ) then 1
      else 0 end as Conflict
    from Ranges as R;
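
The query above only flags conflicting rows. Below is a minimal sketch of turning that flag into a filter, assuming a row should be dropped whenever any earlier row for the same A_Id falls less than 30 minutes before it. Because the check only looks at earlier rows, the RangeEnd test is always true and is left out here. The sample data borrows the 13:05:42 row from Answer 0 to show where the two approaches differ.

```sql
-- Sample data (same shape as above; the fourth row is taken from Answer 0's test data).
declare @Samples as Table ( Id Int Identity, A_Id Int, CreatedDate DateTime );
insert into @Samples ( A_Id, CreatedDate ) values
  ( 1124, '2016-02-01 12:30:30' ),
  ( 1124, '2016-02-01 12:35:31' ),
  ( 1124, '2016-02-01 12:40:33' ),
  ( 1124, '2016-02-01 13:05:42' ),
  ( 1124, '2016-02-02 12:30:30' ),
  ( 1124, '2016-02-02 13:00:30' ),
  ( 1125, '2016-02-01 12:30:30' );

-- Keep only rows with no earlier row for the same A_Id inside the previous 30 minutes.
select S.Id, S.A_Id, S.CreatedDate
  from @Samples as S
  where not exists (
    select 42
      from @Samples as P
      where P.A_Id = S.A_Id
        and P.CreatedDate < S.CreatedDate
        and P.CreatedDate > DateAdd( minute, -30, S.CreatedDate ) )
  order by S.A_Id, S.CreatedDate;
```

This returns the 2016-02-01 12:30:30 row and both 2016-02-02 rows for 1124, plus the 1125 row. The 13:05:42 row is dropped because 12:40:33 is less than 30 minutes before it, whereas the recursive approach in Answer 0 keeps it because it is more than 30 minutes after the last kept row (12:30:30). Which behaviour is wanted depends on how chains of closely spaced events should be treated.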