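Analyzing time-related event logs with SQL

Time: 2009-08-17 21:40:15

Tags: sql sql-server analysis

I have an event log in a SQL Server database. Essentially, it records when a call was placed and when that call ended at a call center (as two separate records), along with a few other details. I am trying to work out how many phone lines were in use at any given time from this data. I can't think of a good way to have a SQL query determine this for me, although that would be ideal (provided it doesn't sacrifice too much speed).

My first thought was to have a program query the start and end events for each call and determine the call's duration. I could then step through each unit of time, keeping a tally of how many calls were in progress at any given time. Is there a way to do this in SQL instead of using a linear method in C# or something similar?

Edit: The calls have unique IDs. Session IDs, if you will. Also, the start and end events are two different records, not a single record. That complicates things a bit, I think. There are also over 15 million records in this table.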

Id  EvId             CallId                           DateTime       
--  ---- ------------------------------------    --------------------
 1  0   df1cbc93-5cf3-402a-940b-4441f6a7ec5c     7/9/2008 8:12:56 PM
 2  1   df1cbc93-5cf3-402a-940b-4441f6a7ec5c     7/9/2008 8:13:07 PM
 3  0   ec1c2078-1765-4377-9126-6f26fe33e4a9    7/10/2008 4:33:10 PM
 4  10  ec1c2078-1765-4377-9126-6f26fe33e4a9    7/10/2008 4:33:13 PM
 5  1   ec1c2078-1765-4377-9126-6f26fe33e4a9    7/10/2008 4:33:13 PM
 6  0   a3c3b9a0-a23b-4dda-b4e4-e82f0209c94d    7/10/2008 4:33:13 PM
 7  10  a3c3b9a0-a23b-4dda-b4e4-e82f0209c94d    7/10/2008 4:33:15 PM
 8  1   a3c3b9a0-a23b-4dda-b4e4-e82f0209c94d    7/10/2008 4:33:15 PM
 9  0   d23f393d-0272-445a-8670-3f71b016174e    7/10/2008 4:33:15 PM
10  10  d23f393d-0272-445a-8670-3f71b016174e    7/10/2008 4:33:17 PM
11  1   d23f393d-0272-445a-8670-3f71b016174e    7/10/2008 4:33:17 PM


EvId   Description
----   ----------------
   0   New Call
   1   End of Call
   2   Caller Hangup
  10   CPA Completed
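
As a rough illustration of the "linear" approach described in the question, here is a minimal Python sketch (not the asker's code). It pairs each call's start (EvId 0) and end (EvId 1) rows and then counts the calls in progress at one instant. The rows are the question's sample data with the CallIds shortened; the helper names `pair_calls` and `lines_in_use` are made up for this example, and a call is assumed to occupy a line from its start up to, but not including, its end.

```python
from datetime import datetime

FMT = "%m/%d/%Y %I:%M:%S %p"

# (EvId, CallId, DateTime) rows from the question's sample; CallIds shortened.
EVENTS = [
    (0, "df1cbc93", "7/9/2008 8:12:56 PM"),
    (1, "df1cbc93", "7/9/2008 8:13:07 PM"),
    (0, "ec1c2078", "7/10/2008 4:33:10 PM"),
    (10, "ec1c2078", "7/10/2008 4:33:13 PM"),
    (1, "ec1c2078", "7/10/2008 4:33:13 PM"),
    (0, "a3c3b9a0", "7/10/2008 4:33:13 PM"),
    (10, "a3c3b9a0", "7/10/2008 4:33:15 PM"),
    (1, "a3c3b9a0", "7/10/2008 4:33:15 PM"),
    (0, "d23f393d", "7/10/2008 4:33:15 PM"),
    (10, "d23f393d", "7/10/2008 4:33:17 PM"),
    (1, "d23f393d", "7/10/2008 4:33:17 PM"),
]


def pair_calls(events):
    """Collapse start (EvId 0) and end (EvId 1) rows into {CallId: [start, end]}."""
    calls = {}
    for ev_id, call_id, stamp in events:
        when = datetime.strptime(stamp, FMT)
        if ev_id == 0:
            calls.setdefault(call_id, [None, None])[0] = when
        elif ev_id == 1:
            calls.setdefault(call_id, [None, None])[1] = when
        # Other event types (2, 10) do not open or close a line here.
    return calls


def lines_in_use(calls, moment):
    """Count calls with start <= moment < end; a call with no end row is still open."""
    return sum(
        1
        for start, end in calls.values()
        if start is not None and start <= moment and (end is None or moment < end)
    )


calls = pair_calls(EVENTS)
print(lines_in_use(calls, datetime(2008, 7, 10, 16, 33, 14)))  # prints 1
```

Calling `lines_in_use` once per time unit reproduces the step-through idea, at the cost of rescanning every call for each instant.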

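4 answers:

Answer 0: (score: 1)

Here is a query that produces a log of all events in a given time period, together with the number of calls in progress at the moment each event occurred. It uses several CTEs to build the required data in logical steps: select the calls that started before the start time, subtract the calls that ended before the start time, then add the call events between the start and end times. That result set is then used to produce the timeline of events, along with the current call count at each event. CTEs are used only because I find them easier to read and understand than derived tables.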

declare @temp table (
    EvId int not null
    , CallId uniqueidentifier not null
    , DateTime Datetime not null);

 declare @starttime datetime
    , @endtime datetime;

 select @starttime = '7/10/2008 1:33:14 PM';
 select @endtime = '7/10/2008 1:43:14 PM';

 -- These are all the calls
 -- that started before the start time
 with started_call as (
 select * from call_log 
    where DateTime < @starttime 
    and EvId = 0)
-- These are all the calls 
-- that ended    before the start time
 , ended_call as (
 select * from call_log 
    where DateTime < @starttime 
    and EvId = 1)
-- These are all the call ids 
-- that were ongoing at the start time  
 , existing_calls as (
 select CallId from started_call
 except
 select CallId from ended_call)
-- These are all the call events logged
-- for calls that were ongoing at the start time
 , existing_details as (
 select l.* 
    from call_log l
    join existing_calls e on e.CallId = l.CallId
    where l.DateTime < @starttime)
-- these are events that occurred
-- between start time and endtime
, new_events as (
    select * from call_log
    where DateTime between @starttime and @endtime)
-- and these are all the events that are of interest
, all_events as (
    select * from existing_details
    union all
    select * from new_events)
-- put all the interesting events into a @temp table
-- unfortunately QO cannot spool this for us
-- so we better do it instead
insert into @temp (EvId, CallId, DateTime)
    select EvId, CallId, DateTime  from all_events;

-- Extract events, along with the count
-- at the time of the event
select e.*,(
        select sum(case
            when EvId = 0 then 1 -- Start call
            when EvId = 1 then -1 -- end call
            else 0 end) -- Other events 
        from @temp se
        where se.DateTime < e.DateTime) as cnt
from @temp e
where DateTime between @starttime and @endtime
order by DateTime;

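With appropriate indexes in place, this query produces a plan that does not scan the entire log table. It gives correct results for any interval and accounts for the calls already in progress at the interval's start time. In my tests on 1 million log records, it consistently produced the events for a 10-minute interval in 1.1s (628ms to populate the @temp table, 505ms to produce the timeline with the current count) on a single-processor laptop with 1.5GB of RAM. Performance on large tables can be improved if a limit on the maximum duration of any call is introduced, because the search for calls in progress at the start time can then be bounded at the lower end (DateTime >= start time - maximum call duration).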
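
For readers who want to see the counting logic outside SQL, the following Python sketch mirrors the same steps under stated assumptions: the baseline is the set of calls started before the window minus those ended before it (the `except` step), and each event in the window reports the number of open calls produced by strictly earlier events, as the correlated subquery does. The function name `timeline` and the sample rows are hypothetical and only illustrate the idea; this is not a replacement for the indexed query.

```python
from datetime import datetime
from itertools import groupby

# Effect of each event type on the open-call count; other types count as 0.
DELTA = {0: +1, 1: -1}


def timeline(events, start, end):
    """events: list of (ev_id, call_id, when). Returns (when, ev_id, call_id, cnt) rows."""
    # Calls in progress at `start`: started before it, minus ended before it.
    started = {cid for ev, cid, when in events if ev == 0 and when < start}
    ended = {cid for ev, cid, when in events if ev == 1 and when < start}
    running = len(started - ended)

    in_window = sorted(
        (e for e in events if start <= e[2] <= end), key=lambda e: e[2]
    )
    rows = []
    # Events sharing a timestamp all see the count from strictly earlier events.
    for when, group in groupby(in_window, key=lambda e: e[2]):
        group = list(group)
        for ev, cid, _ in group:
            rows.append((when, ev, cid, running))
        running += sum(DELTA.get(ev, 0) for ev, _, _ in group)
    return rows


def at(hour, minute):
    return datetime(2008, 7, 10, hour, minute)


# Hypothetical sample: call "a" ends before the window, "b" spans its start.
sample = [
    (0, "a", at(9, 0)), (0, "b", at(9, 5)), (1, "a", at(9, 10)),
    (0, "c", at(10, 1)), (10, "b", at(10, 2)), (1, "b", at(10, 2)),
    (1, "c", at(10, 5)),
]
rows = timeline(sample, at(10, 0), at(10, 10))
for row in rows:
    print(row)
```

The set difference stands in for `except`, and the single pass over sorted events plays the role of the correlated `sum`.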

Using the intermediate @temp table variable is not elegant, but it is efficient.

Here is a sample output:

EvId    CallId                                  DateTime                cnt
1   401D9E00-040C-4B0E-8864-C66B72CF47AA    2008-07-10 13:33:16.000 23
10  401D9E00-040C-4B0E-8864-C66B72CF47AA    2008-07-10 13:33:16.000 23
1   8BF7AF50-B32C-464A-AF01-FDB653F0517D    2008-07-10 13:33:18.000 22
10  8BF7AF50-B32C-464A-AF01-FDB653F0517D    2008-07-10 13:33:18.000 22
0   CB523E24-5CE2-4E36-9D6C-4AE7BCEB1F53    2008-07-10 13:33:19.000 21
1   4A54EEB6-A899-4167-9D5C-2CE1BC838FFB    2008-07-10 13:33:20.000 22

Here is how I created and loaded the test data. Note the clustered index and the nonclustered index on the table; both are critical.

create table call_log (id int identity(1,1) not null
    , EvId int not null
    , CallId uniqueidentifier not null
    , DateTime Datetime not null);
create clustered index cdx_call_log on call_log(EvId, DateTime);
create nonclustered index idx_call_log_call_id on call_log(CallId);
go

 set nocount on;
 declare @i int, @date datetime, @callId uniqueidentifier;
 select @i = 0, @date = '7/10/2008 12:33:14 PM';
 begin transaction
 while @i < 1000000
 begin
    declare @duration int,
        @delay int;
    select @duration = rand()*180,
        @delay = rand() * 10;
    select @date = dateadd(second, @delay, @date)
        , @callId = newid();

    insert into call_log (EvId, CallId, DateTime)
    values  (0, @callId, @date)
        , (10, @callId, dateadd(second, @duration, @date))
        , (1, @callId, dateadd(second, @duration, @date));
    select @i = @i + 1;
    if (0 = @i%100)
    begin
        commit;
        begin tran;
    end
 end
 commit
 go

Answer 1: (score: 1)

Before using my query example, you need to set up a "helper" table. This only needs to be done once per database:

CREATE TABLE Numbers
(Number int  NOT NULL,
    CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number ASC)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
DECLARE @x int
SET @x=0
WHILE @x<8000
BEGIN
    SET @x=@x+1
    INSERT INTO Numbers VALUES (@x)
END

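This basically creates a table with a single column containing the values 1 through 8000. You could do the same thing with a CTE, but since you didn't say which SQL Server version you are on, this will work on all of them, and it is better if you will run the query many times.

Try this: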

DECLARE @Calls  table (rowID int not null primary key identity(1,1)
                      ,EvId int not null
                      ,CallId varchar(36)
                      ,rowDateTime datetime
                      )
SET NOCOUNT ON
INSERT INTO @Calls VALUES ( 0,'df1cbc93-5cf3-402a-940b-4441f6a7ec5c',' 7/9/2008 8:12:56 PM')
INSERT INTO @Calls VALUES ( 1,'df1cbc93-5cf3-402a-940b-4441f6a7ec5c',' 7/9/2008 8:13:07 PM')
INSERT INTO @Calls VALUES ( 0,'ec1c2078-1765-4377-9126-6f26fe33e4a9','7/10/2008 4:33:10 PM')
INSERT INTO @Calls VALUES (10,'ec1c2078-1765-4377-9126-6f26fe33e4a9','7/10/2008 4:33:13 PM')
INSERT INTO @Calls VALUES ( 1,'ec1c2078-1765-4377-9126-6f26fe33e4a9','7/10/2008 4:33:13 PM')
INSERT INTO @Calls VALUES ( 0,'a3c3b9a0-a23b-4dda-b4e4-e82f0209c94d','7/10/2008 4:33:13 PM')
INSERT INTO @Calls VALUES (10,'a3c3b9a0-a23b-4dda-b4e4-e82f0209c94d','7/10/2008 4:33:15 PM')
INSERT INTO @Calls VALUES ( 1,'a3c3b9a0-a23b-4dda-b4e4-e82f0209c94d','7/10/2008 4:33:15 PM')
INSERT INTO @Calls VALUES ( 0,'d23f393d-0272-445a-8670-3f71b016174e','7/10/2008 4:33:15 PM')
INSERT INTO @Calls VALUES (10,'d23f393d-0272-445a-8670-3f71b016174e','7/10/2008 4:33:17 PM')
INSERT INTO @Calls VALUES ( 1,'d23f393d-0272-445a-8670-3f71b016174e','7/10/2008 4:33:17 PM')
--I added more test data, to hit more cases
INSERT INTO @Calls VALUES ( 0,'111111111111111111111111111111111111','7/10/2008 4:10:00 PM')
INSERT INTO @Calls VALUES (10,'111111111111111111111111111111111111','7/10/2008 4:11:00 PM')
INSERT INTO @Calls VALUES ( 1,'111111111111111111111111111111111111','7/10/2008 4:11:00 PM')
INSERT INTO @Calls VALUES ( 0,'222222222222222222222222222222222222','7/10/2008 4:15:00 PM')
INSERT INTO @Calls VALUES (10,'222222222222222222222222222222222222','7/10/2008 4:16:00 PM')
INSERT INTO @Calls VALUES ( 1,'222222222222222222222222222222222222','7/10/2008 4:16:00 PM')
INSERT INTO @Calls VALUES ( 0,'333333333333333333333333333333333333','7/10/2008 4:09:00 PM')
INSERT INTO @Calls VALUES (10,'333333333333333333333333333333333333','7/10/2008 4:18:00 PM')
INSERT INTO @Calls VALUES ( 1,'333333333333333333333333333333333333','7/10/2008 4:18:00 PM')
INSERT INTO @Calls VALUES ( 0,'444444444444444444444444444444444444','7/10/2008 4:13:00 PM')
INSERT INTO @Calls VALUES (10,'444444444444444444444444444444444444','7/10/2008 4:14:00 PM')
INSERT INTO @Calls VALUES ( 1,'444444444444444444444444444444444444','7/10/2008 4:14:00 PM')
INSERT INTO @Calls VALUES ( 0,'555555555555555555555555555555555555','7/10/2008 4:13:00 PM')
SET NOCOUNT OFF

DECLARE @StartRange  datetime
DECLARE @EndRange    datetime

SET @StartRange='7/10/2008 4:12:00 PM'
SET @EndRange  ='7/10/2008 4:15:00 PM'

SET @EndRange=DATEADD(mi,1,@EndRange)

--this lists the match time and each calls details in progress at that time
SELECT
    DATEADD(mi,n.Number-1,c.StartTime) AS 'TimeOfMatch'
        ,c.CallID
        ,c.StartTime,c.EndTime
    FROM (SELECT --this derived table joins together the start and end dates into a single row, filtering out rows more than 90 minutes before the start range (if calls are longer than 90 minutes, increase this) and filters out any rows after the end date (will consider call done at end date then)
              CallID, MIN(rowDateTime) AS StartTime, CASE  WHEN MAX(rowDateTime)=MIN(rowDateTime) THEN @EndRange ELSE MAX(rowDateTime) END  AS EndTime
              FROM @Calls 
              WHERE rowDateTime>=DATEADD(mi,-90,@StartRange) --AND rowDateTime<=@EndRange
              GROUP BY CallID
         ) c
        INNER JOIN Numbers   n ON DATEDIFF(mi,c.StartTime,c.EndTime)+1>=n.Number
    WHERE DATEADD(mi,n.Number-1,c.StartTime)>=@StartRange AND DATEADD(mi,n.Number-1,c.StartTime)<@EndRange
    ORDER BY 1

--this lists just the match time and the call count
SELECT
    DATEADD(mi,n.Number-1,c.StartTime) AS 'TimeOfMatch'
        ,COUNT(*)
    FROM (SELECT --this derived table joins together the start and end dates into a single row, filtering out rows more than 90 minutes before the start range (if calls are longer than 90 minutes, increase this) and filters out any rows after the end date (will consider call done at end date then)
              CallID, MIN(rowDateTime) AS StartTime, CASE  WHEN MAX(rowDateTime)=MIN(rowDateTime) THEN @EndRange ELSE MAX(rowDateTime) END  AS EndTime
              FROM @Calls 
              WHERE rowDateTime>=DATEADD(mi,-90,@StartRange) --AND rowDateTime<=@EndRange
              GROUP BY CallID
         ) c
        INNER JOIN Numbers   n ON DATEDIFF(mi,c.StartTime,c.EndTime)+1>=n.Number
    WHERE DATEADD(mi,n.Number-1,c.StartTime)>=@StartRange AND DATEADD(mi,n.Number-1,c.StartTime)<@EndRange
    GROUP BY DATEADD(mi,n.Number-1,c.StartTime)
    ORDER BY 1

Here is the output:

TimeOfMatch             CallID                               StartTime               EndTime
----------------------- ------------------------------------ ----------------------- -----------------------
2008-07-10 16:12:00.000 333333333333333333333333333333333333 2008-07-10 16:09:00.000 2008-07-10 16:18:00.000
2008-07-10 16:13:00.000 333333333333333333333333333333333333 2008-07-10 16:09:00.000 2008-07-10 16:18:00.000
2008-07-10 16:13:00.000 444444444444444444444444444444444444 2008-07-10 16:13:00.000 2008-07-10 16:14:00.000
2008-07-10 16:13:00.000 555555555555555555555555555555555555 2008-07-10 16:13:00.000 2008-07-10 16:16:00.000
2008-07-10 16:14:00.000 555555555555555555555555555555555555 2008-07-10 16:13:00.000 2008-07-10 16:16:00.000
2008-07-10 16:14:00.000 444444444444444444444444444444444444 2008-07-10 16:13:00.000 2008-07-10 16:14:00.000
2008-07-10 16:14:00.000 333333333333333333333333333333333333 2008-07-10 16:09:00.000 2008-07-10 16:18:00.000
2008-07-10 16:15:00.000 333333333333333333333333333333333333 2008-07-10 16:09:00.000 2008-07-10 16:18:00.000
2008-07-10 16:15:00.000 555555555555555555555555555555555555 2008-07-10 16:13:00.000 2008-07-10 16:16:00.000
2008-07-10 16:15:00.000 222222222222222222222222222222222222 2008-07-10 16:15:00.000 2008-07-10 16:16:00.000

(10 row(s) affected)

TimeOfMatch             
----------------------- -----------
2008-07-10 16:12:00.000 1
2008-07-10 16:13:00.000 3
2008-07-10 16:14:00.000 3
2008-07-10 16:15:00.000 3

(4 row(s) affected)
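
The per-minute expansion can also be traced by hand in a few lines of Python. This is only a sketch under assumptions: it uses the extra test calls from above with their IDs shortened to one digit, `minute_boundaries` imitates how `DATEDIFF(mi, ...)` counts minute boundaries, and `calls_per_minute` is a hypothetical name. A call with only one row is treated as still open until the (extended) end of the range, as in the derived table.

```python
from collections import Counter
from datetime import datetime, timedelta

MINUTE = timedelta(minutes=1)


def minute_boundaries(a, b):
    """Imitates T-SQL DATEDIFF(mi, a, b): minute boundaries crossed from a to b."""
    def floor(t):
        return t.replace(second=0, microsecond=0)
    return int((floor(b) - floor(a)) / MINUTE)


def calls_per_minute(rows, range_start, range_end, lookback=timedelta(minutes=90)):
    """rows: iterable of (call_id, when). Returns {minute: calls in progress}."""
    range_end = range_end + MINUTE  # include the last minute of the range
    spans = {}
    for call_id, when in rows:
        if when < range_start - lookback:
            continue  # same cutoff as the 90-minute filter in the query
        lo, hi = spans.get(call_id, (when, when))
        spans[call_id] = (min(lo, when), max(hi, when))

    counts = Counter()
    for start, end in spans.values():
        if start == end:  # only one row seen: treat the call as still open
            end = range_end
        # One slot per minute of the call, like the join against Numbers.
        for n in range(minute_boundaries(start, end) + 1):
            slot = start + n * MINUTE
            if range_start <= slot < range_end:
                counts[slot] += 1
    return dict(sorted(counts.items()))


def at(hour, minute):
    return datetime(2008, 7, 10, hour, minute)


# (CallId, rowDateTime) for the five extra test calls, IDs shortened.
rows = [
    ("1", at(16, 10)), ("1", at(16, 11)), ("1", at(16, 11)),
    ("2", at(16, 15)), ("2", at(16, 16)), ("2", at(16, 16)),
    ("3", at(16, 9)), ("3", at(16, 18)), ("3", at(16, 18)),
    ("4", at(16, 13)), ("4", at(16, 14)), ("4", at(16, 14)),
    ("5", at(16, 13)),
]
result = calls_per_minute(rows, at(16, 12), at(16, 15))
for minute, count in result.items():
    print(minute, count)
```

On this data the counts for 16:12 through 16:15 come out as 1, 3, 3, 3, matching the second result set above.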

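You need a composite index on rowDateTime + CallId. For the best performance, however, create a new table (with a clustered index on startdate + CallId) that holds the start and end dates of each call in a single row (possibly maintained by a trigger that inserts the start date when EvId = 0 and updates the end date when EvId = 1). The derived table can then be replaced with this new table.

Answer 2: (score: 0)

Try this: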

DECLARE @tblCalls TABLE(ActionEffect int, ActionTime datetime)

INSERT INTO @tblCalls(ActionEffect, ActionTime)
    SELECT 1, [DateTime]
    FROM tblCallRecords
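    WHERE EvID = 0

INSERT INTO @tblCalls(ActionEffect, ActionTime)
    SELECT -1, [DateTime]
    FROM tblCallRecords
    WHERE EvID = 1

(This treats EvID = 1, "End of Call", as the only event that ends a call. Counting every EvID other than 0 would subtract twice for calls that also log 10, "CPA Completed".)

Then, to get the number of calls at any given moment, do this: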

SELECT Sum(ActionEffect)
FROM @tblCalls
WHERE ActionTime < @GivenMoment

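That is not great with 15 million records, though.

Now, if you want a running total of this, you will probably need to do something like this: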

SELECT a.ActionTime, Sum(b.ActionEffect) AS OpenCalls
FROM @tblCalls AS a
LEFT JOIN @tblCalls AS b ON a.ActionTime > b.ActionTime
GROUP BY a.ActionTime

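This gets big quickly. I think I would run it once, store the results in a table, and modify my call-logging code to update that table on the fly as calls come in.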
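
To show why the self-join grows so fast and what a single pass looks like instead, here is a small Python sketch of the same +1/-1 idea using a prefix sum. It is an illustration, not part of the answer's SQL: `running_open_calls` is a hypothetical name, the times are the seconds past 4:33 PM from the question's sample, and the total reported for each time includes the events at that time (the self-join above counts only strictly earlier ones).

```python
from itertools import accumulate


def running_open_calls(actions):
    """actions: iterable of (action_time, effect), effect being +1 or -1.

    Returns {action_time: open calls after all events at that time}.
    """
    ordered = sorted(actions)  # sort once: O(n log n) instead of an n-by-n join
    totals = accumulate(effect for _, effect in ordered)
    # Later pairs overwrite earlier ones, so each time keeps its last total.
    return dict(zip((t for t, _ in ordered), totals))


# Seconds past 4:33 PM for the question's last three calls.
actions = [(10, +1), (13, -1), (13, +1), (15, -1), (15, +1), (17, -1)]
open_calls = running_open_calls(actions)
print(open_calls)  # prints {10: 1, 13: 1, 15: 1, 17: 0}
```

The sort dominates the cost, so the work grows roughly as n log n rather than with the square of the row count.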

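Answer 3: (score: 0)

This is not a solution, just some ideas thrown out there. I haven't tested any of this, so feel free to shoot it down if it's rubbish.

This assumes two things:

1) There is an index on DateTime and on the unique call ID (CallId)

2) A call will not last longer than a certain length of time (say 24 or 48 hours), and any that do can be ignored.

If not, then you might as well stop reading.

If so, you could start with a query something like this:
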
 Select CallId, 
     Min(DateTime) as StartOfCall, Max(DateTime) as EndOfCall
 from Call_log
 where
    (evid = 0 or evid = 1)
 and DateTime between @ExtendedStartPeriod and @ExtendedEndPeriod
 group by CallId

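where @ExtendedStartPeriod and @ExtendedEndPeriod are one day before and one day after the actual period (or two days, if the maximum call duration is 48 hours).

This will give you some records you don't want, so you need a further query to remove them: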

Select CallId from (...) table1
where StartOfCall <= @EndDate and EndOfCall >= @StartDate

This should (I think) exclude calls that started after the end of the period or ended before the start date.
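
The exclusion just described can be written as a single overlap test: a call is kept only if it started no later than the period end and ended no earlier than the period start. The Python sketch below checks that rule on a few made-up intervals; `overlaps` is a hypothetical helper name and the times are invented for illustration.

```python
from datetime import datetime


def overlaps(call_start, call_end, period_start, period_end):
    """True when the call and the period share at least one instant."""
    # Dropping "starts after the period" OR "ends before the period"
    # is the same as keeping "starts by the end AND ends from the start".
    return call_start <= period_end and call_end >= period_start


def at(hour, minute):
    return datetime(2008, 7, 10, hour, minute)


period = (at(16, 0), at(17, 0))

print(overlaps(at(15, 30), at(15, 45), *period))  # ended before the period
print(overlaps(at(15, 50), at(16, 10), *period))  # spans the period start
print(overlaps(at(17, 5), at(17, 20), *period))   # started after the period
```

Both conditions have to hold together; joining them with "or" would let every call through, since any call satisfies at least one of them.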

Next, we run another outer query:

Select DateTime, 
  CallChange = Case 
    When Evid = 0 then 1
    When Evid = 1 then -1
    else 0
  end
 from call_log 
 where 
  CallId in ( ... )  
  and (evid = 0 or evid = 1)
  and DateTime between @ExtendedStartPeriod and @ExtendedEndPeriod 

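This should give you a list of event times along with whether each one increases or decreases the number of calls. For your example, something like: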

         7/9/2008 8:12:56 PM  1
         7/9/2008 8:13:07 PM -1
        7/10/2008 4:33:10 PM  1
        7/10/2008 4:33:13 PM -1
        7/10/2008 4:33:13 PM  1
        7/10/2008 4:33:15 PM -1
        7/10/2008 4:33:15 PM  1
        7/10/2008 4:33:17 PM -1

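If the number of calls per second is very high, you may want to group this by minute to reduce the amount of data returned from SQL.

You could even do a further query: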

Select 
   Sum(CallChange),  -- net change in open calls within each group
   DatePart(yyyy, DateTime), 
   DatePart(mm, DateTime),
   DatePart(dd, DateTime),
   DatePart(hh, DateTime),
   DatePart(mi, DateTime),
   DatePart(ss, DateTime)
From
   ( ... ) changes

  Group By
     DatePart(yyyy, DateTime), 
     DatePart(mm, DateTime),
     DatePart(dd, DateTime),
     DatePart(hh, DateTime),
     DatePart(mi, DateTime),
     DatePart(ss, DateTime)
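
As a rough picture of the grouping idea, this Python sketch buckets the +1/-1 changes by minute and keeps the net change per bucket. It is a sketch under assumptions: `net_change_per_minute` is a hypothetical name, the input is the change list shown earlier for the question's sample, and the bucket size is one minute rather than one second.

```python
from collections import defaultdict
from datetime import datetime


def net_change_per_minute(changes):
    """changes: iterable of (when, delta). Returns {minute: summed delta}."""
    buckets = defaultdict(int)
    for when, delta in changes:
        # Truncate to the minute so all events in that minute share a key.
        buckets[when.replace(second=0, microsecond=0)] += delta
    return dict(sorted(buckets.items()))


changes = [
    (datetime(2008, 7, 9, 20, 12, 56), +1),
    (datetime(2008, 7, 9, 20, 13, 7), -1),
    (datetime(2008, 7, 10, 16, 33, 10), +1),
    (datetime(2008, 7, 10, 16, 33, 13), -1),
    (datetime(2008, 7, 10, 16, 33, 13), +1),
    (datetime(2008, 7, 10, 16, 33, 15), -1),
    (datetime(2008, 7, 10, 16, 33, 15), +1),
    (datetime(2008, 7, 10, 16, 33, 17), -1),
]
per_minute = net_change_per_minute(changes)
for minute, delta in per_minute.items():
    print(minute, delta)
```

A running sum over these per-minute deltas then gives the number of open calls at the end of each minute, which is the small amount of state the client code would have to keep.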

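That is about as far as I can take it with SQL. Maybe someone else can take it further; otherwise I think some C# is needed to keep a running count of the changes for each period.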