Stop COUNT(*) once a condition is met in SQL

Time: 2014-08-01 16:46:48

Tags: sql sql-server

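Goal:

I'm trying to write a query that gets the number of pages a user hit during a session. I want to compare Min(time) and Max(time)... and so on... If the difference between any two consecutive rows is > 20 minutes, I want to stop counting minutes there and use the difference up to that point as the total. For example: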

User 1 - Page 1 - 10:00 AM

User 1 - Page 2 - 10:10 AM

User 1 - Page 3 - 10:40 AM

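I would want the result for User 1 to be 10 minutes (since I am grouping on UserID). Say this user kept hitting a page every 10 minutes for 10 hours: I would want the same rule applied, so the session keeps running. But when there is a 20-minute gap, stop, total it up, and move on.

Clarification - a user CAN have a session longer than 20 minutes. If they click every 15 minutes for an hour, the session I want to capture is 60 minutes. If they then click one more thing within 15 minutes and go "AFK" for a few hours before coming back... I want to record a session length of 75 minutes.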
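The rule described above can be made concrete outside SQL. What follows is a minimal Python sketch, not the eventual query: the function name `session_minutes` and its `gap` parameter are assumptions made for illustration, and it assumes a gap of 20 minutes or more between consecutive hits ends a session.

```python
from datetime import datetime, timedelta


def session_minutes(hits, gap=timedelta(minutes=20)):
    """Split one user's hit times into sessions; return each session's length in minutes.

    A new session starts whenever the time since the previous hit is >= gap
    (the threshold is an assumption taken from the 20-minute rule above).
    """
    lengths = []
    start = prev = None
    for t in sorted(hits):
        if prev is None:
            start = t  # first hit opens the first session
        elif t - prev >= gap:
            # Gap reached: close the running session and open a new one.
            lengths.append(int((prev - start).total_seconds() // 60))
            start = t
        prev = t
    if prev is not None:
        lengths.append(int((prev - start).total_seconds() // 60))
    return lengths


day = datetime(2014, 7, 28)

# Hits at 10:00, 10:10, 10:40: the 30-minute gap ends the first session.
example = [day.replace(hour=10, minute=m) for m in (0, 10, 40)]
print(session_minutes(example))  # [10, 0]

# Hits every 15 minutes from 9:00 to 10:15, then one hit hours later.
afk = [day.replace(hour=9) + timedelta(minutes=15 * i) for i in range(6)]
afk.append(day.replace(hour=14))
print(session_minutes(afk))  # [75, 0]
```

A session consisting of a single hit comes out as 0 minutes, because only hit times are logged and nothing records how long the last page was viewed.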

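Problem:

However, my company's IIS logs give me nothing beyond the time a page was hit. For lack of a better option, we decided to use the difference between the minimum and maximum times a page was hit to determine "MinutesForSession". The problem is that some users, far more than we expected, log in in the AM and then again in the PM... producing a min/max difference of more than 10 hours. Since we know the average user session is nowhere near that long, this data skews the average we are trying to find.

What I've tried:

I created a temp table called #JulyStats that mirrors my production data. I am selecting the UserID, the difference between the min and max time (time being the moment the user hit a page), and the number of pages they hit. I exclude some scripts, fonts, and images from the logs and, because of the data size, wanted to isolate it to a single day.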

SELECT #JulyStats.UserID, DATEDIFF(MINUTE, MIN(#JulyStats.time), MAX(#JulyStats.time)) AS MinutesForSession,COUNT(*) AS CountPages
FROM #JulyStats
WHERE #JulyStats.date = '7/28/2014'
AND URL NOT LIKE '%js%'
AND URL NOT LIKE '%css%'
AND URL NOT LIKE '%jpg%'
AND URL NOT LIKE '%gif%'
AND URL NOT LIKE '%ico%'
AND URL NOT LIKE '%png%'
AND URL NOT LIKE '%/KeepAlive%'
AND URL NOT LIKE '%font%'
AND URL NOT LIKE '%axd%'
AND URL NOT LIKE '%htc%'
GROUP BY #JulyStats.UserID
ORDER BY MinutesForSession DESC

To try to solve this myself, I tried adding a condition to the WHERE clause:

AND DATEDIFF(MINUTE, MIN(#JulyStats.time), #JulyStats.time) < 20

This produced an error about referencing an aggregate, and my SQL knowledge leaves me unsure where to declare it correctly.

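What I need:

I'm not sure how to make it stop totaling minutes once the 20-minute condition is met. So either some help writing this query, or the keywords/phrases I should search for that describe what I'm trying to accomplish, would help. Thank you very much.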

2 Answers:

Answer 0: (score: 1)

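You can do this by creating a flag each time there is a gap of 20 minutes or more. Then, for each row, count the number of times the flag has been true up to that row (a cumulative sum). Everything in the same "group" will then have the same count.

I'm assuming you are using SQL Server 2012+, so you have access to lag() and cumulative sum functionality. These are a convenience; you can do this with earlier versions of SQL Server.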

with jst as (
      SELECT js.*,
             (case when datediff(minute,
                                 LAG(time) over (partition by UserId order by time),
                                 time
                                ) < 20
                   then 0 else 1
              end) as StartFlag
      FROM #JulyStats js
      WHERE js.date = '7/28/2014' AND URL NOT LIKE '%js%' AND URL NOT LIKE '%css%' AND
           URL NOT LIKE '%jpg%' AND URL NOT LIKE '%gif%' AND URL NOT LIKE '%ico%' AND
           URL NOT LIKE '%png%' AND URL NOT LIKE '%/KeepAlive%' AND URL NOT LIKE '%font%' AND
           URL NOT LIKE '%axd%' AND URL NOT LIKE '%htc%'
    )
select jst.UserId, jst.grp, min(jst.time) as SessionStart, max(jst.time) as SessionEnd,
       datediff(minute, min(jst.time), max(jst.time)) as MinutesForSession, count(*) as NumSessionPages
from (select jst.*, sum(StartFlag) over (partition by UserId order by time) as grp
      from jst
     ) jst
group by jst.UserId, jst.grp;
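To see how the flag and the running sum interact, here is a small sketch of the same idea applied to the question's three sample hits. It is an illustration in Python rather than T-SQL; the helper name `label_sessions` is hypothetical, and the first hit is flagged as a session start, just as the query's CASE falls through to 1 when LAG returns NULL.

```python
from datetime import datetime, timedelta
from itertools import accumulate


def label_sessions(times, gap=timedelta(minutes=20)):
    """Return (time, start_flag, grp) rows for one user's hits, ordered by time."""
    times = sorted(times)
    # start_flag is 1 for the first hit and for any hit 20+ minutes after the previous one.
    flags = [1] + [0 if cur - prev < gap else 1 for prev, cur in zip(times, times[1:])]
    # The running total of the flags gives every hit in a session the same group number.
    groups = list(accumulate(flags))
    return list(zip(times, flags, groups))


day = datetime(2014, 7, 28)
hits = [day.replace(hour=10, minute=m) for m in (0, 10, 40)]
for t, flag, grp in label_sessions(hits):
    print(t.strftime("%H:%M"), flag, grp)
# 10:00 1 1
# 10:10 0 1
# 10:40 1 2
```

Grouping by the user and this group number, then taking the min and max time per group, yields one row per session, which is what the final GROUP BY in the query does.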

Answer 1: (score: 0)

Update:

Thanks for clarifying, PWilliams0530. In that case, Gordon's answer looks good. Here is a version that does not need LAG:

--create temp testing table
create table #stats(userid int not null, timestamp datetime2(6))

--get lots of test data 
insert into #stats(userid, timestamp)
select abs(cast(newid() as binary(6)) % 1000) + 1, dateadd(minute, abs(cast(newid() as binary(6)) % 1000) + 1, getdate())
from sys.objects o1, sys.objects o2

--query 
select userid, datediff(minute, min(timestamp), max(timestamp)) session_minutes, count(*) count
from (
    select *, 
        --get the cumulative sum of the flags by user. Note this means each 20 min marker for the user is now unique 
        sum(grp) over(partition by userid order by timestamp) cumul_grp 
    from (
        select userid, timestamp, 
            --if difference between the prev timestamp and this timestamp is >= 20, flag it 
            case 
            when coalesce(datediff(minute, prev_stamp, timestamp), 0) >= 20 then 1
            else 0
            end grp
        from (
            select s.userid, s.timestamp, 
                (
                    select max(s2.timestamp) --largest timestamp earlier than the outer row's; this is the previous "click"
                    from #stats s2 
                    where s2.userid = s.userid
                    and s2.timestamp < s.timestamp
                ) prev_stamp
            from #stats s 
        ) x 
    ) y 
) z
--group by the userid and the cumulative flag --- this ensures we only take the 
--time differences with no gaps >= 20 min bc the flag segments the data for us 
group by userid, cumul_grp 
order by 2 desc 

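Regarding Gordon's answer: your database may be hosted on SQL Server 2012 without being at compatibility level 110.

If you run

ALTER DATABASE YourDatabaseName SET COMPATIBILITY_LEVEL = 110; -- YourDatabaseName is a placeholder for your database's name

it should then let you use the newer SQL Server 2012 features.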

Old answer:

It looks like you just want to cap each session at a maximum of 20 minutes. Here are two queries that might help:

--keep max minutes for session at 20
SELECT #JulyStats.UserID, 
    case 
    when DATEDIFF(MINUTE, MIN(#JulyStats.time), MAX(#JulyStats.time)) >= 20 then 20
    else DATEDIFF(MINUTE, MIN(#JulyStats.time), MAX(#JulyStats.time)) 
    end AS MinutesForSession,
    COUNT(*) AS CountPages
FROM #JulyStats
WHERE #JulyStats.date = '7/28/2014'
AND URL NOT LIKE '%js%'
AND URL NOT LIKE '%css%'
AND URL NOT LIKE '%jpg%'
AND URL NOT LIKE '%gif%'
AND URL NOT LIKE '%ico%'
AND URL NOT LIKE '%png%'
AND URL NOT LIKE '%/KeepAlive%'
AND URL NOT LIKE '%font%'
AND URL NOT LIKE '%axd%'
AND URL NOT LIKE '%htc%'
GROUP BY #JulyStats.UserID
ORDER BY MinutesForSession DESC

--only return users with 20 or less minutes per session 
SELECT #JulyStats.UserID, DATEDIFF(MINUTE, MIN(#JulyStats.time), MAX(#JulyStats.time)) AS MinutesForSession,
    COUNT(*) AS CountPages
FROM #JulyStats
WHERE #JulyStats.date = '7/28/2014'
AND URL NOT LIKE '%js%'
AND URL NOT LIKE '%css%'
AND URL NOT LIKE '%jpg%'
AND URL NOT LIKE '%gif%'
AND URL NOT LIKE '%ico%'
AND URL NOT LIKE '%png%'
AND URL NOT LIKE '%/KeepAlive%'
AND URL NOT LIKE '%font%'
AND URL NOT LIKE '%axd%'
AND URL NOT LIKE '%htc%'
GROUP BY #JulyStats.UserID
HAVING DATEDIFF(MINUTE, MIN(#JulyStats.time), MAX(#JulyStats.time)) < 20 
ORDER BY MinutesForSession DESC