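Goal:
I am trying to write a query that gets the number of pages a user hit over the course of a session. I want to compare MIN(time) and MAX(time), BUT if the difference between any two consecutive rows is more than 20 minutes, I want to stop counting minutes there and use the difference up to that point as the total. For example:
User 1 - Page 1 - 10:00 AM
User 1 - Page 2 - 10:10 AM
User 1 - Page 3 - 10:40 AM
I want the result for User 1 to be 10 minutes (since I am grouping on UserID). If that user instead kept hitting pages every 10 minutes for 10 hours, the same rule should apply. But as soon as there is a 20-minute gap: stop, calculate, and move on.
Clarification - a user CAN have a session longer than 20 minutes. If they click every 15 minutes for an hour, the session I want to capture is 60 minutes. If they click one more thing 15 minutes later and then go "AFK" for several hours before coming back, I want to record a session length of 75 minutes.
Problem:
My company's IIS logs give me nothing beyond the time a page was hit. For lack of a better option, we decided to use the difference between the minimum and maximum time a page was hit to determine "MinutesForSession". The problem is that some users, far more than we expected, log in in the AM and then again in the PM, which produces a min/max difference of over 10 hours. Since we know the average user session is almost never that long, this data skews the average we are trying to find.
My efforts:
I created a temp table named #JulyStats that mirrors my production data. I am selecting the UserID, the difference between the min and max time (time being the moment the user hit the page), and the number of pages they hit. I exclude some scripts, fonts, and images from the log, and because of the data size I wanted to isolate it to a single day.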
SELECT #JulyStats.UserID, DATEDIFF(MINUTE, MIN(#JulyStats.time), MAX(#JulyStats.time)) AS MinutesForSession,COUNT(*) AS CountPages
FROM #JulyStats
WHERE #JulyStats.date = '7/28/2014'
AND URL NOT LIKE '%js%'
AND URL NOT LIKE '%css%'
AND URL NOT LIKE '%jpg%'
AND URL NOT LIKE '%gif%'
AND URL NOT LIKE '%ico%'
AND URL NOT LIKE '%png%'
AND URL NOT LIKE '%/KeepAlive%'
AND URL NOT LIKE '%font%'
AND URL NOT LIKE '%axd%'
AND URL NOT LIKE '%htc%'
GROUP BY #JulyStats.UserID
ORDER BY MinutesForSession DESC
To try to solve this myself, I tried adding a condition to the WHERE clause:
AND DATEDIFF(MINUTE, MIN(#JulyStats.time), #JulyStats.time) < 20
This results in an error about referencing an aggregate, and my SQL knowledge leaves me unsure where it should correctly be declared.
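For context, SQL Server does not allow an aggregate such as MIN() directly in a WHERE clause, because WHERE is evaluated before grouping. One way to make that kind of filter legal is to compute the first hit per user as a windowed MIN in a derived table and filter in the outer query. The sketch below is illustrative only: the table variable @hits and its sample rows are assumptions, not the real #JulyStats data. It also only keeps hits within 20 minutes of each user's first hit, so it does not split a day into sessions.

```sql
-- Hypothetical sample data mirroring the example in the question.
DECLARE @hits TABLE (UserID int NOT NULL, [time] datetime NOT NULL);
INSERT INTO @hits (UserID, [time]) VALUES
    (1, '2014-07-28T10:00:00'),
    (1, '2014-07-28T10:10:00'),
    (1, '2014-07-28T10:40:00');

SELECT x.UserID,
       DATEDIFF(MINUTE, MIN(x.[time]), MAX(x.[time])) AS MinutesForSession,
       COUNT(*) AS CountPages
FROM (
    -- The windowed MIN is computed per row, so the outer WHERE can use it.
    SELECT h.UserID, h.[time],
           MIN(h.[time]) OVER (PARTITION BY h.UserID) AS FirstHit
    FROM @hits h
) x
WHERE DATEDIFF(MINUTE, x.FirstHit, x.[time]) < 20
GROUP BY x.UserID;
```

With the three sample rows, the 10:40 hit is filtered out and User 1 comes back with 10 minutes and 2 pages.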
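What I need:
I am not sure how to make it stop counting minutes once the 20-minute condition is met. So either some help writing this query, or the keywords/phrases I should search for to describe what I am trying to accomplish, would be appreciated. Thanks very much.
Answer 0 (score: 1)
You can do this by creating a flag on every row that follows a gap of 20 minutes or more. Then, for each row, count how many flagged rows have occurred up to and including it (a running sum of the flag). Everything in the same "group" (session) will then have the same count.
I am assuming you are using SQL Server 2012+, so you have access to lag() and the cumulative sum functionality. These are a convenience; you can do this with earlier versions of SQL Server as well.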
-- StartFlag = 1 on the first hit for a user (lag() returns NULL there)
-- and on any hit 20 or more minutes after that user's previous hit.
with jst as (
      SELECT js.*,
             (case when datediff(minute,
                                 LAG(time) over (partition by UserId order by time),
                                 time
                                ) < 20
                   then 0 else 1
              end) as StartFlag
      FROM #JulyStats js
      WHERE js.date = '7/28/2014' AND URL NOT LIKE '%js%' AND URL NOT LIKE '%css%' AND
            URL NOT LIKE '%jpg%' AND URL NOT LIKE '%gif%' AND URL NOT LIKE '%ico%' AND
            URL NOT LIKE '%png%' AND URL NOT LIKE '%/KeepAlive%' AND URL NOT LIKE '%font%' AND
            URL NOT LIKE '%axd%' AND URL NOT LIKE '%htc%'
     )
-- The running sum of StartFlag numbers each user's sessions; group on it.
select s.UserId, s.grp, min(s.time) as SessionStart, max(s.time) as SessionEnd,
       count(*) as NumSessionPages
from (select jst.*, sum(StartFlag) over (partition by UserId order by time) as grp
      from jst
     ) s
group by s.UserId, s.grp;
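To see the flag-and-running-sum idea on the three rows from the question, here is a small self-contained sketch. The table variable @hits is a hypothetical stand-in for #JulyStats (URL filtering omitted), and MinutesForSession is computed with datediff over each group; treat it as an illustration under those assumptions rather than a drop-in query. It requires SQL Server 2012+ for LAG and the ordered windowed SUM.

```sql
-- Hypothetical sample data: the three hits from the question.
DECLARE @hits TABLE (UserID int NOT NULL, [time] datetime NOT NULL);
INSERT INTO @hits (UserID, [time]) VALUES
    (1, '2014-07-28T10:00:00'),
    (1, '2014-07-28T10:10:00'),
    (1, '2014-07-28T10:40:00');

WITH flagged AS (
    -- 1 when the hit starts a new session (first hit, or gap >= 20 minutes).
    SELECT UserID, [time],
           CASE WHEN DATEDIFF(MINUTE,
                              LAG([time]) OVER (PARTITION BY UserID ORDER BY [time]),
                              [time]) < 20
                THEN 0 ELSE 1
           END AS StartFlag
    FROM @hits
),
sessions AS (
    -- Running sum of the flags = session number within each user.
    SELECT UserID, [time],
           SUM(StartFlag) OVER (PARTITION BY UserID ORDER BY [time]) AS grp
    FROM flagged
)
SELECT UserID, grp,
       DATEDIFF(MINUTE, MIN([time]), MAX([time])) AS MinutesForSession,
       COUNT(*) AS CountPages
FROM sessions
GROUP BY UserID, grp
ORDER BY UserID, grp;
```

For this data the result is two sessions for User 1: session 1 with 10 minutes and 2 pages, and session 2 (the lone 10:40 hit) with 0 minutes and 1 page. Note that a single-hit session always reports 0 minutes, which matters if you later average MinutesForSession.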
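Answer 1 (score: 0)
Update:
Thanks for the clarification, PWilliams0530. In that case, Gordon's answer looks good. Here is a version that does not require LAG (it still uses a windowed cumulative SUM with ORDER BY, which also requires SQL Server 2012 or later):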
--create temp testing table
create table #stats(userid int not null, timestamp datetime2(6));
--get lots of test data
insert into #stats(userid, timestamp)
select abs(cast(newid() as binary(6)) % 1000) + 1, dateadd(minute, abs(cast(newid() as binary(6)) % 1000) + 1, getdate())
from sys.objects o1, sys.objects o2;
--query
select userid, datediff(minute, min(timestamp), max(timestamp)) session_minutes, count(*) count
from (
    select *,
        --get the cumulative sum of the flags by user. Note this means each 20 min marker for the user is now unique
        sum(grp) over(partition by userid order by timestamp) cumul_grp
    from (
        select userid, timestamp,
            --if difference between the prev timestamp and this timestamp is >= 20, flag it
            --(the first row per user has no prev_stamp, so coalesce gives it a flag of 0)
            case
                when coalesce(datediff(minute, prev_stamp, timestamp), 0) >= 20 then 1
                else 0
            end grp
        from (
            select s.userid, s.timestamp,
                (
                    select max(s2.timestamp) --largest time earlier than the outer row's time, i.e. the previous "click"
                    from #stats s2
                    where s2.userid = s.userid
                    and s2.timestamp < s.timestamp
                ) prev_stamp
            from #stats s
        ) x
    ) y
) z
--group by the userid and the cumulative flag --- this ensures we only take the
--time differences with no gaps >= 20 min bc the flag segments the data for us
group by userid, cumul_grp
order by 2 desc;
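The query above still uses sum(...) over(... order by ...), which needs SQL Server 2012+. If that is not available, one possible alternative is to find each session's first hit with NOT EXISTS and then tag every hit with the latest session start at or before it. The sketch below assumes a hypothetical table variable @hits with illustrative data; the query itself uses only CTEs and correlated subqueries, while the multi-row VALUES used to load the sample needs SQL Server 2008 or later.

```sql
-- Hypothetical sample data; @hits and its rows are illustrative only.
DECLARE @hits TABLE (UserID int NOT NULL, [time] datetime NOT NULL);
INSERT INTO @hits (UserID, [time]) VALUES
    (1, '2014-07-28T10:00:00'),
    (1, '2014-07-28T10:10:00'),
    (1, '2014-07-28T10:40:00'),
    (2, '2014-07-28T09:00:00'),
    (2, '2014-07-28T09:15:00'),
    (2, '2014-07-28T09:30:00');

WITH starts AS (
    -- A hit starts a session when the same user has no earlier hit
    -- less than 20 minutes before it.
    SELECT h.UserID, h.[time]
    FROM @hits h
    WHERE NOT EXISTS (
        SELECT 1
        FROM @hits p
        WHERE p.UserID = h.UserID
          AND p.[time] < h.[time]
          AND DATEDIFF(MINUTE, p.[time], h.[time]) < 20
    )
),
tagged AS (
    -- Each hit belongs to the most recent session start at or before it.
    SELECT h.UserID, h.[time],
           (SELECT MAX(s.[time])
            FROM starts s
            WHERE s.UserID = h.UserID
              AND s.[time] <= h.[time]) AS SessionStart
    FROM @hits h
)
SELECT UserID, SessionStart,
       DATEDIFF(MINUTE, SessionStart, MAX([time])) AS MinutesForSession,
       COUNT(*) AS CountPages
FROM tagged
GROUP BY UserID, SessionStart
ORDER BY UserID, SessionStart;
```

For this sample, User 1 gets a 10-minute session (2 pages) starting at 10:00 and a 0-minute session (1 page) at 10:40, and User 2 gets a single 30-minute session with 3 pages. The correlated subqueries can be slow on large logs, so an index on (UserID, time) is likely worth having.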
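Regarding Gordon's answer: you may have a database on SQL Server 2012 that is not at compatibility level 110.
If you run
ALTER DATABASE YourDatabase SET COMPATIBILITY_LEVEL = 110; -- YourDatabase is a placeholder for your database name
then it will allow you to use the newer SQL Server 2012 features.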
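Before changing anything, it may help to check what the current database is set to. A minimal sketch using the sys.databases catalog view:

```sql
-- Show the compatibility level of the database you are connected to.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();
```

A value of 110 corresponds to SQL Server 2012; lower values (for example 100 or 90) indicate the database is running at an older level.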
Old answer:
It looks like you just want to cap the minutes for each session at 20. Here are 2 queries that may help you:
--keep max minutes for session at 20
SELECT #JulyStats.UserID,
case
when DATEDIFF(MINUTE, MIN(#JulyStats.time), MAX(#JulyStats.time)) >= 20 then 20
else DATEDIFF(MINUTE, MIN(#JulyStats.time), MAX(#JulyStats.time))
end AS MinutesForSession,
COUNT(*) AS CountPages
FROM #JulyStats
WHERE #JulyStats.date = '7/28/2014'
AND URL NOT LIKE '%js%'
AND URL NOT LIKE '%css%'
AND URL NOT LIKE '%jpg%'
AND URL NOT LIKE '%gif%'
AND URL NOT LIKE '%ico%'
AND URL NOT LIKE '%png%'
AND URL NOT LIKE '%/KeepAlive%'
AND URL NOT LIKE '%font%'
AND URL NOT LIKE '%axd%'
AND URL NOT LIKE '%htc%'
GROUP BY #JulyStats.UserID
ORDER BY MinutesForSession DESC
--only return users with fewer than 20 minutes per session
SELECT #JulyStats.UserID, DATEDIFF(MINUTE, MIN(#JulyStats.time), MAX(#JulyStats.time)) AS MinutesForSession,
COUNT(*) AS CountPages
FROM #JulyStats
WHERE #JulyStats.date = '7/28/2014'
AND URL NOT LIKE '%js%'
AND URL NOT LIKE '%css%'
AND URL NOT LIKE '%jpg%'
AND URL NOT LIKE '%gif%'
AND URL NOT LIKE '%ico%'
AND URL NOT LIKE '%png%'
AND URL NOT LIKE '%/KeepAlive%'
AND URL NOT LIKE '%font%'
AND URL NOT LIKE '%axd%'
AND URL NOT LIKE '%htc%'
GROUP BY #JulyStats.UserID
HAVING DATEDIFF(MINUTE, MIN(#JulyStats.time), MAX(#JulyStats.time)) < 20
ORDER BY MinutesForSession DESC