I have a table, Statistics, that contains information about user interactions with objects on my website. The table is structured as follows:
id | object_id | user_id | interaction_time | interaction_type
----+-----------+---------+---------------------+------------------
1 | 1 | 1 | 2015-07-08 12:00:00 | opened
2 | 1 | 2 | 2015-07-08 12:10:00 | opened
3 | 1 | 1 | 2015-07-08 12:15:00 | closed
4 | 2 | 3 | 2015-07-08 12:16:00 | opened
5 | 1 | 2 | 2015-07-08 12:17:00 | closed
user_id=1 opened object_id=1 at 2015-07-08 12:00:00 and closed it at 2015-07-08 12:15:00. user_id=2 opened object_id=1 at 2015-07-08 12:10:00 and closed it at 2015-07-08 12:17:00.
What I want is the average interaction duration for each object. For object_id=1 the result should be (15 minutes (user_id=1) + 7 minutes (user_id=2)) / 2 = 11 minutes.
Can I do this without creating additional tables?
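Note that the data may contain glitches, for example an opened row appearing before another opened row, or a closed row before another closed row, and so on. In that case we should count only consecutive opened and closed rows.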
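For reference, the expected result can be checked outside the database. The following is a minimal Python sketch, not part of the original question; the function name `average_duration_per_object` and the in-memory row tuples are illustrative assumptions. It pairs an opened row only with a closed row that immediately follows it for the same object and user, then averages the durations per object.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question:
# (id, object_id, user_id, interaction_time, interaction_type)
ROWS = [
    (1, 1, 1, "2015-07-08 12:00:00", "opened"),
    (2, 1, 2, "2015-07-08 12:10:00", "opened"),
    (3, 1, 1, "2015-07-08 12:15:00", "closed"),
    (4, 2, 3, "2015-07-08 12:16:00", "opened"),
    (5, 1, 2, "2015-07-08 12:17:00", "closed"),
]


def average_duration_per_object(rows):
    """Average minutes between consecutive opened/closed rows, per object."""
    fmt = "%Y-%m-%d %H:%M:%S"
    # Group events by (object_id, user_id).
    events = defaultdict(list)
    for _id, object_id, user_id, ts, kind in rows:
        events[(object_id, user_id)].append((datetime.strptime(ts, fmt), kind))

    durations = defaultdict(list)
    for (object_id, _user_id), evs in events.items():
        evs.sort()
        # Inspect each adjacent pair of events; keep only opened -> closed.
        for (t1, k1), (t2, k2) in zip(evs, evs[1:]):
            if k1 == "opened" and k2 == "closed":
                durations[object_id].append((t2 - t1).total_seconds() / 60)

    return {obj: sum(d) / len(d) for obj, d in durations.items()}


print(average_duration_per_object(ROWS))  # {1: 11.0}
```

Object 2 has an opened row with no matching closed row, so it does not appear in the result.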
Answer 0 (score: 1)
One way to do this is to use cross apply to find the closed row that follows each opened row:
select
s.object_id,
avg_duration = avg(datediff(minute, s.interaction_time, o.interaction_time))
from [Statistics] s
cross apply (
select top 1 * from [Statistics]
where s.object_id = object_id
and s.user_id = user_id
and s.interaction_time < interaction_time
and interaction_type = 'closed'
order by interaction_time
) o
where s.interaction_type = 'opened'
group by s.object_id
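Note that the average is computed on integers and therefore loses its fractional part, so if you want more precision you may want to use
avg(datediff(minute, s.interaction_time, o.interaction_time) * 1.0)
to force a non-integer (decimal) calculation, and round the result if needed.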
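The effect is easy to see with plain arithmetic. The sketch below uses Python rather than T-SQL, and the durations 15 and 8 are made-up values chosen so that the average is not a whole number. Floor division stands in for an integer average here, which matches truncation for non-negative values.

```python
from decimal import Decimal

durations = [15, 8]  # hypothetical minute counts, not from the question

# Integer-only averaging drops the fraction, as an average over ints does.
int_avg = sum(durations) // len(durations)

# Multiplying by 1.0 in T-SQL promotes the values to a decimal type first;
# Decimal plays that role in this sketch.
dec_avg = sum(Decimal(d) for d in durations) / len(durations)

print(int_avg)  # 11
print(dec_avg)  # 11.5
```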
With an index on (object_id, user_id, interaction_time), I believe this should perform well (and possibly better than the other working solutions).
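Answer 1 (score: 0)
Here is one way. However, if there are multiple opens/closes per user and object, it gets a bit more complicated: you would have to inner join to a subset of Statistics, keyed on the minimum interaction time for that user and object that is later than the opened row being evaluated.
So here is the less complicated version of the average. If there are multiple opens/closes per user/object, the self-join has to be modified to use such a subset.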
-- object_Id is qualified with O because both O and C expose that column
SELECT O.object_Id, avg(DATEDIFF(minute, O.interaction_Time, C.Interaction_Time))
FROM statistics O
INNER JOIN Statistics C
on O.Object_Id = C.Object_Id
and O.user_ID = C.user_Id
and O.Interaction_type = 'opened'
and C.InteractioN_type = 'closed'
GROUP BY O.object_Id
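This one uses a correlated subquery to identify the next closed record/time for the user and object. Since we are using an inner join here, any object that has an open but no close is ignored. Any object that has a close but no open is also ignored. For any object with TWO opens followed by one close, BOTH opens are evaluated against the same close time to determine the durations used in the average. If that is not the desired behavior, the correlated subquery could be modified to look only at valid pairings. I would just need to think about it more.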
SELECT O.object_Id, avg(DATEDIFF(minute, O.interaction_Time, C.interaction_Time))
FROM statistics O
INNER JOIN statistics C
on O.Object_Id = C.Object_Id
and O.user_ID = C.user_Id
and O.Interaction_type = 'opened'
and C.Interaction_type = 'closed'
-- keep only the earliest close that comes after this open
and C.interaction_Time = (SELECT min(N.interaction_Time)
FROM statistics N
WHERE N.interaction_Type = 'closed'
and N.object_Id = O.object_Id
and N.user_Id = O.user_Id
and N.interaction_Time > O.interaction_Time)
GROUP BY O.object_Id
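To make the caveat about two opens followed by one close concrete, here is a small Python sketch. It is not the answer's code, and the timestamps and function names are hypothetical. It compares "next closed after each opened" pairing with strictly consecutive pairing for a single user and object.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical events for one user and one object: two opens, then one close.
EVENTS = [
    ("2015-07-08 12:00:00", "opened"),
    ("2015-07-08 12:05:00", "opened"),
    ("2015-07-08 12:15:00", "closed"),
]


def minutes(start, end):
    """Minutes elapsed between two timestamp strings."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 60


def next_close_durations(events):
    """Pair every opened with the earliest later closed (a close may be reused)."""
    result = []
    for ts, kind in events:
        if kind != "opened":
            continue
        later = [t for t, k in events if k == "closed" and t > ts]
        if later:
            result.append(minutes(ts, min(later)))
    return result


def consecutive_durations(events):
    """Pair an opened only with a closed that directly follows it."""
    ordered = sorted(events)
    return [
        minutes(t1, t2)
        for (t1, k1), (t2, k2) in zip(ordered, ordered[1:])
        if k1 == "opened" and k2 == "closed"
    ]


print(next_close_durations(EVENTS))   # [15.0, 10.0]
print(consecutive_durations(EVENTS))  # [10.0]
```

With next-close pairing the single close is counted twice, so the average becomes 12.5 minutes instead of 10.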
Answer 2 (score: 0)
You can do it like this:
SELECT [object_id], AVG(DATEDIFF(MINUTE, minTime, maxTime))
FROM (
SELECT [object_id], min(interaction_Time) minTime, max(Interaction_Time) maxTime, [user_id]
FROM #Test
GROUP BY [object_id], [user_id]
)x
GROUP BY [object_id]
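Answer 3 (score: 0)
To handle consecutive opened/closed rows, we can join the Statistics table to itself to form open/close pairs. We need to check that the closed row comes after the opened row, and that no other row for the same object and user exists between the two elements of the pair.
Once we have the list of valid pairs, getting the average interaction time per object from the pair durations is a matter of grouping and aggregating.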
SELECT o.object_id, AVG(DATEDIFF(MINUTE, o.interaction_time, c.interaction_time))
FROM [Statistics] o
JOIN [Statistics] c
ON o.object_id = c.object_id AND o.user_id = c.user_id
AND o.interaction_type = 'opened' AND c.interaction_type = 'closed'
AND o.interaction_time < c.interaction_time
AND NOT EXISTS (
SELECT 1 FROM [Statistics] m
WHERE o.object_id = m.object_id AND o.user_id = m.user_id
AND m.id > o.id AND m.id < c.id
)
GROUP BY o.object_id
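The pairing logic can be tried without a SQL Server instance. The following Python sketch loads the question's sample rows into an in-memory SQLite database and runs an equivalent query. It is an adaptation, not the answer's exact code: SQLite has no DATEDIFF, so the minute difference is computed from `strftime('%s', ...)` epoch seconds, the division by 60.0 keeps fractions, and the table name `stats` is an assumption made to avoid quoting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stats (id INTEGER PRIMARY KEY, object_id INTEGER,"
    " user_id INTEGER, interaction_time TEXT, interaction_type TEXT)"
)
conn.executemany(
    "INSERT INTO stats VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, "2015-07-08 12:00:00", "opened"),
        (2, 1, 2, "2015-07-08 12:10:00", "opened"),
        (3, 1, 1, "2015-07-08 12:15:00", "closed"),
        (4, 2, 3, "2015-07-08 12:16:00", "opened"),
        (5, 1, 2, "2015-07-08 12:17:00", "closed"),
    ],
)

# Self-join of opened rows (o) to closed rows (c); NOT EXISTS rejects pairs
# that have another row for the same object and user between them.
QUERY = """
SELECT o.object_id,
       AVG((strftime('%s', c.interaction_time)
            - strftime('%s', o.interaction_time)) / 60.0)
FROM stats o
JOIN stats c
  ON o.object_id = c.object_id AND o.user_id = c.user_id
 AND o.interaction_type = 'opened' AND c.interaction_type = 'closed'
 AND o.interaction_time < c.interaction_time
 AND NOT EXISTS (
       SELECT 1 FROM stats m
       WHERE o.object_id = m.object_id AND o.user_id = m.user_id
         AND m.id > o.id AND m.id < c.id
 )
GROUP BY o.object_id
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [(1, 11.0)]
```

Like the query above, this relies on id order matching interaction_time order when it tests for rows "between" the pair.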
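Answer 4 (score: 0)
You can do an "ordered" self-join with the help of a CTE. This helps guarantee that there are no loose ends and that only consecutive rows are selected.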
WITH cteRN(object_id, user_id, itime, itype, RN) AS (
SELECT object_id, user_id, interaction_time, interaction_type,
ROW_NUMBER() OVER(PARTITION BY object_id, user_id ORDER BY interaction_time)
FROM Interactions
)
SELECT cls.object_id, AVG(DATEDIFF(minute, opn.itime, cls.itime)) average_time
FROM cteRN cls INNER JOIN cteRN opn
ON cls.object_id = opn.object_id AND cls.user_id = opn.user_id AND cls.RN = opn.RN + 1
WHERE cls.itype = 'closed' AND opn.itype = 'opened'
GROUP BY cls.object_id
Here is a working fiddle.