I'm trying to find, for every month of each year, the user who spent the most time in that month.
I'm working with the following data:
uid activity-time status
... ................... ........
1 2016-12-31 16:00:04 sign in
1 2016-12-31 21:05:37 sign out
2 2016-12-25 18:00:04 sign in
2 2016-12-25 20:45:31 sign out
7 2016-10-31 13:00:04 sign in
7 2016-10-31 16:05:30 sign out
1 2016-12-27 17:00:04 sign in
1 2016-12-27 19:05:00 sign out
2 2016-10-25 18:00:04 sign in
2 2016-10-25 20:45:31 sign out
4 2017-12-31 16:00:04 sign in
4 2017-12-31 21:05:37 sign out
3 2017-12-25 18:00:04 sign in
3 2017-12-25 20:45:31 sign out
7 2017-10-31 16:00:04 sign in
7 2017-10-31 21:05:37 sign out
3 2017-10-25 18:00:04 sign in
3 2017-10-25 20:45:31 sign out
I expect the following output:
uid year month time-spent
...... ..... ..... ..........
1 2016 12 07:10:45
7 2016 10 03:05:34
4 2017 12 05:05:41
7 2017 10 05:05:41
I tried the following query, but I don't know how to specify the conditions for sign in and sign out:
SELECT ETS.*
FROM (SELECT year(activity-time),month(activity-time), uid, count(uid) as c,
ROW_NUMBER() OVER (PARTITION BY month(activity-time) ORDER BY COUNT(uid) DESC) as seq
FROM activity_table
GROUP BY month(activity-time),year(activity-time), uid
) ds
WHERE seq = 1
ORDER BY c DESC ;
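Answer 0 (score: 0)
You can use a nested query with lag to get the time difference between each sign-out record and the sign-in record before it.
I don't have HiveQL at hand, so some of the date/time functions may be slightly off, but the idea is: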
select yr,
mnth,
uid,
from_unixtime(spent, 'HH:mm:ss') spent -- 'HH' is the 24-hour field; 'hh' is the 12-hour one
from (
select year(activity_time) yr,
month(activity_time) mnth,
uid,
sum(spent) spent,
row_number() over (partition by year(activity_time), month(activity_time)
order by sum(spent) desc) rn
from (
select uid,
activity_time,
status,
unix_timestamp(activity_time)
- lag(unix_timestamp(activity_time))
over (partition by uid order by activity_time) spent
from activity_table
) base
where status = 'sign out'
group by year(activity_time),
month(activity_time),
uid
) grouped
where rn = 1;
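Note: I'd recommend not using hyphens in column names; use underscores instead (as I did in the SQL above).
Answer 1 (score: 0)
This is in SQL Server, but it should give you the idea. I first created a CTE that computes the total number of seconds from the time component, so that I can use SUM, grouping by uid and the MM-yyyy date, and then convert the result back to a time format. After that, row_number picks the maximum for each date.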
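To sanity-check the lag idea without a Hive cluster, here is a minimal Python sketch of the same logic run on the sample rows from the question. It is not part of the original answer. The names `ROWS`, `top_user_per_month` and `fmt` are made up for illustration, and it assumes every sign out is immediately preceded by that user's sign in.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (uid, activity_time, status).
ROWS = [
    (1, "2016-12-31 16:00:04", "sign in"), (1, "2016-12-31 21:05:37", "sign out"),
    (2, "2016-12-25 18:00:04", "sign in"), (2, "2016-12-25 20:45:31", "sign out"),
    (7, "2016-10-31 13:00:04", "sign in"), (7, "2016-10-31 16:05:30", "sign out"),
    (1, "2016-12-27 17:00:04", "sign in"), (1, "2016-12-27 19:05:00", "sign out"),
    (2, "2016-10-25 18:00:04", "sign in"), (2, "2016-10-25 20:45:31", "sign out"),
    (4, "2017-12-31 16:00:04", "sign in"), (4, "2017-12-31 21:05:37", "sign out"),
    (3, "2017-12-25 18:00:04", "sign in"), (3, "2017-12-25 20:45:31", "sign out"),
    (7, "2017-10-31 16:00:04", "sign in"), (7, "2017-10-31 21:05:37", "sign out"),
    (3, "2017-10-25 18:00:04", "sign in"), (3, "2017-10-25 20:45:31", "sign out"),
]


def top_user_per_month(rows):
    """Return {(year, month): (uid, seconds)} for the user with the most time."""
    # Same ordering as "partition by uid order by activity_time".
    events = sorted(
        (uid, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), status)
        for uid, ts, status in rows
    )
    totals = defaultdict(int)
    prev = {}  # uid -> timestamp of that user's previous row (the "lag")
    for uid, ts, status in events:
        # Only sign-out rows contribute, like "where status = 'sign out'".
        if status == "sign out" and uid in prev:
            totals[(ts.year, ts.month, uid)] += int((ts - prev[uid]).total_seconds())
        prev[uid] = ts
    best = {}
    for (year, month, uid), seconds in totals.items():
        # Keep the largest total per (year, month), like "rn = 1".
        if (year, month) not in best or seconds > best[(year, month)][1]:
            best[(year, month)] = (uid, seconds)
    return best


def fmt(seconds):
    """Format a number of seconds as HH:MM:SS."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


result = top_user_per_month(ROWS)
for (year, month), (uid, seconds) in sorted(result.items()):
    print(uid, year, month, fmt(seconds))
```

On the sample rows this prints 03:05:26 for uid 7 in 2016-10, 07:10:29 for uid 1 in 2016-12, and 05:05:33 for uid 7 in 2017-10 and uid 4 in 2017-12. The users match the expected output in the question; the durations differ from it by a few seconds.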
;WITH activity_table_seconds
AS (SELECT [uid],
[activity-time],
( Datepart(hour, [activity-time]) * 60 * 60 ) + (
Datepart(minute, [activity-time]) * 60 ) +
Datepart(second, [activity-time]) AS
[activity-time-seconds],
[status]
FROM @activity_table)
SELECT [uid],
[date],
[activity-time]
FROM (SELECT *,
Row_number ()
OVER (
partition BY [date]
ORDER BY [activity-time] DESC) rn
FROM (SELECT a.[uid],
Format(a.[activity-time], 'MM-yyyy') AS [date],
CONVERT(VARCHAR(8),
Dateadd(second, Sum(b.[activity-time-seconds] -
a.[activity-time-seconds]), 0),
108) AS [activity-time]
FROM (SELECT *
FROM activity_table_seconds
WHERE [status] = 'sign in') a
INNER JOIN (SELECT *
FROM activity_table_seconds
WHERE [status] = 'sign out') b
ON a.[uid] = b.[uid]
AND Cast(a.[activity-time] AS DATE) = Cast(
b.[activity-time] AS DATE)
GROUP BY a.[uid],
Format(a.[activity-time], 'MM-yyyy')) a) b
WHERE b.rn = 1
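The arithmetic behind this answer can be sketched in a few lines of Python: turn each timestamp into seconds since midnight (as the CTE does with Datepart), subtract sign in from sign out, and format the difference back as a time. The helper names `seconds_of_day` and `to_hms` are hypothetical. Like the join on the calendar date in the query, the sketch assumes a session starts and ends on the same day.

```python
from datetime import datetime


def seconds_of_day(ts):
    """Seconds elapsed since midnight for a 'YYYY-MM-DD HH:MM:SS' string."""
    t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return t.hour * 3600 + t.minute * 60 + t.second


def to_hms(total_seconds):
    """Format a number of seconds as HH:MM:SS (hours are not wrapped at 24)."""
    hours, rem = divmod(total_seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# One session from the sample data (uid 1 on 2016-12-31).
sign_in = seconds_of_day("2016-12-31 16:00:04")
sign_out = seconds_of_day("2016-12-31 21:05:37")
print(to_hms(sign_out - sign_in))  # 05:05:33
```

Two caveats seem worth keeping in mind. Because the query joins sign-in and sign-out rows only on uid and date, a user with several sessions on the same day would have every sign in paired with every sign out of that day. Also, formatting a summed duration as a time of day (style 108 here, or from_unixtime in the Hive answer) wraps around once a monthly total reaches 24 hours.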