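I have several user IDs, each with a timestamp recorded roughly once an hour. I want to calculate the total time spent (uptime) per user, but skip any data point that arrives late: readings are expected about 60 minutes apart, so a timestamp that comes more than 120 minutes after the previous one should not count toward uptime. The result should be grouped by the day extracted from the timestamp itself. I also want a connection frequency: whenever the data shows a gap of 2 hours or more, I treat it as a "disconnection" and add 1 to the count. Keep in mind that the query is written for BigQuery.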
unique_id server_time
50J181700696 2019-07-02 00:14:14.157 UTC
50J181700696 2019-07-02 01:14:14.136 UTC
50J181700696 2019-07-02 02:14:14.116 UTC
50J181700696 2019-07-02 04:14:14.065 UTC
50J181700696 2019-07-02 05:14:14.041 UTC
50J181700696 2019-07-02 07:14:13.987 UTC
50J181700696 2019-07-02 08:14:13.961 UTC
50J181700696 2019-07-02 11:14:13.873 UTC
50J181700696 2019-07-02 12:14:13.852 UTC
50J181700696 2019-07-02 13:14:13.822 UTC
SELECT
  date_column,
  unique_id,
  SUM(
    case TIMESTAMP_DIFF(prev_server_time,server_time,minute) between 0 and 120
      when server_time is null or prev_server_time is null then 0
      when server_time > prev_server_time then TIMESTAMP_DIFF(server_time,prev_server_time,minute)
      else 0
    END
  ) AS uptime_per_day,
  SUM(
    case not (TIMESTAMP_DIFF(prev_server_time,server_time,minute) between 0 and 120 )
      when prev_server_time is null or server_time is null then 0
      when server_time > prev_server_time and TIMESTAMP_DIFF(server_time,prev_server_time,minute) between 120 and 1440 then 1
      else 0
    END
  ) AS connection_times
FROM (
  SELECT
    date_column,
    unique_id,
    server_time,
    LAG(server_time ) OVER (PARTITION BY unique_id ORDER BY date_column ) AS prev_server_time
  FROM (
    SELECT
      unique_id,
      server_time,
      DATE(server_time) AS date_column
    FROM
      `table_user_entry`
  ))
GROUP BY
  date_column,
  unique_id
date_column unique_id uptime_per_day(minutes) connection_times
2019-07-02 50J181700696 420 3
Answer 0 (score: 0)
When I run your query, uptime_per_day comes out as 0. The condition TIMESTAMP_DIFF(prev_server_time,server_time,minute) between 0 and 120
is never true, because with the arguments in that order the timestamp difference is negative. Swap the two server times to get positive values: TIMESTAMP_DIFF(server_time,prev_server_time,minute)
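To illustrate the argument-order problem outside BigQuery, here is a minimal Python sketch using the first two timestamps from the question. The helper `minutes_diff` is hypothetical and only assumes that TIMESTAMP_DIFF counts whole minutes, truncated toward zero.

```python
from datetime import datetime, timedelta


def minutes_diff(end, start):
    """Whole minutes from start to end, truncated toward zero.

    Assumption: this mimics TIMESTAMP_DIFF(end, start, MINUTE).
    """
    delta = end - start
    whole = abs(delta) // timedelta(minutes=1)
    return whole if delta >= timedelta(0) else -whole


# First two rows of the sample data in the question.
prev_server_time = datetime(2019, 7, 2, 0, 14, 14, 157000)
server_time = datetime(2019, 7, 2, 1, 14, 14, 136000)

# Earlier minus later: negative, so it can never fall in [0, 120].
wrong_order = minutes_diff(prev_server_time, server_time)
# Later minus earlier: positive, so the range check can succeed.
right_order = minutes_diff(server_time, prev_server_time)

print(wrong_order, 0 <= wrong_order <= 120)
print(right_order, 0 <= right_order <= 120)
```

With the arguments swapped, the same pair of rows passes the range check instead of failing it.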
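In the LAG call, LAG(server_time ) OVER (PARTITION BY unique_id ORDER BY date_column ) AS prev_server_time,
order the rows by server_time rather than date_column. Every row from the same day has the same date_column, so ordering by it leaves the order within a day undefined. Ordering by server_time guarantees that each row is compared with the row immediately before it, and the server_time > prev_server_time check is no longer needed.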
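As a rough picture of what LAG does when partitioned by unique_id and ordered by server_time, the following Python sketch pairs each row with its predecessor. The function name `with_prev_server_time` is made up for illustration; it is an emulation, not BigQuery's implementation.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter


def with_prev_server_time(rows):
    """Pair each (unique_id, server_time) row with the previous
    server_time of the same unique_id, ordered by server_time.

    The first row of each unique_id gets None, like LAG returning NULL.
    """
    out = []
    ordered = sorted(rows, key=itemgetter(0, 1))
    for uid, group in groupby(ordered, key=itemgetter(0)):
        prev = None
        for _, ts in group:
            out.append((uid, ts, prev))
            prev = ts
    return out


# Three rows from the same day, deliberately out of order.
rows = [
    ("50J181700696", datetime(2019, 7, 2, 2, 14, 14, 116000)),
    ("50J181700696", datetime(2019, 7, 2, 0, 14, 14, 157000)),
    ("50J181700696", datetime(2019, 7, 2, 1, 14, 14, 136000)),
]

paired = with_prev_server_time(rows)
for uid, ts, prev in paired:
    print(uid, ts, prev)
```

Because the rows are sorted on the full timestamp, every non-null previous value is strictly earlier than its row, which is why the extra comparison becomes redundant.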
Here is the query without aggregation, to show how the times are calculated.
SELECT
  date_column,
  unique_id,
  -- Gaps of 0 to 120 minutes count toward uptime (BETWEEN is inclusive).
  IF(time_diff between 0 and 120, time_diff, 0) as up_time,
  -- The first row (NULL gap) or a gap of 120 to 1440 minutes starts a new connection.
  IF(time_diff IS NULL OR time_diff between 120 and 1440, 1, 0) as connection_started
FROM (
  SELECT
    DATE(server_time) as date_column,
    unique_id,
    server_time,
    prev_server_time,
    -- Minutes elapsed since the previous row of the same unique_id.
    TIMESTAMP_DIFF(server_time,prev_server_time,minute) AS time_diff
  FROM (
    SELECT
      unique_id,
      server_time,
      LAG(server_time ) OVER (PARTITION BY unique_id ORDER BY server_time ) AS prev_server_time
    FROM
      `table_user_entry`
  ))
And the final query, with aggregation.
WITH connection_data as (
  SELECT
    date_column,
    unique_id,
    IF(time_diff between 0 and 120, time_diff, 0) as uptime,
    IF(time_diff IS NULL OR time_diff between 120 and 1440, 1, 0) as connection_started
  FROM (
    SELECT
      DATE(server_time) as date_column,
      unique_id,
      server_time,
      prev_server_time,
      TIMESTAMP_DIFF(server_time,prev_server_time,minute) AS time_diff
    FROM (
      SELECT
        unique_id,
        server_time,
        LAG(server_time ) OVER (PARTITION BY unique_id ORDER BY server_time ) AS prev_server_time
      FROM
        `table_user_entry`
    ))
)
SELECT date_column, unique_id, SUM(uptime) as uptime, SUM(connection_started) as connection_times
FROM connection_data
GROUP BY date_column, unique_id
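To check the logic by hand, the sketch below replays the aggregated query's rules in Python over the ten sample rows from the question. It is an emulation under one assumption: TIMESTAMP_DIFF with minute counts whole elapsed minutes. The function name `summarize` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# server_time values for unique_id 50J181700696, copied from the question.
SAMPLE = [
    "2019-07-02 00:14:14.157", "2019-07-02 01:14:14.136",
    "2019-07-02 02:14:14.116", "2019-07-02 04:14:14.065",
    "2019-07-02 05:14:14.041", "2019-07-02 07:14:13.987",
    "2019-07-02 08:14:13.961", "2019-07-02 11:14:13.873",
    "2019-07-02 12:14:13.852", "2019-07-02 13:14:13.822",
]


def summarize(unique_id, stamps):
    """Return {(date, unique_id): [uptime_minutes, connection_times]}."""
    times = sorted(datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f") for s in stamps)
    totals = defaultdict(lambda: [0, 0])
    prev = None
    for ts in times:
        # Assumed TIMESTAMP_DIFF behaviour: whole minutes only.
        time_diff = None if prev is None else (ts - prev) // timedelta(minutes=1)
        key = (ts.date(), unique_id)
        if time_diff is not None and 0 <= time_diff <= 120:
            totals[key][0] += time_diff      # IF(... between 0 and 120, time_diff, 0)
        if time_diff is None or 120 <= time_diff <= 1440:
            totals[key][1] += 1              # IF(... IS NULL OR between 120 and 1440, 1, 0)
        prev = ts
    return dict(totals)


result = summarize("50J181700696", SAMPLE)
for (day, uid), (uptime, connections) in result.items():
    print(day, uid, uptime, connections)
```

Under that assumption the sketch yields 592 minutes of uptime and 2 connections for 2019-07-02, not the 420 and 3 listed in the question, because the apparent two-hour gaps truncate to 119 minutes and are counted as uptime.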
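These results may differ from what you expect. Note that where the server times look 120 minutes apart, the actual differences are strictly less than 120 minutes, so you may need to adjust the thresholds to fit your full use case.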
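The effect is easy to see by listing the gaps in the sample data. The Python sketch below compares whole-minute truncation (the assumed TIMESTAMP_DIFF behaviour) with rounding to the nearest minute, which is only one possible adjustment and not something the original query does.

```python
from datetime import datetime, timedelta

# server_time values for unique_id 50J181700696, copied from the question.
SAMPLE = [
    "2019-07-02 00:14:14.157", "2019-07-02 01:14:14.136",
    "2019-07-02 02:14:14.116", "2019-07-02 04:14:14.065",
    "2019-07-02 05:14:14.041", "2019-07-02 07:14:13.987",
    "2019-07-02 08:14:13.961", "2019-07-02 11:14:13.873",
    "2019-07-02 12:14:13.852", "2019-07-02 13:14:13.822",
]

times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f") for s in SAMPLE]
gaps = [later - earlier for earlier, later in zip(times, times[1:])]

# Whole elapsed minutes: each gap is a few milliseconds short of a full hour.
truncated = [gap // timedelta(minutes=1) for gap in gaps]
# Hypothetical adjustment: round to the nearest minute instead.
rounded = [round(gap / timedelta(minutes=1)) for gap in gaps]

print(truncated)
print(rounded)
```

In this sketch the truncated gaps are 59, 119 and 179 minutes, while the rounded ones are 60, 120 and 180, so which side of the 120-minute threshold a two-hour gap lands on depends on the choice.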