Calculating the total time spent on a group of websites per UNIQUE ID

Time: 2019-07-02 13:35:07

Tags: sql google-bigquery

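I have several user IDs, and each one is seen with a timestamp roughly once an hour. I want to calculate the total time spent (uptime) per user, excluding every data point that is delayed beyond the usual 60 minutes, which means skipping any timestamp that arrives more than 120 minutes after the previous one. The result should then be grouped by the day extracted from the timestamp itself. I also want a connection frequency, which I think of as counting disconnections: whenever the data shows a gap of 2 hours or more, the count is incremented by 1. Please keep in mind that the query is written for BigQuery.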

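  1. Gaps of between 120 and 1440 minutes (2 to 24 hours) are treated as gaps in the data. They must be left out of the time-spent sum, but the connection count should be incremented by 1, because each such gap is treated as a disconnection.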
    unique_id         server_time
    50J181700696    2019-07-02 00:14:14.157 UTC
    50J181700696    2019-07-02 01:14:14.136 UTC
    50J181700696    2019-07-02 02:14:14.116 UTC
    50J181700696    2019-07-02 04:14:14.065 UTC
    50J181700696    2019-07-02 05:14:14.041 UTC
    50J181700696    2019-07-02 07:14:13.987 UTC
    50J181700696    2019-07-02 08:14:13.961 UTC
    50J181700696    2019-07-02 11:14:13.873 UTC
    50J181700696    2019-07-02 12:14:13.852 UTC
    50J181700696    2019-07-02 13:14:13.822 UTC
    SELECT
      date_column,
      unique_id,
      SUM(
      case TIMESTAMP_DIFF(prev_server_time,server_time,minute) between 0 and 120
            when server_time is null or prev_server_time is null then 0
            when server_time > prev_server_time then TIMESTAMP_DIFF(server_time,prev_server_time,minute)
            else 0 
           END
      ) AS uptime_per_day,
      SUM(
      case not (TIMESTAMP_DIFF(prev_server_time,server_time,minute) between 0 and 120 )
            when prev_server_time is null or server_time is null then 0
            when server_time > prev_server_time and TIMESTAMP_DIFF(server_time,prev_server_time,minute) between 120 and 1440 then 1
            else 0 
           END
      ) AS connection_times
    FROM (
      SELECT
        date_column,
        unique_id,
        server_time,
        LAG(server_time ) OVER (PARTITION BY unique_id ORDER BY date_column   ) AS prev_server_time
      FROM (
        SELECT
          unique_id,
          server_time,
          DATE(server_time) AS date_column
        FROM
          `table_user_entry`
        ))
    GROUP BY
      date_column,
      unique_id
    date_column    unique_id       uptime_per_day(minutes)    connection_times
    2019-07-02     50J181700696    420                        3

1 Answer:

Answer 0 (score: 0)

These are the results I get when I run your query:


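uptime_per_day is 0 because TIMESTAMP_DIFF(prev_server_time,server_time,minute) between 0 and 120 is always false: with the arguments in that order, the timestamp difference is always negative. You have to swap the order of the server times to get positive values: TIMESTAMP_DIFF(server_time,prev_server_time,minute)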
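To make the argument order concrete, here is a minimal sketch that applies both orderings to two consecutive readings taken from the sample data in the question. The column aliases prev_minus_current and current_minus_prev are made up for illustration, and the timestamps are inlined so the query does not depend on table_user_entry.

```sql
-- Sketch only: two consecutive sample readings, inlined as literals.
SELECT
  -- Earlier minus later (the order used in the question): a negative number,
  -- so "BETWEEN 0 AND 120" can never be true.
  TIMESTAMP_DIFF(prev_server_time, server_time, MINUTE) AS prev_minus_current,
  -- Later minus earlier (the corrected order): a positive number.
  TIMESTAMP_DIFF(server_time, prev_server_time, MINUTE) AS current_minus_prev
FROM (
  SELECT
    TIMESTAMP '2019-07-02 00:14:14.157+00' AS prev_server_time,
    TIMESTAMP '2019-07-02 01:14:14.136+00' AS server_time
);
```

TIMESTAMP_DIFF subtracts its second argument from its first, so the later timestamp has to come first for the gap to be positive.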

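When using the LAG function, LAG(server_time ) OVER (PARTITION BY unique_id ORDER BY date_column ) AS prev_server_time, order the data by server_time rather than by date_column. This ensures that each row really is compared with the row immediately before it, and it makes checks such as server_time > prev_server_time unnecessary.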
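The difference between the two orderings can be seen side by side in the sketch below. It assumes three sample readings inlined from the question, and the aliases prev_by_date and prev_by_time are hypothetical names used only here.

```sql
-- Sketch only: three sample readings from the same day, inlined as literals.
WITH readings AS (
  SELECT
    '50J181700696' AS unique_id,
    ts AS server_time,
    DATE(ts) AS date_column
  FROM UNNEST([
    TIMESTAMP '2019-07-02 00:14:14.157+00',
    TIMESTAMP '2019-07-02 01:14:14.136+00',
    TIMESTAMP '2019-07-02 02:14:14.116+00'
  ]) AS ts
)
SELECT
  unique_id,
  server_time,
  -- Every row has the same date_column, so this ordering is a tie and the
  -- row that counts as "previous" is not well defined.
  LAG(server_time) OVER (PARTITION BY unique_id ORDER BY date_column) AS prev_by_date,
  -- Ordering by the timestamp itself makes "previous" the reading just before.
  LAG(server_time) OVER (PARTITION BY unique_id ORDER BY server_time) AS prev_by_time
FROM readings
ORDER BY server_time;
```

Because all readings within a day tie on date_column, prev_by_date may happen to match prev_by_time on a small input, but nothing guarantees it.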

Here is a query without aggregation, to show how the times are calculated.

    SELECT
      date_column,
      unique_id,
      -- Gaps of 0 to 120 minutes count toward uptime.
      IF(time_diff BETWEEN 0 AND 120, time_diff, 0) AS up_time,
      -- The first reading (no previous row) or a gap of 120 to 1440 minutes starts a connection.
      IF(time_diff IS NULL OR time_diff BETWEEN 120 AND 1440, 1, 0) AS connection_started
    FROM (
      SELECT
        DATE(server_time) AS date_column,
        unique_id,
        server_time,
        prev_server_time,
        -- Minutes between this reading and the previous one for the same unique_id.
        TIMESTAMP_DIFF(server_time, prev_server_time, MINUTE) AS time_diff
      FROM (
        SELECT
          unique_id,
          server_time,
          LAG(server_time) OVER (PARTITION BY unique_id ORDER BY server_time) AS prev_server_time
        FROM
          `table_user_entry`
      ))

[Image: results using the OP's original SQL]

And here is the final query with aggregation.

    WITH connection_data AS (
      SELECT
        date_column,
        unique_id,
        IF(time_diff BETWEEN 0 AND 120, time_diff, 0) AS uptime,
        IF(time_diff IS NULL OR time_diff BETWEEN 120 AND 1440, 1, 0) AS connection_started
      FROM (
        SELECT
          DATE(server_time) AS date_column,
          unique_id,
          server_time,
          prev_server_time,
          TIMESTAMP_DIFF(server_time, prev_server_time, MINUTE) AS time_diff
        FROM (
          SELECT
            unique_id,
            server_time,
            LAG(server_time) OVER (PARTITION BY unique_id ORDER BY server_time) AS prev_server_time
          FROM
            `table_user_entry`
        ))
    )
    SELECT
      date_column,
      unique_id,
      SUM(uptime) AS uptime,
      SUM(connection_started) AS connection_times
    FROM connection_data
    GROUP BY date_column, unique_id

[Image: results]

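These results may differ from what you expect. Note that where the server times appear to be 120 minutes apart, the differences are actually strictly less than 120 minutes (for example, 02:14:14.116 to 04:14:14.065 is 51 milliseconds short of two hours), so those gaps are counted as uptime rather than as disconnections. You may need to adjust the thresholds to fit your full use case.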
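One possible adjustment, sketched below, is to measure the gap in seconds and round it to the nearest minute before applying the thresholds. Rounding is an assumption about the intended behaviour and is not stated in the question. The aliases whole_minutes and rounded_minutes are hypothetical, and the sample rows are inlined so the query does not depend on table_user_entry.

```sql
-- Sketch only: four sample readings from the question, inlined as literals.
WITH readings AS (
  SELECT '50J181700696' AS unique_id, ts AS server_time
  FROM UNNEST([
    TIMESTAMP '2019-07-02 00:14:14.157+00',
    TIMESTAMP '2019-07-02 01:14:14.136+00',
    TIMESTAMP '2019-07-02 02:14:14.116+00',
    TIMESTAMP '2019-07-02 04:14:14.065+00'
  ]) AS ts
),
gaps AS (
  SELECT
    unique_id,
    server_time,
    LAG(server_time) OVER (PARTITION BY unique_id ORDER BY server_time) AS prev_server_time
  FROM readings
)
SELECT
  unique_id,
  server_time,
  -- Gap at MINUTE granularity, as in the queries above.
  TIMESTAMP_DIFF(server_time, prev_server_time, MINUTE) AS whole_minutes,
  -- Gap in seconds, rounded to the nearest minute (assumed intent).
  CAST(ROUND(TIMESTAMP_DIFF(server_time, prev_server_time, SECOND) / 60) AS INT64) AS rounded_minutes
FROM gaps
ORDER BY server_time;
```

For the last row, whole_minutes stays below 120 as described above, while rounded_minutes gives the nominal 120. If you take this route, also decide which side a gap of exactly 120 belongs to, because time_diff BETWEEN 0 AND 120 and time_diff BETWEEN 120 AND 1440 both include 120.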