Google BigQuery - the most active moment based on time intervals

Time: 2017-01-12 18:46:42

Tags: sql google-bigquery

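Let's assume I have a table activities with the fields starttime (TIMESTAMP) and stoptime (TIMESTAMP). I would like to find the moment at which the most activities were in progress. If there are several such moments, the query should return the earliest one.

I tried taking all starttime timestamps and counting, for each of them, the activities in progress at that time, then picking the maximum: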

#standardSQL
SELECT
  time,
  (
    SELECT COUNT(*)
    FROM activities
    WHERE starttime <= time AND time <= stoptime
  ) AS cnt
FROM (
  SELECT DISTINCT starttime AS time
  FROM activities
  ORDER BY time
)
ORDER BY cnt DESC, time ASC
LIMIT 1

Unfortunately it says: LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.

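I think that outside the database world a suitable algorithm would be to put all starttimes and stoptimes into one array in a way that keeps them distinguishable, sort it, and then walk through the array sequentially looking for the moment with the maximum. However, I don't know how to express such an algorithm in SQL.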
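For reference, the non-SQL algorithm described above could look roughly like the Python sketch below. The function name busiest_moment and the representation of activities as (starttime, stoptime) pairs are only assumptions for illustration; intervals are treated as closed, matching the starttime <= time AND time <= stoptime condition in the query.

```python
def busiest_moment(activities):
    """Return (moment, count) for the earliest moment with the most
    activities in progress, or (None, 0) if there are no activities.

    `activities` is an iterable of (starttime, stoptime) pairs (hypothetical
    layout); any mutually comparable values work, e.g. ints or datetimes.
    """
    events = []
    for start, stop in activities:
        # The middle field makes starts sort before stops at equal times,
        # so intervals that only touch at an endpoint count as overlapping.
        events.append((start, 0, +1))
        events.append((stop, 1, -1))
    events.sort()

    best_time, best_count, running = None, 0, 0
    for time, _kind, delta in events:
        running += delta
        if running > best_count:  # strict '>' keeps the earliest peak
            best_time, best_count = time, running
    return best_time, best_count
```

With integer stand-ins for timestamps, busiest_moment([(1, 3), (1, 4), (4, 5), (7, 8), (7, 10), (8, 12)]) returns (8, 3): at moment 8 one activity is just ending, one is running and one is just starting.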

I have seen this, but I don't think it helps here.

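2 answers:

Answer 0: (score: 2)

I have arrived at something close to the algorithm I described in the question. It runs fairly quickly, but if you find anything better I would be glad to see it.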

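#standardSQL
-- Turn every activity into two events: +1 at its start, -1 at its stop.
-- The running sum over the events ordered by time is the number of activities in progress.
SELECT time, SUM(add) OVER(ORDER BY time ASC, add DESC) AS cumsum
FROM (
  SELECT starttime AS time, 1 AS add
  FROM activities
  UNION ALL
  SELECT stoptime AS time, -1 AS add
  FROM activities
)
-- add DESC above counts starts before stops at the same timestamp,
-- so activities that only touch at an endpoint are treated as overlapping.
ORDER BY cumsum DESC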

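Answer 1: (score: 1)

Consider the version below.
From my point of view it returns more practical output: all periods (with their respective start and end) of consecutive activity at the same level.
So you will know not only the start but the whole period (start and end) during which the most activities were in progress, and not just one such period but all of them.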

#standardSQL
WITH intervals AS (
  SELECT time AS start_, LEAD(time) OVER(ORDER BY time) AS end_
  FROM (
    SELECT DISTINCT time FROM (
      SELECT starttime AS time FROM activities UNION ALL 
      SELECT stoptime AS time FROM activities ))
),
equals AS (
  SELECT start_, end_, COUNT(1) AS cumsum
  FROM intervals AS i 
  JOIN activities AS a 
  ON  i.start_ >= a.starttime AND i.end_ <= a.stoptime 
  GROUP BY start_, end_
),
grps AS (
  SELECT
    start_, end_, cumsum,
    -- flag = 1 when this interval starts a new run, 0 when it directly
    -- continues the previous interval at the same activity level
    IF(
      start_ = LAG(end_) OVER(ORDER BY start_) AND cumsum = LAG(cumsum) OVER(ORDER BY start_),
      0, 1
    ) AS flag
  FROM equals
)
SELECT MIN(start_) AS start_, MAX(end_) AS end_, cumsum
FROM (
  SELECT start_, end_, cumsum, SUM(flag) OVER(ORDER BY start_) AS grp
  FROM grps
)
GROUP BY cumsum, grp
ORDER BY start_

You can play with the above using a dummy activities table, either with plain integers or with timestamps:

WITH activities AS (
  SELECT 1 AS starttime, 3 AS stoptime UNION ALL
  SELECT 1 AS starttime, 4 AS stoptime UNION ALL
  SELECT 4 AS starttime, 5 AS stoptime UNION ALL
  SELECT 7 AS starttime, 8 AS stoptime UNION ALL
  SELECT 7 AS starttime, 10 AS stoptime UNION ALL
  SELECT 8 AS starttime, 12 AS stoptime 
)

WITH activities AS (
  SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 1 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 3 MINUTE) AS stoptime UNION ALL
  SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 1 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 4 MINUTE) AS stoptime UNION ALL
  SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 4 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 5 MINUTE) AS stoptime UNION ALL
  SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 8 MINUTE) AS stoptime UNION ALL
  SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 10 MINUTE) AS stoptime UNION ALL
  SELECT TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 8 MINUTE) AS starttime, TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 12 MINUTE) AS stoptime 
)
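
To sanity-check what the query in this answer should return for the integer dummy table, the same idea can be sketched outside SQL. This is only an illustrative Python version under my own assumptions (the function name level_periods and the list-of-pairs input are hypothetical), not a translation of the SQL line by line: it splits the timeline at every start and stop, counts the activities covering each piece, and merges adjacent pieces that have the same count.

```python
def level_periods(activities):
    """Return [(start, end, count), ...]: maximal periods with a constant
    number of activities in progress, skipping periods with no activity.

    `activities` is a list of (starttime, stoptime) pairs (hypothetical layout).
    """
    # Every distinct start or stop is a potential boundary between periods.
    times = sorted({t for pair in activities for t in pair})
    periods = []
    for start, end in zip(times, times[1:]):
        # An activity counts for this piece only if it spans the whole piece.
        count = sum(1 for a, b in activities if a <= start and end <= b)
        if count == 0:
            continue
        if periods and periods[-1][1] == start and periods[-1][2] == count:
            # Adjacent and at the same level: extend the previous period.
            periods[-1] = (periods[-1][0], end, count)
        else:
            periods.append((start, end, count))
    return periods
```

For the integer dummy table this gives [(1, 3, 2), (3, 5, 1), (7, 10, 2), (10, 12, 1)]. Note that it reports periods rather than single instants, so an instant where one activity ends exactly as another starts (such as 8 in the dummy data) does not show up as a separate, higher level.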