Using Gaps and Islands to find consecutive hours/days - SQL / BigQuery

Time: 2016-03-21 20:45:22

Tags: sql google-bigquery gaps-and-islands

I have a table in BigQuery that looks like this:

[sample table not preserved in this copy; the answer below uses the columns Caller_Number, month, day, and call_time]

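I want to write a SQL query for BigQuery that lets me count the number of consecutive hours in which at least one call was made (ordered by caller_number), as well as the number of consecutive days on which calls were made in at least 10 consecutive hours (ordered by caller_number). I have been studying existing resources on gaps and islands, but I can't figure out how to apply them to consecutive dates and hours.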

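1 answer:

Answer 0: (score: 2)

Below is a working example for consecutive hours (written in BigQuery legacy SQL).
The steps are:
1. "Extract" the hour from call_time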

HOUR(TIMESTAMP(CURRENT_DATE() + ' ' + call_time))

2. Find the previous hour for the same caller and day

LAG([hour]) OVER(PARTITION BY Caller_Number, [month], [day] ORDER BY [hour])

3. Flag the start of each group of consecutive hours: 1 means a new group starts, 0 means the current group continues

IFNULL(INTEGER([hour] - prev_hour > 1), 1)

4. Assign a group number to each row as the running sum of those flags

SUM(seq) OVER(PARTITION BY Caller_Number, [month], [day] ORDER BY [hour])

5. Finally, group by the group number and count the hours and calls in each group
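For readers who want to check the logic outside BigQuery, the five steps can be mimicked in plain Python. This is only a sketch under assumptions: the sample rows, the "HH:MM:SS" format of call_time, and the function name `consecutive_hour_groups` are hypothetical and not part of the original answer.

```python
from itertools import groupby

# Hypothetical sample rows: (caller_number, month, day, call_time as "HH:MM:SS").
calls = [
    ("555-0100", 3, 21, "08:15:00"),
    ("555-0100", 3, 21, "08:40:00"),
    ("555-0100", 3, 21, "09:05:00"),
    ("555-0100", 3, 21, "10:30:00"),
    ("555-0100", 3, 21, "14:00:00"),
    ("555-0100", 3, 21, "15:45:00"),
]


def consecutive_hour_groups(rows):
    """Return (caller, month, day, seq_group, hours_count, calls_count) tuples."""
    # Step 1: extract the hour from call_time, then sort so each
    # (caller, month, day) partition is ordered by hour.
    keyed = sorted((c, m, d, int(t.split(":")[0])) for c, m, d, t in rows)
    result = []
    # Equivalent of PARTITION BY Caller_Number, month, day ORDER BY hour.
    for (caller, month, day), part in groupby(keyed, key=lambda r: r[:3]):
        prev_hour = None
        seq_group = 0
        hours_seen = {}   # seq_group -> set of distinct hours
        calls_seen = {}   # seq_group -> number of calls
        for _c, _m, _d, hour in part:
            # Steps 2-3: compare with the previous hour (LAG); no previous
            # row or a gap larger than 1 starts a new group.
            seq = 1 if prev_hour is None or hour - prev_hour > 1 else 0
            # Step 4: the running sum of the flags is the group number.
            seq_group += seq
            hours_seen.setdefault(seq_group, set()).add(hour)
            calls_seen[seq_group] = calls_seen.get(seq_group, 0) + 1
            prev_hour = hour
        # Step 5: one output row per group with its hour and call counts.
        for group in sorted(hours_seen):
            result.append((caller, month, day, group,
                           len(hours_seen[group]), calls_seen[group]))
    return result


groups = consecutive_hour_groups(calls)
```

With the sample rows, hours 8, 9 and 10 form one group and hours 14 and 15 form a second one, because the jump from 10 to 14 is larger than one hour.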

Hope this gives you a good start for implementing similar logic for consecutive days on top of the consecutive-hours result.

SELECT Caller_Number, [month], [day], seq_group, 
  EXACT_COUNT_DISTINCT([hour]) AS hours_count, COUNT(1) AS calls_count 
FROM (
  SELECT Caller_Number, [month], [day], [hour],  
    SUM(seq) OVER(PARTITION BY Caller_Number, [month], [day] 
                  ORDER BY [hour]) AS seq_group
  FROM (
    SELECT Caller_Number, [month], [day], [hour], 
      IFNULL(INTEGER([hour] - prev_hour > 1), 1) AS seq
    FROM (
      SELECT Caller_Number, [month], [day], [hour], 
        LAG([hour]) OVER(PARTITION BY Caller_Number, [month], [day] 
                         ORDER BY [hour]) AS prev_hour
      FROM (
        SELECT Caller_Number, [month], [day], 
          HOUR(TIMESTAMP(CURRENT_DATE() + ' ' + call_time)) AS [hour] 
        FROM YourTable
      )
    )
  )
)
GROUP BY Caller_Number, [month], [day], seq_group
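The consecutive-days part of the question is left to the reader above. One possible way to build it on top of the hours result is sketched below in Python rather than SQL. Everything in it is an assumption for illustration: the per-day summary rows, the use of real calendar dates instead of separate month and day columns (so that a run can cross a month boundary), and the function name `consecutive_day_groups`.

```python
from datetime import date, timedelta
from itertools import groupby

# Hypothetical per-day summary derived from the hours query:
# (caller_number, calendar date, longest run of consecutive hours that day).
daily = [
    ("555-0100", date(2016, 3, 1), 11),
    ("555-0100", date(2016, 3, 2), 10),
    ("555-0100", date(2016, 3, 3), 4),
    ("555-0100", date(2016, 3, 4), 12),
    ("555-0100", date(2016, 2, 29), 10),
]


def consecutive_day_groups(rows, min_hours=10):
    """Return (caller, first_day, last_day, days_count) for each run of days."""
    # Keep only days whose longest run of hours reaches the threshold,
    # sorted by caller and then by date.
    qualifying = sorted((c, d) for c, d, hours in rows if hours >= min_hours)
    result = []
    for caller, part in groupby(qualifying, key=lambda r: r[0]):
        start = prev = None
        for _caller, day in part:
            if prev is None:
                start = day  # first qualifying day opens the first run
            elif day - prev > timedelta(days=1):
                # A gap of more than one day closes the current run.
                result.append((caller, start, prev, (prev - start).days + 1))
                start = day
            prev = day
        if prev is not None:
            result.append((caller, start, prev, (prev - start).days + 1))
    return result


day_groups = consecutive_day_groups(daily)
```

Here 2016-02-29 through 2016-03-02 form a run of three days, 2016-03-03 is dropped because its longest run is only 4 hours, and 2016-03-04 stands alone as a run of one day.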