How to count the number of times each value occurs in BigQuery Standard SQL

Time: 2018-08-29 16:16:22

Tags: sql google-bigquery

Say I have this table:

| eta   | arrived  | time_diff |  
+-------+----------+-----------+
| 06:47 |    06:47 |    0      |
| 08:30 |    08:40 |    10     | 
| 10:30 |    10:40 |    10     |
| 10:30 |    10:31 |    1      | 
+-------+----------+-----------+
I got time_diff with TIME_DIFF(arrived, eta, MINUTE) AS time_diff.

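What I want is to count how many 0s, 10s, and so on I have. Ideally, the table above would yield one 0, two 10s, and one 1. Of course, I don't know the time_diff results in advance; I just want to count how many times each result occurs, since I might also get 2, 3, 5, ... How can this be done in BigQuery Standard SQL?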

2 answers:

Answer 0: (score: 1)

You should use a GROUP BY clause:

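-- One output row per distinct time_diff, with the number of rows that have it
SELECT time_diff, COUNT(*) AS cnt
FROM `project.dataset.table`  -- placeholder: replace with your own table
GROUP BY time_diff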
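
To check what this GROUP BY should return without running it in BigQuery, the same counting can be sketched locally in Python. This is only an illustration of the logic, not BigQuery code; the list `time_diffs` is the sample data from the question and `count_occurrences` is a hypothetical helper name.

```python
from collections import Counter

# time_diff values (in minutes) copied from the question's sample table
time_diffs = [0, 10, 10, 1]


def count_occurrences(values):
    """Count how often each value occurs, like GROUP BY value with COUNT(*)."""
    return Counter(values)


counts = count_occurrences(time_diffs)

# Print one line per distinct value, smallest value first
for diff in sorted(counts):
    print(diff, counts[diff])
```

For the sample data this gives one 0, one 1, and two 10s, which is the distribution the question asks for.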

Answer 1: (score: 1)

The following is for BigQuery Standard SQL.

From a practical standpoint, I recommend grouping into bins, as in the example below: 0-9, 10-19, 20-29, and so on.

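#standardSQL
-- Sample data standing in for the real table
WITH `project.dataset.table` AS (
  SELECT '06:47' eta, '06:47' arrived UNION ALL
  SELECT '08:30', '08:40' UNION ALL
  SELECT '10:30', '10:40' UNION ALL
  SELECT '10:30', '10:31'
)
SELECT FORMAT('%i - %i', bin_start, bin_start + 9) bin, cnt
FROM (
  SELECT
    -- Lower edge of the 10-minute bin that the difference falls into
    10 * DIV(TIME_DIFF(PARSE_TIME('%R', arrived), PARSE_TIME('%R', eta), MINUTE), 10) bin_start,
    COUNT(1) cnt
  FROM `project.dataset.table`
  GROUP BY bin_start
)
-- Sort by the numeric bin start so that '100 - 109' does not sort before '20 - 29' as text
ORDER BY bin_start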

with the result:

Row     bin         cnt  
1       0 - 9       2    
2       10 - 19     2      
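
The binning in the query above can be cross-checked locally with a short Python sketch. It is an illustration under assumptions, not BigQuery code: `minutes_late`, `bin_start`, and `bin_counts` are hypothetical helper names, and `int(diff / width)` is used on the assumption that BigQuery's DIV truncates toward zero. If that assumption holds, early arrivals of 1 to 9 minutes (negative differences) would also land in the 0 - 9 bin, which may or may not be what you want.

```python
from collections import Counter
from datetime import datetime

# (eta, arrived) pairs copied from the sample data in the query
rows = [("06:47", "06:47"), ("08:30", "08:40"), ("10:30", "10:40"), ("10:30", "10:31")]


def minutes_late(eta, arrived):
    """Whole minutes between eta and arrived, both given as 'HH:MM' strings."""
    fmt = "%H:%M"
    delta = datetime.strptime(arrived, fmt) - datetime.strptime(eta, fmt)
    return int(delta.total_seconds() // 60)


def bin_start(diff, width=10):
    """Lower edge of the bin; int() truncates toward zero (assumed to match DIV)."""
    return width * int(diff / width)


def bin_counts(rows, width=10):
    """Return (label, count) pairs ordered by the numeric bin start."""
    counts = Counter(bin_start(minutes_late(eta, arrived), width) for eta, arrived in rows)
    return [(f"{start} - {start + width - 1}", counts[start]) for start in sorted(counts)]


for label, cnt in bin_counts(rows):
    print(label, cnt)
```

Sorting on the numeric bin start before building the labels keeps the bins in numeric order even when a label such as "100 - 109" appears.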

If you need the exact distribution of each time_diff, use the query below:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT '06:47' eta, '06:47' arrived  UNION ALL
  SELECT '08:30', '08:40' UNION ALL
  SELECT '10:30', '10:40' UNION ALL
  SELECT '10:30', '10:31'
)
SELECT 
  TIME_DIFF(PARSE_TIME('%R', arrived) , PARSE_TIME('%R', eta) , MINUTE) diff,
  COUNT(1) cnt
FROM `project.dataset.table`
GROUP BY diff
ORDER BY diff  

with the result:

Row     diff        cnt  
1       0           1    
2       1           1    
3       10          2