Calculating time_diff between rows with the same ID in Google BigQuery

Time: 2018-12-04 03:38:38

Tags: sql database statistics google-bigquery

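I'm practicing my SQL skills with BigQuery and trying to calculate the time difference between rentals for each bike. Basically, I want to compute time_diff for each pair of rows (different trip IDs) that share the same bike ID. I'm trying to find the median of the time_diff distribution for each bike. So far I have: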

SELECT bikeid,
       DATE_DIFF(date(start_time), date(prev_start_time), day) AS Tempo,
       OrderCount
FROM ( SELECT bikeid,
              start_time, 
              ROW_NUMBER() OVER(PARTITION BY bikeid ORDER BY start_time ASC) OrderCount,
              LAG(start_time) OVER(PARTITION BY bikeid ORDER BY start_time ASC) prev_start_time
       FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` 
     ) 
ORDER BY bikeid, start_time 

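I'm using the public BigQuery dataset bigquery-public-data.austin_bikeshare.bikeshare_trips, but my results look strange because no bike IDs are shown. (I already expected a lot of null or 0 values for date_diff, because the table records timestamps and a bike is sometimes rented many times in one day.)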

    |  Row  | bikeid | Tempo | OrderCount |
    |   1   |  null  | null  |     1      |
    |   2   |  null  |  57   |     2      |
    |   3   |  null  |  1    |     3      |

2 answers:

Answer 0: (score: 1)

There are a lot of null values in the bikeid column. You are seeing nulls because ascending order returns nulls first. There are a few options you can choose from:

• You can change the outer ORDER BY clause to sort bikeid in DESC order:

SELECT bikeid,
       DATE_DIFF(date(start_time), date(prev_start_time), day) AS Tempo,
       OrderCount
FROM ( SELECT bikeid,
              start_time,
              ROW_NUMBER() OVER(PARTITION BY bikeid ORDER BY start_time ASC) OrderCount,
              LAG(start_time) OVER(PARTITION BY bikeid ORDER BY start_time ASC) prev_start_time
       FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`
     )
ORDER BY bikeid desc, start_time

• You can remove the null bikeids by adding the where clause "WHERE bikeid IS NOT NULL":

SELECT bikeid,
       DATE_DIFF(date(start_time), date(prev_start_time), day) AS Tempo,
       OrderCount
FROM ( SELECT bikeid,
              start_time,
              ROW_NUMBER() OVER(PARTITION BY bikeid ORDER BY start_time ASC) OrderCount,
              LAG(start_time) OVER(PARTITION BY bikeid ORDER BY start_time ASC) prev_start_time
       FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`
       WHERE bikeid IS NOT NULL
     )
ORDER BY OrderCount desc, bikeid desc, start_time
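
A third possibility, not part of the original answer, is to keep the ascending sort and push the nulls to the end with NULLS LAST in the ORDER BY clause. The sketch below uses a few made-up inline rows instead of the real table, and assumes bikeid is an integer (the actual column type in the public dataset may differ):

```sql
-- Hypothetical inline rows standing in for the trips table.
WITH trips AS (
  SELECT CAST(NULL AS INT64) AS bikeid, TIMESTAMP '2018-01-01 08:00:00' AS start_time
  UNION ALL SELECT 12, TIMESTAMP '2018-01-01 09:00:00'
  UNION ALL SELECT 7,  TIMESTAMP '2018-01-02 10:30:00'
)
SELECT bikeid,
       start_time
FROM trips
-- Ascending order normally lists NULLs first; NULLS LAST moves them to the end.
ORDER BY bikeid ASC NULLS LAST, start_time
```

With this ordering the row whose bikeid is null sorts after the rows for bikes 7 and 12, so the non-null bikes appear at the top of the result.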

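Answer 1: (score: 0)

The average is the total difference divided by the number of rentals minus one: the gaps between consecutive rentals add up to the span from the first rental to the last, and there is one fewer gap than there are rentals. So you don't need window functions: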

SELECT bikeid,
       DATE_DIFF(MAX(DATE(start_time)), MIN(DATE(start_time)), day) / NULLIF(COUNT(*) - 1, 0) as avg_period
FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips` 
GROUP BY bikeid ;

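The above fixes your query and answers your question. I'm not sure how useful it is, because bikes are rented multiple times per day (that is the whole point of a public bike rental program).

At the very least, you could use a smaller time unit: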

SELECT bikeid,
       TIMESTAMP_DIFF(MAX(start_time), MIN(start_time), second) / NULLIF(COUNT(*) - 1, 0) as avg_period_in_seconds
FROM  `bigquery-public-data.austin_bikeshare.bikeshare_trips` 
GROUP BY bikeid ;
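
The queries above give the mean gap, whereas the question asks for the median. A median cannot be derived from MIN and MAX alone, so the individual gaps are needed after all. The following is a minimal sketch rather than a tested solution: it assumes start_time is a TIMESTAMP, and the names gaps, gap_seconds, gap_count and approx_median_gap_seconds are chosen here for illustration. It uses LAG to get each gap and APPROX_QUANTILES to estimate the median per bike:

```sql
-- Sketch: approximate median gap (in seconds) between consecutive rentals of each bike.
WITH gaps AS (
  SELECT bikeid,
         TIMESTAMP_DIFF(
           start_time,
           LAG(start_time) OVER (PARTITION BY bikeid ORDER BY start_time),
           SECOND
         ) AS gap_seconds
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_trips`
  WHERE bikeid IS NOT NULL
)
SELECT bikeid,
       COUNT(gap_seconds) AS gap_count,
       -- With 2 buckets the result array is [min, median, max]; OFFSET(1) is the median.
       APPROX_QUANTILES(gap_seconds, 2)[OFFSET(1)] AS approx_median_gap_seconds
FROM gaps
WHERE gap_seconds IS NOT NULL  -- the first rental of each bike has no previous rental
GROUP BY bikeid
ORDER BY bikeid
```

APPROX_QUANTILES returns an approximation. If an exact median matters, PERCENTILE_CONT(gap_seconds, 0.5) OVER (PARTITION BY bikeid) is one alternative, at the cost of going through another window function.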