Self-joining a subquery in Google BigQuery

Time: 2014-11-12 14:38:40

Tags: sql google-bigquery

I need some SQL help with a specific query against the NYC 2013 Taxi Trips Dataset.

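I want to analyze drop-offs at JFK airport, but I'd like to build my query so that it also includes the taxi's next pickup after it drops someone off at the airport.

This query gets me all trips at the airport on a given day: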

SELECT * FROM [833682135931:nyctaxi.trip_data] 
WHERE DATE(pickup_datetime) = '2013-05-01'
  AND FLOAT(pickup_latitude) < 40.651381
  AND FLOAT(pickup_latitude) > 40.640668
  AND FLOAT(pickup_longitude) < -73.776283
  AND FLOAT(pickup_longitude) > -73.794694

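I'd like to self-join the dataset to add next_pickup_time, next_pickup_lat, and next_pickup_lon values to each row.

To do that, I assume I need a correlated subquery, but I don't know where to start building it, because the subquery depends on the outer query.

It needs to search for trips with the same medallion on the same day, with a pickup time after the current airport drop-off, and then LIMIT 1... Any help is much appreciated!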

4 answers:

Answer 0: (score: 1)

Consider using the LAG window function instead of a self-join.
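
For illustration, here is a minimal sketch of what the window functions return per row. It runs against an in-memory SQLite database rather than BigQuery (an assumption made so the example can be run locally), and the three-row `trips` table is hypothetical. LAG reads the previous row in the partition and LEAD reads the following one, so which of the two fits depends on whether you want the earlier or the later trip.

```python
import sqlite3

# Hypothetical miniature of the trips table: one cab, three pickups.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (medallion TEXT, pickup_datetime TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [
        ("A", "2013-05-01 08:00:00"),
        ("A", "2013-05-01 09:30:00"),
        ("A", "2013-05-01 11:15:00"),
    ],
)

# Both functions share one window: rows of the same cab, ordered by pickup time.
rows = conn.execute(
    """
    SELECT pickup_datetime,
           LAG(pickup_datetime)  OVER w AS prev_pickup,
           LEAD(pickup_datetime) OVER w AS next_pickup
    FROM trips
    WINDOW w AS (PARTITION BY medallion ORDER BY pickup_datetime)
    ORDER BY pickup_datetime
    """
).fetchall()

for row in rows:
    print(row)
```

The first row has no previous pickup and the last row has no next pickup, so those cells come back as NULL (`None` in Python) unless a default is supplied as the third argument.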

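Answer 1: (score: 1)

I think N.N. has the right idea, except that you want LEAD instead of LAG to get the next pickup. For example, this query will produce the next pickup time, lat, and long after a JFK pickup.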

SELECT
    medallion,
    pickup_datetime,
    pickup_longitude,
    pickup_latitude,
    LEAD(pickup_datetime, 1, "") OVER (PARTITION BY medallion ORDER BY pickup_datetime) AS next_datetime,
    LEAD(pickup_longitude, 1, "0.0") OVER (PARTITION BY medallion ORDER BY pickup_datetime) AS next_longitude,
    LEAD(pickup_latitude, 1, "0.0") OVER (PARTITION BY medallion ORDER BY pickup_datetime) AS next_latitude
FROM [833682135931:nyctaxi.trip_data]
WHERE DATE(pickup_datetime) = '2013-05-01'
  AND FLOAT(pickup_latitude) < 40.651381
  AND FLOAT(pickup_latitude) > 40.640668
  AND FLOAT(pickup_longitude) < -73.776283
  AND FLOAT(pickup_longitude) > -73.794694;

Any time you can avoid a self-join, it's a good idea to do so.
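
One thing worth checking with a query shaped like the one above: SQL applies WHERE before window functions, so when the airport filter sits in the same SELECT as LEAD, the window only sees rows that passed the filter. The sketch below shows the difference on an in-memory SQLite database (an assumption, used so it runs locally instead of on BigQuery). The `trips` table and the `at_jfk` flag are hypothetical; the flag stands in for the latitude/longitude bounding box.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (medallion TEXT, pickup_datetime TEXT, at_jfk INTEGER)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        ("A", "2013-05-01 08:00:00", 1),  # JFK pickup
        ("A", "2013-05-01 09:30:00", 0),  # the cab's actual next pickup, elsewhere
        ("A", "2013-05-01 11:15:00", 1),  # a later JFK pickup
    ],
)

# Filter and window in the same SELECT: LEAD only sees the JFK rows.
filtered_first = conn.execute(
    """
    SELECT pickup_datetime,
           LEAD(pickup_datetime) OVER (PARTITION BY medallion
                                       ORDER BY pickup_datetime) AS next_pickup
    FROM trips
    WHERE at_jfk = 1
    ORDER BY pickup_datetime
    """
).fetchall()

# Window in a subquery, filter in the outer query: LEAD sees every trip.
windowed_first = conn.execute(
    """
    SELECT pickup_datetime, next_pickup
    FROM (SELECT pickup_datetime, at_jfk,
                 LEAD(pickup_datetime) OVER (PARTITION BY medallion
                                             ORDER BY pickup_datetime) AS next_pickup
          FROM trips)
    WHERE at_jfk = 1
    ORDER BY pickup_datetime
    """
).fetchall()

print(filtered_first)
print(windowed_first)
```

In the first result the 08:00 trip is paired with the 11:15 JFK pickup, skipping the 09:30 trip; in the second it is paired with 09:30. If the goal is the cab's next pickup anywhere, the second shape is probably the one to use.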

Answer 2: (score: 1)

This should give you all drop-offs together with their next pickup:

SELECT *
FROM
  (SELECT medallion,
          dropoff_datetime,
          dropoff_longitude,
          dropoff_latitude,
          LEAD(pickup_datetime, 1, "") OVER (PARTITION BY medallion
                                             ORDER BY pickup_datetime) AS next_datetime,
          LEAD(pickup_longitude, 1, "0.0") OVER (PARTITION BY medallion
                                                 ORDER BY pickup_datetime) AS next_longitude,
          LEAD(pickup_latitude, 1, "0.0") OVER (PARTITION BY medallion
                                                ORDER BY pickup_datetime) AS next_latitude
   FROM [833682135931:nyctaxi.trip_data]) d
WHERE date(next_datetime)=date(dropoff_datetime)
  AND DATE(dropoff_datetime) = '2013-05-01'
  AND FLOAT(dropoff_latitude) < 40.651381
  AND FLOAT(dropoff_latitude) > 40.640668
  AND FLOAT(dropoff_longitude) < -73.776283
  AND FLOAT(dropoff_longitude) > -73.794694
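
To see how the same-day condition behaves, here is a small sketch of the same pattern on an in-memory SQLite database (an assumption, so it can run without BigQuery). The four-row `trips` table and the `dropoff_at_jfk` flag are hypothetical; the flag replaces the coordinate bounding box.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE trips (medallion TEXT, pickup_datetime TEXT,
                           dropoff_datetime TEXT, dropoff_at_jfk INTEGER)"""
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?)",
    [
        # Cab A: JFK drop-off, then another pickup the same day.
        ("A", "2013-05-01 08:00:00", "2013-05-01 08:40:00", 1),
        ("A", "2013-05-01 09:30:00", "2013-05-01 09:50:00", 0),
        # Cab B: JFK drop-off, next pickup only on the following day.
        ("B", "2013-05-01 22:00:00", "2013-05-01 22:45:00", 1),
        ("B", "2013-05-02 06:10:00", "2013-05-02 06:30:00", 0),
    ],
)

# LEAD runs over all trips in the subquery; the outer query then keeps
# JFK drop-offs whose next pickup falls on the same calendar day.
rows = conn.execute(
    """
    SELECT medallion, dropoff_datetime, next_datetime
    FROM (SELECT medallion, dropoff_datetime, dropoff_at_jfk,
                 LEAD(pickup_datetime, 1, '') OVER (PARTITION BY medallion
                                                    ORDER BY pickup_datetime) AS next_datetime
          FROM trips) d
    WHERE date(next_datetime) = date(dropoff_datetime)
      AND date(dropoff_datetime) = '2013-05-01'
      AND dropoff_at_jfk = 1
    ORDER BY medallion
    """
).fetchall()

print(rows)
```

Only cab A's drop-off is returned. Cab B's late-evening drop-off is filtered out because its next pickup is on 2013-05-02, which is what the same-day comparison asks for but is worth keeping in mind for trips that end near midnight.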

Answer 3: (score: 0)

This is what finally worked, adapted from Pentium10's answer:

SELECT *
FROM
  (SELECT medallion,
          dropoff_datetime,
          dropoff_longitude,
          dropoff_latitude,
          LEAD(pickup_datetime, 1, "") OVER (PARTITION BY medallion
                                             ORDER BY pickup_datetime) AS next_datetime,
          LEAD(pickup_longitude, 1, "0.0") OVER (PARTITION BY medallion
                                                 ORDER BY pickup_datetime) AS next_longitude,
          LEAD(pickup_latitude, 1, "0.0") OVER (PARTITION BY medallion
                                                ORDER BY pickup_datetime) AS next_latitude
   FROM [833682135931:nyctaxi.trip_data]) d
WHERE date(next_datetime)=date(dropoff_datetime)
  AND DATE(dropoff_datetime) = '2013-05-01'
  AND FLOAT(dropoff_latitude) < 40.651381
  AND FLOAT(dropoff_latitude) > 40.640668
  AND FLOAT(dropoff_longitude) < -73.776283
  AND FLOAT(dropoff_longitude) > -73.794694