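I need SQL help with a specific query against the NYC 2013 Taxi Trips Dataset.
I want to analyze drop-offs at JFK airport, but I'd like to build the query so that it also includes the taxi's next pickup after it drops someone off at the airport.
This query gets me all the trips that start at the airport on a given day: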
SELECT *
FROM [833682135931:nyctaxi.trip_data]
WHERE DATE(pickup_datetime) = '2013-05-01'
  AND FLOAT(pickup_latitude) < 40.651381
  AND FLOAT(pickup_latitude) > 40.640668
  AND FLOAT(pickup_longitude) < -73.776283
  AND FLOAT(pickup_longitude) > -73.794694
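I'd like to join the dataset to itself so that each row gets next_pickup_time, next_pickup_lat, and next_pickup_lon values.
I assume I need a correlated subquery for that, but I don't know where to start building it, because the subquery depends on the outer query.
It needs to search for trips with the same medallion on the same day whose pickup time is after the current airport drop-off, and then LIMIT 1. Any help is greatly appreciated!
Answer 0 (score: 1)
Consider using the LAG window function instead of a self-join.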
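To see what a window function buys you here, this is a small sketch that runs on SQLite through Python's sqlite3 module rather than on BigQuery. The trips table and its three rows are made up for illustration, and it assumes the bundled SQLite is 3.25 or newer (the first release with window functions). LAG reads the previous row in the partition, LEAD reads the following one.

```python
import sqlite3

# Hypothetical toy data: one cab (medallion "A") with three pickups.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (medallion TEXT, pickup_datetime TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [
        ("A", "2013-05-01 08:00:00"),
        ("A", "2013-05-01 09:00:00"),
        ("A", "2013-05-01 10:00:00"),
    ],
)

# For each pickup, fetch the previous and the following pickup of the same cab.
rows = conn.execute(
    """
    SELECT pickup_datetime,
           LAG(pickup_datetime)  OVER (PARTITION BY medallion
                                       ORDER BY pickup_datetime) AS prev_pickup,
           LEAD(pickup_datetime) OVER (PARTITION BY medallion
                                       ORDER BY pickup_datetime) AS next_pickup
    FROM trips
    ORDER BY pickup_datetime
    """
).fetchall()

for row in rows:
    print(row)
```

The first row has no previous pickup and the last row has no next pickup, so those columns come back as NULL (None in Python) unless a default is supplied.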
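Answer 1 (score: 1)
I think N.N. has the right idea, except that you want LEAD rather than LAG to get the next pickup. For example, this query produces the time, latitude, and longitude of the next pickup after each JFK pickup.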
SELECT
  medallion,
  pickup_datetime,
  pickup_longitude,
  pickup_latitude,
  LEAD(pickup_datetime, 1, "") OVER (PARTITION BY medallion ORDER BY pickup_datetime) AS next_datetime,
  LEAD(pickup_longitude, 1, "0.0") OVER (PARTITION BY medallion ORDER BY pickup_datetime) AS next_longitude,
  LEAD(pickup_latitude, 1, "0.0") OVER (PARTITION BY medallion ORDER BY pickup_datetime) AS next_latitude
FROM [833682135931:nyctaxi.trip_data]
WHERE DATE(pickup_datetime) = '2013-05-01'
  AND FLOAT(pickup_latitude) < 40.651381
  AND FLOAT(pickup_latitude) > 40.640668
  AND FLOAT(pickup_longitude) < -73.776283
  AND FLOAT(pickup_longitude) > -73.794694;
Any time you can avoid a self-join, it's a good idea to do so.
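For comparison, here is a rough sketch that puts the correlated-subquery approach from the question next to LEAD. It runs on SQLite via Python's sqlite3 (3.25+ assumed), not on BigQuery, and the trips table and rows are invented. On this toy data both queries return the same rows, but the LEAD version does not have to look the table up again for every row.

```python
import sqlite3

# Hypothetical toy data: two cabs, pickups inserted out of order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (medallion TEXT, pickup_datetime TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [
        ("A", "2013-05-01 08:00:00"),
        ("A", "2013-05-01 09:00:00"),
        ("B", "2013-05-01 08:30:00"),
        ("A", "2013-05-01 10:00:00"),
    ],
)

# Correlated subquery: for each trip, find the earliest later pickup
# of the same medallion.
subquery_rows = conn.execute(
    """
    SELECT t.medallion,
           t.pickup_datetime,
           (SELECT MIN(n.pickup_datetime)
            FROM trips n
            WHERE n.medallion = t.medallion
              AND n.pickup_datetime > t.pickup_datetime) AS next_pickup
    FROM trips t
    ORDER BY t.medallion, t.pickup_datetime
    """
).fetchall()

# Window function: the same answer from a single pass over each partition.
lead_rows = conn.execute(
    """
    SELECT medallion,
           pickup_datetime,
           LEAD(pickup_datetime) OVER (PARTITION BY medallion
                                       ORDER BY pickup_datetime) AS next_pickup
    FROM trips
    ORDER BY medallion, pickup_datetime
    """
).fetchall()

print(subquery_rows == lead_rows)
```

The two only agree when pickup times are distinct within a medallion; with exact ties the subquery skips the tied row while LEAD returns it.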
Answer 2 (score: 1)
This should give you every drop-off together with the next pickup:
SELECT *
FROM
  (SELECT medallion,
          dropoff_datetime,
          dropoff_longitude,
          dropoff_latitude,
          LEAD(pickup_datetime, 1, "") OVER (PARTITION BY medallion
                                             ORDER BY pickup_datetime) AS next_datetime,
          LEAD(pickup_longitude, 1, "0.0") OVER (PARTITION BY medallion
                                                 ORDER BY pickup_datetime) AS next_longitude,
          LEAD(pickup_latitude, 1, "0.0") OVER (PARTITION BY medallion
                                                ORDER BY pickup_datetime) AS next_latitude
   FROM [833682135931:nyctaxi.trip_data]) d
WHERE DATE(next_datetime) = DATE(dropoff_datetime)
  AND DATE(dropoff_datetime) = '2013-05-01'
  AND FLOAT(dropoff_latitude) < 40.651381
  AND FLOAT(dropoff_latitude) > 40.640668
  AND FLOAT(dropoff_longitude) < -73.776283
  AND FLOAT(dropoff_longitude) > -73.794694
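The query above computes LEAD inside a subquery and applies the airport filter outside it. The sketch below tries to show why that ordering matters. It uses SQLite through Python's sqlite3 (3.25+ assumed) instead of BigQuery, the rows are made up, and the dropoff_at_jfk flag is a hypothetical stand-in for the latitude/longitude bounding box. A WHERE clause is applied before window functions are evaluated, so filtering in the same SELECT means LEAD only sees the rows that survived the filter.

```python
import sqlite3

# Hypothetical toy data: cab "A" drops off at JFK, then in the city, then at JFK.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (medallion TEXT, pickup_datetime TEXT, "
    "dropoff_datetime TEXT, dropoff_at_jfk INTEGER)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?)",
    [
        ("A", "2013-05-01 08:00:00", "2013-05-01 08:40:00", 1),
        ("A", "2013-05-01 09:00:00", "2013-05-01 09:30:00", 0),
        ("A", "2013-05-01 10:00:00", "2013-05-01 10:45:00", 1),
    ],
)

# Filter and window in the same SELECT: LEAD only sees the JFK rows,
# so the 09:00 city pickup is skipped.
filter_first = conn.execute(
    """
    SELECT dropoff_datetime,
           LEAD(pickup_datetime) OVER (PARTITION BY medallion
                                       ORDER BY pickup_datetime) AS next_pickup
    FROM trips
    WHERE dropoff_at_jfk = 1
    ORDER BY dropoff_datetime
    """
).fetchall()

# Window in a subquery, filter outside: LEAD sees every trip of the cab.
window_first = conn.execute(
    """
    SELECT dropoff_datetime, next_pickup
    FROM (SELECT dropoff_datetime,
                 dropoff_at_jfk,
                 LEAD(pickup_datetime) OVER (PARTITION BY medallion
                                             ORDER BY pickup_datetime) AS next_pickup
          FROM trips) d
    WHERE dropoff_at_jfk = 1
    ORDER BY dropoff_datetime
    """
).fetchall()

print(filter_first)
print(window_first)
```

On this data the first query pairs the 08:40 drop-off with the 10:00 pickup, while the subquery version pairs it with the 09:00 pickup, which is the cab's actual next trip.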
Answer 3 (score: 0)
Here's what finally worked, adapted from Pentium10's answer:
SELECT *
FROM
  (SELECT medallion,
          dropoff_datetime,
          dropoff_longitude,
          dropoff_latitude,
          LEAD(pickup_datetime, 1, "") OVER (PARTITION BY medallion
                                             ORDER BY pickup_datetime) AS next_datetime,
          LEAD(pickup_longitude, 1, "0.0") OVER (PARTITION BY medallion
                                                 ORDER BY pickup_datetime) AS next_longitude,
          LEAD(pickup_latitude, 1, "0.0") OVER (PARTITION BY medallion
                                                ORDER BY pickup_datetime) AS next_latitude
   FROM [833682135931:nyctaxi.trip_data]) d
WHERE DATE(next_datetime) = DATE(dropoff_datetime)
  AND DATE(dropoff_datetime) = '2013-05-01'
  AND FLOAT(dropoff_latitude) < 40.651381
  AND FLOAT(dropoff_latitude) > 40.640668
  AND FLOAT(dropoff_longitude) < -73.776283
  AND FLOAT(dropoff_longitude) > -73.794694